WO2012142975A1 - Conference site terminal audio signal processing method, conference site terminal, and video conference system - Google Patents

Conference site terminal audio signal processing method, conference site terminal, and video conference system

Info

Publication number
WO2012142975A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio signal
terminal
movable
pickup device
Prior art date
Application number
PCT/CN2012/074534
Other languages
English (en)
French (fr)
Inventor
赵云轩
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Publication of WO2012142975A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a method for processing an audio signal at a conference site terminal, a conference site terminal, and a video conference system.
  • A current video conferencing system generally includes site terminals and a conference server (in FIG. 1, the conference server is a multipoint control unit (MCU), taken as an example).
  • Each site has at least one site terminal. Each site terminal collects the sound and images of its own site, encodes them, and sends them to the MCU.
  • The MCU processes the sound and images in a certain manner (for example, sound mixing, image forwarding, or multi-picture processing) and sends the processed sound and images to the other site terminals in the video conference. Each site terminal decodes the sound and images of the remote site, which achieves remote video communication.
  • A video conferencing system generally uses fixed microphones or the like as audio pickup devices: one or more microphones are fixed on the desktop or the ceiling to pick up the speaker's voice.
  • A movable audio pickup device (for example, a wireless microphone) may also be used to supplement the fixed audio pickup devices.
  • Embodiments of the present invention provide a method for processing an audio signal of a conference terminal, a conference terminal, and a video conference system, so as to implement sound image matching in a deployment scenario of the movable audio pickup device.
  • the present invention provides the following technical solutions:
  • a video conferencing system comprising:
  • a first site terminal and a second site terminal, connected through a network, where a movable audio pickup device and an image capturing device are deployed at the site where the first site terminal is located;
  • the first site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the first site terminal; receive an image signal captured by the image capturing device for the region where the movable audio pickup device is currently located; generate a multi-channel audio signal corresponding to the audio signal, the multi-channel being at least two channels; adjust, according to the current direction of the movable audio pickup device relative to the first site terminal, a delay, a phase, and/or a signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the first site terminal; and transmit the image signal and the adjusted multi-channel audio signal;
  • the second site terminal is configured to receive the image signal and the adjusted multi-channel audio signal from the first site terminal, and play the image signal and the adjusted multi-channel audio signal.
  • a method for processing an audio signal at a venue terminal, comprising:
  • the venue terminal receives the audio signal picked up by the movable audio pickup device, and acquires the current direction of the movable audio pickup device relative to the venue terminal;
  • a venue terminal including:
  • a receiving determining unit, configured to receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the venue terminal;
  • an adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, and adjust, according to the current direction of the movable audio pickup device relative to the venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the venue terminal;
  • a sending unit configured to send the multi-channel audio signal adjusted by the adjusting unit.
  • a video conferencing system comprising:
  • a third site terminal, a fourth site terminal, and a conference server, where the third site terminal and the fourth site terminal are connected to the conference server through a network, and a movable audio pickup device and an image capturing device are deployed at the site where the third site terminal is located;
  • the third site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the third site terminal; receive an image signal captured by the image capturing device for the area in which the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the third site terminal, direction indication information indicating the direction of the sound to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the third site terminal; and send the image signal, the audio signal, and the direction indication information;
  • the conference server is configured to: receive the image signal, the audio signal, and the direction indication information sent by the third site terminal; generate a multi-channel audio signal corresponding to the audio signal, the multi-channel being at least two channels; adjust a delay, a phase, and/or a signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the direction of the sound presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the third site terminal; and transmit the image signal and the adjusted multi-channel audio signal;
  • the fourth site terminal is configured to receive the image signal and the adjusted multi-channel audio signal sent by the conference server, and play the image signal and the adjusted multi-channel audio signal.
  • a video conferencing system comprising:
  • a fifth site terminal and a sixth site terminal, connected through a network;
  • a movable audio pickup device and an image capturing device are deployed at the site where the fifth site terminal is located;
  • the fifth site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the fifth site terminal; receive an image signal captured by the image capturing device for the area in which the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the fifth site terminal, direction indication information indicating the direction of the sound to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the fifth site terminal; and transmit the image signal, the audio signal, and the direction indication information;
  • the sixth site terminal is configured to receive the image signal, the audio signal, and the direction indication information corresponding to the audio signal from the fifth site terminal; play the image signal, and play the audio signal according to the direction indication information.
  • a method for processing an audio signal at a venue terminal, comprising:
  • generating direction indication information that indicates the direction of the sound to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal;
  • transmitting the audio signal and the direction indication information.
  • a venue terminal including:
  • a receiving determining unit, configured to receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the venue terminal;
  • a generating unit, configured to generate, according to the current direction of the movable audio pickup device relative to the venue terminal, direction indication information indicating the direction of the sound to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal;
  • a sending unit, configured to send the audio signal and the direction indication information.
  • a conference server including:
  • a second receiving unit, configured to receive an image signal, an audio signal, and direction indication information sent by the venue terminal, where the audio signal is picked up by the movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the venue terminal and indicates the direction of the sound to be presented when the audio signal is played;
  • a second adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel includes at least two channels, and adjust the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the direction of the sound presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the venue terminal;
  • a second sending unit configured to send the image signal and the multi-channel audio signal adjusted by the second adjusting unit.
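  • As an illustration of how a conference server might apply direction indication information by adjusting signal strength only, the following minimal sketch uses constant-power panning. Encoding the direction indication information as a `pan` value in [-1.0, 1.0] and the function name `apply_direction` are assumptions made for this example; the source does not define a format for the direction indication information.

```python
import math


def apply_direction(mono, pan):
    """Turn a mono signal into (left, right) channels by adjusting signal strength.

    `pan` is a hypothetical encoding of the direction indication information:
    -1.0 means fully left, 0.0 means center, +1.0 means fully right.
    Constant-power panning keeps left_gain**2 + right_gain**2 equal to 1.
    """
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [s * left_gain for s in mono], [s * right_gain for s in mono]


# A source indicated as fully left is reproduced only in the left channel.
left, right = apply_direction([1.0, 0.5], -1.0)
```

  • Adjusting delay or phase, which the text also allows, would need additional per-channel processing that this sketch leaves out.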
  • The venue terminal receives the audio signal picked up by the movable audio pickup device, acquires the current direction of the movable audio pickup device relative to the venue terminal, and receives the image signal captured by the image capturing device.
  • The sound direction presented matches the current direction of the movable audio pickup device relative to the venue terminal; the image signal and the adjusted multi-channel audio signal are transmitted.
  • Because the venue terminal adjusts the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented when the adjusted multi-channel audio signal is played matches the direction of the movable audio pickup device relative to the venue terminal, other venue terminals that receive the image signal and the adjusted multi-channel audio signal have the basis for playing them with a sound image matching effect. This helps realize the "listening and discriminating" function (locating the speaker by sound) in a video conferencing system in which a movable audio pickup device is deployed.
  • The venue terminal receives the audio signal picked up by the movable audio pickup device, and obtains the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the direction of the sound to be presented when the audio signal is played; and transmits the audio signal and the direction indication information.
  • The direction of the sound to be presented when the audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal. This lays the foundation for the conference server or another site terminal to adjust or play the audio signal according to the direction indication information, and so to play the audio signal and the corresponding image signal with a sound image matching effect. That is, it helps realize the "listening and discriminating" function (locating the speaker by sound) in a video conferencing system in which a movable audio pickup device is deployed.
  • FIG. 1 is a schematic diagram of a video conferencing system of the prior art
  • FIG. 2 is a schematic diagram of the sound direction generation process in a video conference system according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a video conference system according to an embodiment of the present disclosure
  • FIG. 4 is a schematic flow chart of a method for processing an audio signal of a conference terminal according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a modular audio signal processing according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of another modular audio signal processing according to an embodiment of the present invention
  • FIG. 8-a is a schematic diagram of another video conferencing system according to an embodiment of the present invention
  • FIG. 8 is a schematic diagram of a conference server according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of still another video conference system according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a conference terminal according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of another venue terminal according to an embodiment of the present invention.
  • Embodiments of the present invention provide a method for processing an audio signal of a conference terminal, a conference terminal, and a video conference system, so as to implement sound image matching in a deployment scenario of the movable audio pickup device.
  • the direction of the sound refers to the sounding direction of the sounding object in the sound field, that is, the direction of the sound source relative to the receiving end (the receiving end may be a device such as a person or a venue terminal), for example, left or right.
  • the human ear determines the position of the sound by the time difference and the sound level difference between the sound signals picked up by the two ears. This is the so-called “binaural effect”.
  • The so-called “listening to the sound” refers to using the direction information of the sound to identify the position of the speaker.
  • the generation process of the sound direction in the video conference system is described by taking two channels as an example.
  • “microphone_left” and “microphone_right” have the same characteristics and are placed in the same orientation; “speaker_left” and “speaker_right” have the same characteristics and the same volume setting, and both face the “listening position”.
  • When someone speaks at “sounding position A”, “microphone_left” is closer to the speaker than “microphone_right”, so the sound it picks up is louder and has a smaller delay.
  • When the two signals are played through “speaker_left” and “speaker_right” respectively, the left-channel sound is louder and plays earlier, so the listener perceives the sound as coming from the left; the sound thus carries direction information.
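  • The two-microphone pickup described above can be sketched numerically. The example below assumes free-field propagation (delay proportional to distance, amplitude falling off as 1/distance) and a hypothetical room layout in metres; neither the coordinates nor the function name `pickup` come from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees Celsius


def pickup(source, mic):
    """Return (delay_seconds, relative_gain) for a point source heard at a microphone.

    Simplifying assumption: free-field propagation, so the delay grows linearly
    with distance and the amplitude falls off as 1/distance.
    """
    distance = math.dist(source, mic)
    return distance / SPEED_OF_SOUND, 1.0 / distance


# Hypothetical layout: the speaker at "sounding position A" stands nearer
# the left microphone than the right one.
position_a = (-1.0, 2.0)
microphone_left = (-0.5, 0.0)
microphone_right = (0.5, 0.0)

delay_l, gain_l = pickup(position_a, microphone_left)
delay_r, gain_r = pickup(position_a, microphone_right)
# The left microphone hears the speaker earlier and louder than the right one.
```

  • Playing the two picked-up signals through matching left and right loudspeakers preserves this time and level difference, which is what gives the listener the impression of direction.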
  • Sound image matching that is, matching between sound and image, means that the direction of the played sound matches the display orientation of the sound source in the image.
  • In a video conferencing system, participants not only hear the sound but also see the image of the remote site. If the remote speaker appears on the left of the image shown on the local display, the sound needs to be played from the left; if the speaker appears on the right of the image, the sound needs to be played from the right, so that the sound matches the image.
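  • One simple way to express this matching requirement is to map the speaker's horizontal position in the displayed image to a left/right balance value. The helper below is a hypothetical illustration: the name `pan_from_image_x`, the pixel-based input, and the [-1.0, 1.0] output range are assumptions, not part of the source.

```python
def pan_from_image_x(x_pixels, image_width):
    """Map a horizontal image position to a balance value in [-1.0, 1.0].

    -1.0 corresponds to the left edge of the image, 0.0 to the middle,
    and +1.0 to the right edge. Positions outside the image are clamped.
    """
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    x = min(max(x_pixels, 0), image_width)
    return 2.0 * x / image_width - 1.0


# A speaker shown a quarter of the way across a 1920-pixel-wide image
# should be heard left of center.
balance = pan_from_image_x(480, 1920)
```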
  • The movable audio pickup device in the embodiments of the present invention may be, for example, a wireless microphone or a long-cable microphone.
  • The position of the movable audio pickup device may keep changing as the speaker holding it moves.
  • The embodiments of the present invention seek to solve the sound image matching problem in scenarios where a movable audio pickup device is deployed, so as to implement the "listening" function in such scenarios.
  • a video conference system may include: a first conference terminal 310 and a second conference terminal 320.
  • the first site terminal 310 and the second site terminal 320 may be connected through a communication network.
  • the site where the first site terminal is located is deployed with a removable audio pickup device and an image capturing device.
  • The communication network, the movable audio pickup device, the image capturing device, and so on are not shown in the figure.
  • The first site terminal 310 is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the first site terminal 310; receive an image signal captured by the image capturing device for the area in which the movable audio pickup device is currently located; generate a multi-channel audio signal corresponding to the audio signal (the multi-channel is at least two channels); adjust, according to the current direction of the movable audio pickup device relative to the first site terminal 310, a delay, a phase, and/or a signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the first site terminal 310; and transmit the image signal and the adjusted multi-channel audio signal.
  • During conference establishment, the first site terminal 310 and the other site terminals can negotiate the number of channels for the conference; through negotiation, the number of channels of the multi-channel audio signal generated by the first site terminal 310 equals the number of channels supported by the second site terminal 320.
  • The first venue terminal 310 can obtain the current direction of the movable audio pickup device relative to the first venue terminal 310 in a variety of manners.
  • Here the first venue terminal 310 serves as the absolute reference frame for expressing the direction.
  • The first venue terminal 310 can also acquire the current direction of the movable audio pickup device relative to another reference object (for example, a conference screen or an image capturing device) and, based on the orientation relationship between that reference object and the first venue terminal 310, equivalently obtain the current direction of the movable audio pickup device relative to the first venue terminal 310.
  • The first venue terminal 310 can also obtain the current location of the movable audio pickup device.
  • The current direction of the movable audio pickup device relative to the first venue terminal 310 can be obtained in several ways:
  • the first venue terminal 310 receives the audio signal picked up by the movable audio pickup device, and determines the current direction of the movable audio pickup device relative to the first venue terminal 310 by image recognition technology (the direction is, for example, to the left of, centered on, or to the right of the first venue terminal 310);
  • the first venue terminal 310 receives the audio signal picked up by the movable audio pickup device through at least two receiving modules, and determines the current direction of the movable audio pickup device relative to the first venue terminal 310 from the difference between the audio signals received by the at least two receiving modules (the difference may include at least one of a time difference, a phase difference, and an intensity difference of the received audio signals);
  • the first venue terminal 310 receives the audio signal picked up by the movable audio pickup device, and receives location identification information sent by the movable audio pickup device (the location identification information can be any information that identifies the current orientation of the movable audio pickup device); it then determines the current direction of the movable audio pickup device relative to the first venue terminal 310 from the location identification information.
  • The first site terminal 310 receives the location identification information of the movable audio pickup device and determines the current direction of the movable audio pickup device relative to the first site terminal 310 from it, for example in the following manner:
  • The first venue terminal 310 may also obtain the current direction of the movable audio pickup device relative to the first venue terminal 310 in other manners, which is not limited in any embodiment of the present invention; other embodiments can be implemented in a similar manner.
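  • Of the manners listed above, the intensity-difference approach is the simplest to sketch. The example below compares the RMS level of the signal at two receiving modules and classifies the direction as left, center, or right. The function names, the two-module arrangement, and the 1 dB threshold are assumptions for illustration; the source does not specify how the difference is evaluated.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def direction_from_levels(left_rx, right_rx, threshold_db=1.0):
    """Classify the source direction from the level difference between two receivers.

    `left_rx` and `right_rx` are sample blocks from receiving modules assumed to
    sit on the left and right of the terminal. `threshold_db` is a hypothetical
    dead zone within which the source is treated as centered.
    """
    eps = 1e-12  # avoids log10(0) and division by zero for silent blocks
    level_diff_db = 20.0 * math.log10((rms(left_rx) + eps) / (rms(right_rx) + eps))
    if level_diff_db > threshold_db:
        return "left"
    if level_diff_db < -threshold_db:
        return "right"
    return "center"


# The left receiver sees twice the amplitude (about 6 dB more), so the
# source is classified as being on the left.
direction = direction_from_levels([0.8, -0.8], [0.4, -0.4])
```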
  • the second venue terminal 320 is configured to receive the image signal from the first venue terminal 310 and the adjusted multi-channel audio signal; and play the image signal and the adjusted multi-channel audio signal.
  • A conference server may receive the image signal and the adjusted audio signal sent by the first site terminal 310, perform processing such as mixing, and then send the result to the other site terminals; the second site terminal 320 may then receive it from the conference server.
  • the site terminal in this embodiment receives the audio signal picked up by the movable audio pickup device, and acquires the current direction of the movable audio pickup device relative to the site terminal; and receives the image capturing device for the movable audio.
  • An embodiment of the method for processing an audio signal at a venue terminal comprises: the venue terminal receives an audio signal picked up by the movable audio pickup device, and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates a multi-channel audio signal corresponding to the audio signal, the multi-channel being at least two channels; adjusts, according to the current direction of the movable audio pickup device relative to the venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the venue terminal; and sends the adjusted multi-channel audio signal.
  • specific steps may include:
  • the venue terminal receives an audio signal picked up by the removable audio pick-up device, and obtains a current direction of the movable audio pick-up device relative to the venue terminal.
  • The audio signal picked up by the movable audio pickup device is a single-channel signal.
  • The venue terminal can obtain the current direction of the movable audio pickup device relative to the venue terminal in a variety of ways. Here the venue terminal serves as the absolute reference frame for expressing the direction. The venue terminal can also obtain the direction of the movable audio pickup device relative to another reference object (for example, a conference screen, an image capturing device, or another device) and, based on the orientation relationship between that reference object and the venue terminal, equivalently obtain the current direction of the movable audio pickup device relative to the venue terminal. The venue terminal can also obtain the current location of the movable audio pickup device.
  • The site terminal of this embodiment can obtain the current direction of the movable audio pickup device relative to the site terminal in a manner similar to that used by the first site terminal 310 in the above embodiment; details are not repeated here.
  • The site terminal generates a multi-channel audio signal corresponding to the received audio signal (the multi-channel is at least two channels), and adjusts, according to the current direction of the movable audio pickup device relative to the site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the site terminal;
  • the venue terminal sends the adjusted multi-channel audio signal.
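  • The generate-and-adjust step can be sketched for the two-channel case as follows. The mono signal is copied into two channels, and the channel on the far side of the source is delayed and attenuated. The function name `to_stereo`, the three-valued direction, and the default delay and gain values are assumptions chosen for illustration; the source does not prescribe particular values.

```python
def to_stereo(mono, direction, sample_rate=48000, max_delay_s=0.0006, far_gain=0.5):
    """Build (left, right) channels whose perceived direction matches `direction`.

    `direction` is "left", "center", or "right". For an off-center source the
    far channel is delayed by `max_delay_s` and scaled by `far_gain`, while the
    near channel is zero-padded at the end so both channels have equal length.
    The default delay and gain are hypothetical values, not taken from the source.
    """
    if direction not in ("left", "center", "right"):
        raise ValueError("direction must be 'left', 'center', or 'right'")
    if direction == "center":
        return list(mono), list(mono)
    delay = round(max_delay_s * sample_rate)  # delay expressed in samples
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in mono]
    return (near, far) if direction == "left" else (far, near)


# With a 1 kHz sample rate and a 2 ms delay, the right channel lags by 2 samples.
left, right = to_stereo([1.0, 1.0], "left", sample_rate=1000, max_delay_s=0.002)
```

  • A finer-grained direction (for example, an angle) would scale the delay and gain continuously instead of switching between three cases.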
  • The venue terminal may further receive an image signal captured by an image capturing device (if present) for an area that includes the current location of the movable audio pickup device, and transmit the image signal.
  • The conference server (for example, the MCU) can receive the adjusted multi-channel audio signal (and the image signal) sent by the venue terminal, perform processing such as mixing, and forward the result to other venue terminals, which can receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound image matching effect.
  • The site terminal in this embodiment receives the audio signal picked up by the movable audio pickup device, and acquires the current direction of the movable audio pickup device relative to the site terminal; generates a multi-channel audio signal corresponding to the audio signal; and adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the site terminal.
  • Because the direction of the sound presented when the audio signal is played matches the direction of the movable audio pickup device relative to the site terminal, other site terminals that receive the adjusted multi-channel audio signal have the basis for playing the corresponding image signal and the adjusted multi-channel audio signal with a sound image matching effect. This helps realize the "listening" function in a video conferencing system in which a movable audio pickup device is deployed.
  • The site terminal is divided into several modules, which cooperate with one another to process the audio signal.
  • This embodiment takes as an example the application scenario in which the movable audio pickup device deployed in the video conferencing system is a wireless microphone.
  • Application scenarios that deploy other types of movable audio pickup devices are similar.
  • Three exemplary embodiments are shown in FIG. 5 to FIG. 7. It can be understood that the site terminal may also use other module divisions to process audio signals.
  • The current location of the wireless microphone is identified by increasing the number of receiving modules in the site terminal that receive the audio signal picked up by the wireless microphone.
  • The number of receiving modules in the site terminal is greater than or equal to two, depending on the accuracy required for the current position of the wireless microphone.
  • The audio signal processing flow can be as shown in FIG. 5, where solid arrows indicate data flow and dashed arrows indicate control flow; the same convention applies to subsequent embodiments and is not repeated.
  • the wireless microphone sends the audio signal picked up by the audio pick-up module to the venue terminal.
  • the venue terminal in FIG. 5 may include: an orientation recognition module, an adjustment module, a code sending module, and multiple receiving modules.
  • The multiple receiving modules deployed in the site terminal each receive the audio signal sent by the wireless microphone and send the received audio signals to the orientation recognition module for orientation analysis.
  • The orientation recognition module calculates the current direction of the wireless microphone relative to the site terminal from information such as the time difference, phase difference, and/or intensity difference between the signals of the receiving modules; for example, the calculated direction is to the left of, centered on, or to the right of the site terminal.
  • The orientation recognition module sends the current direction of the wireless microphone relative to the site terminal (which can be regarded as the sound source direction) to the adjustment module.
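  • The time-difference part of this analysis can be illustrated with a brute-force cross-correlation over a small range of lags. This is a minimal sketch under stated assumptions: two receiving modules (one on the left, one on the right), a time difference large enough to be measured at the sampling rate, and the hypothetical function names `best_lag` and `direction_from_lag`. The source does not say how the time difference is computed.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) that maximizes sum(a[n] * b[n + lag]).

    A positive result means signal `b` is delayed relative to signal `a`.
    """
    def corr(lag):
        return sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))

    return max(range(-max_lag, max_lag + 1), key=corr)


def direction_from_lag(lag, tolerance=0):
    """Classify direction, assuming `a` came from the left receiver and `b` from the right."""
    if lag > tolerance:
        return "left"   # the signal reached the left receiver first
    if lag < -tolerance:
        return "right"  # the signal reached the right receiver first
    return "center"


# Hypothetical received blocks: the right receiver sees the same pulse 2 samples later.
left_rx = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
right_rx = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
lag = best_lag(left_rx, right_rx, max_lag=3)
```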
  • The orientation recognition module can also select one of the N received audio signals according to parameters such as signal-to-noise ratio, volume, and continuity (for example, selecting the one with the best audio signal quality) and send it to the adjustment module.
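  • A simple selection rule along these lines is sketched below. It uses only two of the named parameters, continuity and volume: a channel with a long run of silent samples is treated as discontinuous, and among the remaining channels the loudest (by RMS) wins. The function names, the `max_gap` limit, and the scoring order are assumptions for illustration; signal-to-noise estimation is omitted.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def longest_silence(samples):
    """Length of the longest run of exactly-zero samples (a crude dropout measure)."""
    longest = run = 0
    for s in samples:
        run = run + 1 if s == 0 else 0
        longest = max(longest, run)
    return longest


def select_channel(channels, max_gap=3):
    """Return the index of the preferred channel among the received signals.

    Channels without dropouts longer than `max_gap` samples rank above those
    with dropouts; ties are broken by RMS volume. `max_gap` is hypothetical.
    """
    def score(samples):
        return (longest_silence(samples) <= max_gap, rms(samples))

    return max(range(len(channels)), key=lambda i: score(channels[i]))


received = [
    [0.9, 0.0, 0.0, 0.0, 0.0, 0.9],  # loud but interrupted by a 4-sample dropout
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # continuous, moderate level
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],  # continuous, quiet
]
chosen = select_channel(received)
```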
  • The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (the multi-channel includes at least two channels), and adjusts, according to the current direction of the wireless microphone relative to the site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the direction of the sound presented by the adjusted multi-channel audio signal matches the current direction of the wireless microphone relative to the site terminal; the adjusted multi-channel audio signal is then sent to the code sending module.
  • the code sending module encodes and transmits the multi-channel audio signal.
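The adjustment step described above (one picked-up signal expanded to at least two channels, with delay and signal strength adjusted on at least one of them) can be sketched for the two-channel case. The function name `to_stereo`, the direction scale from -1.0 to 1.0, and the parameters `max_delay` and `max_atten` are hypothetical; phase adjustment is omitted for brevity.

```python
def to_stereo(mono, direction, max_delay=8, max_atten=0.5):
    """Expand a mono signal into (left, right) channels.

    direction: -1.0 = fully left, 0.0 = centered, 1.0 = fully right.
    The channel on the far side is delayed and attenuated so that the
    played-back sound appears to come from the near side.
    """
    if not -1.0 <= direction <= 1.0:
        raise ValueError("direction must be within [-1.0, 1.0]")
    delay = round(abs(direction) * max_delay)      # samples of delay
    gain = 1.0 - abs(direction) * max_atten        # far-side attenuation
    near = list(mono) + [0.0] * delay              # padded to equal length
    far = [0.0] * delay + [s * gain for s in mono]
    return (near, far) if direction <= 0 else (far, near)
```

With `direction=0.0` both channels are identical, so a centered microphone is reproduced without any directional cue.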
  • the venue terminal shown in FIG. 5 can also receive an image signal captured by an image capturing device (if present) for an area including the current position of the wireless microphone, and transmit the image signal.
  • The conference server (for example, an MCU) receives the adjusted multi-channel audio signal (and the image signal) sent by the venue terminal, performs processing such as mixing, and forwards the result to the other venue terminals, which can receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound-image matching effect.
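The mixing performed by the conference server can be illustrated with a per-channel mix. This is only a sketch under assumptions: the function name `mix`, the representation of a stream as a list of channels holding float samples in the range -1.0 to 1.0, and hard clipping are choices made here, not details given in the text.

```python
def mix(streams):
    """Mix several multi-channel streams channel by channel.

    streams: list of streams; each stream is a list of channels, and
    each channel is a list of float samples. Channels are summed
    separately, so each source keeps its left/right placement.
    """
    if not streams:
        return []
    n_channels = len(streams[0])
    if any(len(s) != n_channels for s in streams):
        raise ValueError("all streams must have the same channel count")
    mixed = []
    for ch in range(n_channels):
        length = max(len(s[ch]) for s in streams)
        out = [0.0] * length
        for s in streams:
            for i, x in enumerate(s[ch]):
                out[i] += x
        mixed.append([max(-1.0, min(1.0, v)) for v in out])  # hard clip
    return mixed
```

Summing each channel separately, rather than folding everything down to mono first, is what lets the directional adjustment made by the sending venue terminal survive the mix.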
  • In the embodiment shown in FIG. 6, a location identification information sending module, which sends location identification information (information that can be used to identify the current location of the movable audio pickup device), is added to the wireless microphone, and an orientation recognition module is added to the venue terminal, so that the current location of the wireless microphone can be identified.
  • the audio signal processing flow can be as shown in FIG. 6, and can include:
  • The wireless microphone sends the audio signal picked up by its audio pickup module to the venue terminal.
  • the location identification information sending module deployed in the wireless microphone sends location identification information to the venue terminal.
  • the venue terminal shown in FIG. 6 may include: a receiving module, an azimuth identifying module, an adjusting module, and a code sending module.
  • the receiving module in the conference terminal receives the audio signal sent by the wireless microphone, and sends the received audio signal to the adjustment module.
  • The orientation recognition module receives the location identification information sent by the wireless microphone, determines from it the current direction of the wireless microphone relative to the venue terminal, and sends this direction information to the adjustment module as the basis for its adjustment.
  • The orientation recognition module may identify the position in ways that include, but are not limited to, the following two.
  • Infrared image recognition: an infrared signal transmitting module (that is, the location identification information sending module) is added to the mobile microphone, and an infrared camera is provided at the venue terminal. The orientation recognition module analyzes the image captured by the infrared camera to determine the current direction of the mobile microphone relative to the venue terminal.
  • Infrared signal positioning: an infrared signal transmitting module (that is, the location identification information sending module) is added to the mobile microphone, and an infrared signal receiver is added at the venue terminal. The orientation recognition module uses mature infrared signal positioning technology to calculate the current direction of the mobile microphone relative to the venue terminal.
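For the infrared image recognition method, one way to turn the position of the infrared spot in the camera image into a direction is a pinhole-camera calculation. The sketch assumes that the infrared camera sits at the venue terminal with its optical axis pointing straight ahead, that the spot's pixel column has already been found, and that the camera's horizontal field of view is known; the function name and parameters are hypothetical.

```python
import math


def spot_angle_deg(spot_x, image_width, horizontal_fov_deg):
    """Horizontal angle of the infrared spot relative to the optical axis.

    Negative angles are to the left of the camera, positive to the right.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((spot_x - half_width) / focal_px))
```

The resulting angle can then be quantized (for example into left, centered, or right) before it is handed to the adjustment module.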
  • The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (the multi-channel signal includes at least two channels) and, according to the current direction of the wireless microphone relative to the venue terminal, adjusts the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the wireless microphone relative to the venue terminal.
  • The adjusted multi-channel audio signal is sent to the code sending module.
  • the code sending module encodes and transmits the multi-channel audio signal.
  • the venue terminal shown in FIG. 6 can also receive an image signal captured by an image capturing device (if present) for an area including the current position of the wireless microphone, and transmit the image signal.
  • The conference server (for example, an MCU) receives the adjusted multi-channel audio signal (and the image signal) sent by the venue terminal, performs processing such as mixing, and forwards the result to the other venue terminals, which can receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound-image matching effect.
  • In the embodiment shown in FIG. 7, the position of the mobile microphone is recognized by image recognition, and the result guides the audio signal processing of this embodiment without adding any hardware.
  • the audio signal processing flow can be as shown in FIG. 7, and can include:
  • The wireless microphone sends the audio signal picked up by its audio pickup module to the venue terminal.
  • the site terminal shown in FIG. 7 may include: a receiving module, an azimuth identifying module, an adjusting module, and a code sending module.
  • The receiving module of the venue terminal receives the audio signal sent by the wireless microphone and sends the received audio signal to the adjustment module.
  • The orientation recognition module uses image recognition technology to determine the current direction of the wireless microphone relative to the venue terminal, and sends this direction information to the adjustment module as the basis for its adjustment.
  • Image recognition technology identifies objects in an image. Face recognition, which is in common use, is one kind of image recognition technology; it is not described in detail here.
  • The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (the multi-channel signal includes at least two channels) and, according to the current direction of the wireless microphone relative to the venue terminal, adjusts the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the wireless microphone relative to the venue terminal.
  • The adjusted multi-channel audio signal is sent to the code sending module.
  • the code sending module encodes and transmits the multi-channel audio signal.
  • the venue terminal shown in FIG. 7 can also receive an image signal captured by an image capturing device (if present) for an area including the current position of the wireless microphone, and transmit the image signal.
  • The conference server (for example, an MCU) receives the adjusted audio signal (and the image signal) sent by the venue terminal, performs the corresponding processing, and forwards the result to the other venue terminals, which can receive and play the adjusted audio signal (and the corresponding image signal) to obtain a sound-image matching effect.
  • The venue terminal in this embodiment receives an audio signal picked up by a movable audio pickup device such as a wireless microphone and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates a multi-channel audio signal corresponding to the audio signal; and adjusts the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to that direction, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal.
  • Because the sound direction presented by the adjusted multi-channel audio signal matches the current direction of the movable audio pickup device relative to the venue terminal, other venue terminals can play the corresponding image signal and the adjusted multi-channel audio signal with a sound-image matching effect, which lays a foundation for deploying movable audio pickup devices in the video conference system.
  • In the foregoing embodiments, the audio signal picked up by the movable audio pickup device is adjusted mainly by the venue terminal that sends the audio signal, so that the sound direction presented when the adjusted audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal.
  • The delay, phase, and/or signal strength of the audio signal picked up by the movable audio pickup device can also be adjusted by the conference server (such as an MCU), by the venue terminal that receives the audio signal, or by another device.
  • the following describes a scenario in which an audio signal picked up by a removable audio pickup device is adjusted by a conference server (such as an MCU) or by a venue terminal that receives an audio signal.
  • Another embodiment of the video conference system of the present invention, as shown in FIG. 8, may include a third venue terminal 810, a conference server 820, and a fourth venue terminal 830.
  • The third venue terminal 810 is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the third venue terminal 810; receive an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the third venue terminal 810, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the third venue terminal 810; and send the image signal, the audio signal, and the direction indication information.
  • After acquiring the current direction of the movable audio pickup device relative to the third venue terminal 810, the third venue terminal 810 may generate, according to that direction, a direction identifier indicating the sound direction to be presented when the audio signal is played, and add the direction identifier to the header field, or to another position, of the packet that carries the audio signal before sending it. Alternatively, the third venue terminal 810 may generate, according to that direction, auxiliary sound-image information corresponding to the audio signal (the sound direction presented when the audio signal adjusted on the basis of the auxiliary sound-image information is played matches the current direction of the movable audio pickup device relative to the third venue terminal 810), and add the auxiliary sound-image information to the to-be-sent bitstream corresponding to the audio signal before sending it.
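Carrying a direction identifier in the header of the packet that carries the audio signal can be illustrated with a toy packet layout. The layout below (a 1-byte direction code followed by a 2-byte big-endian payload length) and the names `pack_audio`, `unpack_audio`, and `DIRECTION_CODES` are invented for this sketch; the text does not define a wire format.

```python
import struct

# Hypothetical direction identifiers; the text does not specify the values.
DIRECTION_CODES = {"left": 0, "center": 1, "right": 2}
DIRECTION_NAMES = {code: name for name, code in DIRECTION_CODES.items()}

# Header: 1-byte direction code, 2-byte payload length, network byte order.
_HEADER = struct.Struct("!BH")


def pack_audio(direction, payload):
    """Prefix the encoded audio payload with the direction identifier."""
    return _HEADER.pack(DIRECTION_CODES[direction], len(payload)) + payload


def unpack_audio(packet):
    """Recover the direction identifier and the audio payload."""
    code, length = _HEADER.unpack_from(packet)
    payload = packet[_HEADER.size:_HEADER.size + length]
    return DIRECTION_NAMES[code], payload
```

The receiver (the conference server or another venue terminal) reads the direction first and then decides how to adjust or play the payload, leaving the audio data itself untouched in transit.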
  • The conference server 820 is configured to: receive the image signal, the audio signal, and the direction indication information sent by the third venue terminal 810; generate a multi-channel audio signal corresponding to the audio signal (the multi-channel signal includes at least two channels); adjust the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the third venue terminal 810; and send the image signal and the adjusted multi-channel audio signal.
  • the fourth venue terminal 830 is configured to receive the image signal sent by the conference server 820 and the adjusted multi-channel audio signal; and play the image signal and the adjusted multi-channel audio signal.
  • The venue terminal in this embodiment receives the audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal, the conference server or another venue terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information. This lays a foundation for playing the audio signal and the corresponding image signal with a sound-image matching effect, and helps realize the "sound localization" function when a movable audio pickup device is deployed in the video conference system.
  • Another embodiment of the venue terminal audio signal processing method of the present invention may include: the conference server receives the image signal, the audio signal, and the direction indication information sent by the venue terminal, where the audio signal is picked up by the movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the venue terminal and indicates the sound direction to be presented when the audio signal is played; generates a multi-channel audio signal corresponding to the audio signal, where the multi-channel signal includes at least two channels; adjusts the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal; and sends the image signal and the adjusted multi-channel audio signal.
  • A conference server provided by an embodiment of the present invention may include a second receiving unit 821, a second adjusting unit 822, and a second sending unit 823.
  • The second receiving unit 821 is configured to receive the image signal, the audio signal, and the direction indication information sent by the venue terminal, where the audio signal is picked up by the movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the venue terminal and indicates the sound direction to be presented when the audio signal is played.
  • The second adjusting unit 822 is configured to generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel signal includes at least two channels, and to adjust the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal.
  • the second transmitting unit 823 is configured to send the image signal and the multi-channel audio signal adjusted by the second adjusting unit 822.
  • The conference server can also implement the above functions by deploying modules in other forms, which are not enumerated here.
  • a further embodiment of a video conferencing system of the present invention may include: a fifth site terminal 910 and a sixth site terminal 920.
  • The fifth venue terminal 910 is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the fifth venue terminal 910; receive an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the fifth venue terminal 910, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the fifth venue terminal 910; and send the image signal, the audio signal, and the direction indication information.
  • The fifth venue terminal 910 may generate, according to the current direction of the movable audio pickup device relative to the fifth venue terminal 910, a direction identifier indicating the sound direction to be presented when the audio signal is played, and add the direction identifier to the header field, or to another position, of the packet that carries the audio signal. Alternatively, the fifth venue terminal 910 may generate, according to that direction, auxiliary sound-image information corresponding to the audio signal (the sound direction presented when the audio signal adjusted on the basis of the auxiliary sound-image information is played matches the current direction of the movable audio pickup device relative to the fifth venue terminal 910), and add the auxiliary sound-image information to the to-be-sent bitstream corresponding to the audio signal before sending it.
  • the sixth site terminal 920 is configured to receive an image signal, an audio signal, and direction indication information corresponding to the audio signal from the fifth site terminal 910; play the image signal and play the audio signal according to the direction indication information.
  • For example, when the direction indication information indicates the left, the sixth venue terminal 920 can play the audio signal only on the left speaker. Alternatively, the sixth venue terminal 920 can play the audio signal through multiple channels but raise the volume of the left speaker and/or lower the volume of the other speakers, or adjust the phase and delay of the other speakers, so that the sound direction presented by the audio signal matches the current direction of the movable audio pickup device relative to the fifth venue terminal 910.
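The two playback options on the receiving venue terminal (play only on the indicated speaker, or play on all speakers with the indicated one louder) can be sketched as per-speaker gains. The speaker names, the function names, and the default gain values are assumptions made for illustration.

```python
def speaker_gains(direction, speakers=("left", "center", "right"),
                  boost=1.0, others=0.3):
    """Return a gain per speaker so that playback favors `direction`."""
    if direction not in speakers:
        raise ValueError(f"unknown direction: {direction!r}")
    return {name: (boost if name == direction else others)
            for name in speakers}


def play_only_on(direction, speakers=("left", "center", "right")):
    """Variant that mutes every speaker except the indicated one."""
    return speaker_gains(direction, speakers, boost=1.0, others=0.0)
```

Adjusting phase or delay on the other speakers, which the text also allows, would need per-speaker delay values in addition to these gains.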
  • The fifth venue terminal 910 in this embodiment receives the audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information generated and sent by the fifth venue terminal 910 matches the current direction of the movable audio pickup device relative to the venue terminal, the conference server or another venue terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information. This lays a foundation for playing the audio signal and the corresponding image signal with a sound-image matching effect, and helps realize the "sound localization" function when a movable audio pickup device is deployed in the video conference system.
  • Another embodiment of the venue terminal audio signal processing method of the present invention may include: the venue terminal receives an audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal; and sends the audio signal and the direction indication information.
  • The venue terminal in this embodiment receives the audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal, the conference server or another venue terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information. This lays a foundation for playing the audio signal and the corresponding image signal with a sound-image matching effect, and helps realize the "sound localization" function when a movable audio pickup device is deployed in the video conference system.
  • An embodiment of the present invention further provides a venue terminal 1000, including a receiving determining unit 1010, an adjusting unit 1020, and a sending unit 1030.
  • The receiving determining unit 1010 is configured to receive an audio signal picked up by the movable audio pickup device and acquire the current direction of the movable audio pickup device relative to the venue terminal 1000.
  • The adjusting unit 1020 is configured to generate a multi-channel audio signal corresponding to the audio signal.
  • The receiving determining unit 1010 may include a first position determining submodule and at least two receiving modules.
  • Each receiving module is configured to receive the audio signal picked up by the movable audio pickup device, and the first position determining submodule is configured to determine the current direction of the movable audio pickup device relative to the venue terminal 1000 from the differences between the audio signals received by the at least two receiving modules.
  • Alternatively, the receiving determining unit 1010 may include an information receiving module and a second position determining submodule. The information receiving module is configured to receive the audio signal picked up by the movable audio pickup device and the location identification information sent by the movable audio pickup device.
  • The second position determining submodule is configured to determine, from the location identification information, the current direction of the movable audio pickup device relative to the venue terminal 1000.
  • Alternatively, the receiving determining unit 1010 may include a receiving module and an image recognition module.
  • The receiving module is configured to receive the audio signal picked up by the movable audio pickup device, and the image recognition module is configured to determine, by image recognition, the current direction of the movable audio pickup device relative to the venue terminal 1000.
  • The venue terminal 1000 in this embodiment may be the venue terminal in the foregoing method embodiments, and the functions of its functional modules may be implemented according to the methods in those embodiments.
  • The venue terminal 1000 receives the audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal, and receives an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located.
  • another venue terminal 1100 may include: a receiving determining unit 1110, a generating unit 1120, and a sending unit 1130.
  • the receiving determining unit 1110 is configured to receive an audio signal picked up by the movable audio pick-up device, and acquire a current direction of the movable audio pick-up device relative to the venue terminal 1100.
  • The generating unit 1120 is configured to generate, according to the current direction of the movable audio pickup device relative to the venue terminal 1100, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the venue terminal 1100.
  • the transmitting unit 1130 is configured to send the direction indication information and the audio signal received by the receiving determining unit 1110.
  • the site terminal 1100 in this embodiment may be the site terminal in the foregoing method embodiment, and the function of each function module may be specifically implemented according to the method in the foregoing embodiment.
  • The venue terminal 1100 receives the audio signal picked up by the movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates, according to that direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information generated and sent by the venue terminal matches the current direction of the movable audio pickup device relative to the venue terminal, the conference server or another venue terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information. This lays a foundation for playing the audio signal and the corresponding image signal with a sound-image matching effect, which helps realize the "sound localization" function when a movable audio pickup device is deployed in the video conference system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

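Embodiments of the present invention disclose a venue terminal audio signal processing method, a venue terminal, and a video conference system. A venue terminal audio signal processing method includes: a venue terminal receives an audio signal picked up by a movable audio pickup device and acquires the current direction of the movable audio pickup device relative to the venue terminal; generates a multi-channel audio signal corresponding to the audio signal; adjusts, according to the current direction of the movable audio pickup device relative to the venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal, thereby obtaining the adjusted multi-channel audio signal; and sends the adjusted multi-channel audio signal. The solutions of the embodiments of the present invention help solve the sound-image matching problem in scenarios where a movable audio pickup device is deployed.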

Description

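Venue terminal audio signal processing method, venue terminal, and video conference system

This application claims priority to Chinese Patent Application No. 201110101877.6, filed with the Chinese Patent Office on April 22, 2011 and entitled "Venue terminal audio signal processing method, venue terminal, and video conference system", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications technologies, and in particular to a venue terminal audio signal processing method, a venue terminal, and a video conference system.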
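Background
A current video conference system generally includes venue terminals and a conference server (in FIG. 1 the conference server is, by way of example, a multipoint control unit (MCU)). In a video conference system, each venue has at least one venue terminal; each venue terminal captures the sound and images of its own venue, encodes them, and sends them to the MCU. The MCU processes the sound and images in a certain manner (for example, mixing the sound, and forwarding the images or composing them into a multi-picture view) and sends the processed sound and images to each of the other venue terminals in the video conference. Each venue terminal decodes and outputs the sound and images of the remote venues, achieving remote video communication.
With the continuous development of video technology, interactivity and ease of use have become a development direction for video conference systems, and achieving a face-to-face sense of interaction has become a goal. People are no longer satisfied with merely seeing clear images and hearing pleasant sound; higher-level requirements such as "life size", "eye to eye", and "sound localization" have become the direction in which video conference systems are developing. For example, in a 3-screen telepresence venue, people may also expect to tell who is speaking from the direction of the sound without having to look up at the speaker, that is, "sound localization", and thereby gain a stronger sense of presence.
A video conference system generally uses fixed microphones or the like as audio pickup devices: one or more microphones are fixed on the table or the ceiling to pick up the speaker's voice. When the conference room is large or the speaker's position is uncertain, a movable audio pickup device (for example, a wireless microphone) may also be used as a supplement to the fixed audio pickup devices.
Given the "sound localization" requirement, sound-image matching for movable audio pickup devices in a video conference system has become a key factor affecting the conference experience. However, the industry does not yet have an effective solution that solves the sound-image matching problem well, and thereby realizes the "sound localization" function, in scenarios where a movable audio pickup device is deployed.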
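Summary of the Invention
Embodiments of the present invention provide a venue terminal audio signal processing method, a venue terminal, and a video conference system, so as to achieve sound-image matching in scenarios where a movable audio pickup device is deployed.
To solve the above technical problem, the present invention provides the following technical solutions:
A video conference system, including:
a first venue terminal and a second venue terminal, where the first venue terminal and the second venue terminal are connected through a network, and a movable audio pickup device and an image capturing device are deployed in the venue where the first venue terminal is located;
where the first venue terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the first venue terminal; receive an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located; generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel signal has at least two channels; adjust, according to the current direction of the movable audio pickup device relative to the first venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the first venue terminal; and send the image signal and the adjusted multi-channel audio signal; and
the second venue terminal is configured to: receive the image signal and the adjusted multi-channel audio signal from the first venue terminal; and play the image signal and the adjusted multi-channel audio signal.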
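A venue terminal audio signal processing method, including:
receiving, by a venue terminal, an audio signal picked up by a movable audio pickup device, and acquiring the current direction of the movable audio pickup device relative to the venue terminal;
generating a multi-channel audio signal corresponding to the audio signal, where the multi-channel signal has at least two channels;
adjusting, according to the current direction of the movable audio pickup device relative to the venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal; and
sending the adjusted multi-channel audio signal.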
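A venue terminal, including:
a receiving determining unit, configured to receive an audio signal picked up by a movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the venue terminal;
an adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, and adjust, according to the current direction of the movable audio pickup device relative to the venue terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the venue terminal; and
a sending unit, configured to send the multi-channel audio signal adjusted by the adjusting unit.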
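A video conference system, including:
a third venue terminal, a fourth venue terminal, and a conference server, where the third venue terminal and the fourth venue terminal are connected to the conference server through a network, and a movable audio pickup device and an image capturing device are deployed in the venue where the third venue terminal is located;
the third venue terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the direction of the movable audio pickup device relative to the third venue terminal; receive an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the third venue terminal, direction indication information indicating the sound direction presented when the audio signal is played, where the sound direction to be presented when the audio signal is played, as indicated by the direction indication information, matches the current direction of the movable audio pickup device relative to the third venue terminal; and send the image signal, the audio signal, and the direction indication information;
the conference server is configured to: receive the image signal, the audio signal, and the direction indication information sent by the third venue terminal; generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel signal has at least two channels; adjust the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal according to the direction indication information, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the third venue terminal; and send the image signal and the adjusted multi-channel audio signal; and
the fourth venue terminal is configured to: receive the image signal and the adjusted multi-channel audio signal sent by the conference server; and play the image signal and the adjusted multi-channel audio signal.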
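A video conference system, including:
a fifth venue terminal and a sixth venue terminal, where the fifth venue terminal and the sixth venue terminal are connected through a network, and a movable audio pickup device and an image capturing device are deployed in the venue where the fifth venue terminal is located;
the fifth venue terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and acquire the current direction of the movable audio pickup device relative to the fifth venue terminal; receive an image signal captured by the image capturing device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the fifth venue terminal, direction indication information indicating the sound direction presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the fifth venue terminal; and send the image signal, the audio signal, and the direction indication information; and
the sixth venue terminal is configured to: receive, from the fifth venue terminal, the image signal, the audio signal, and the direction indication information corresponding to the audio signal; and play the image signal and play the audio signal according to the direction indication information.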
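A conference site terminal audio signal processing method, including:
receiving, by a conference site terminal, an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal;
generating, according to the current direction of the movable audio pickup device relative to the conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal; and
sending the audio signal and the direction indication information.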
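A conference site terminal, including:
a receiving and determining unit, configured to receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the conference site terminal;
a generating unit, configured to generate, according to the current direction of the movable audio pickup device relative to the conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal; and
a sending unit, configured to send the audio signal and the direction indication information.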
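A conference server, including:
a second receiving unit, configured to receive an image signal, an audio signal, and direction indication information sent by a conference site terminal, where the audio signal is picked up by a movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the conference site terminal and indicates the sound direction to be presented when the audio signal is played;
a second adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels, and adjust, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and
a second sending unit, configured to send the image signal and the multi-channel audio signal adjusted by the second adjusting unit.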
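As can be seen from the above, in one solution of the embodiments of the present invention, a conference site terminal receives an audio signal picked up by a movable audio pickup device and obtains the current direction of the movable audio pickup device relative to the conference site terminal; receives an image signal captured by an image capture device for the area where the movable audio pickup device is currently located; generates a multi-channel audio signal corresponding to the audio signal; adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and sends the image signal and the adjusted multi-channel audio signal. Because the adjustment is made at the sending terminal, other conference site terminals that receive the image signal and the adjusted multi-channel audio signal can play them with a sound-image matching effect, which helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.
In another solution of the embodiments of the present invention, a conference site terminal receives an audio signal picked up by a movable audio pickup device and obtains the direction of the movable audio pickup device relative to the conference site terminal; generates, according to that current direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal, a conference server or another conference site terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information, and thus play the audio signal and the corresponding image signal with a sound-image matching effect. This likewise helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.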
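Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention and in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments and the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic diagram of a video conference system in the prior art;
FIG. 2 is a schematic diagram of the sound image generation process in a video conference system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a video conference system according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a conference site terminal audio signal processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of modular audio signal processing according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another form of modular audio signal processing according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of still another form of modular audio signal processing according to an embodiment of the present invention;
FIG. 8-a is a schematic diagram of another video conference system according to an embodiment of the present invention;
FIG. 8-b is a schematic diagram of a conference server according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of still another video conference system according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a conference site terminal according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of another conference site terminal according to an embodiment of the present invention.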
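Detailed Description of the Embodiments
Embodiments of the present invention provide a conference site terminal audio signal processing method, a conference site terminal, and a video conference system, so as to implement sound-image matching in scenarios where a movable audio pickup device is deployed.
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some but not all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.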
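In the embodiments of the present invention, the direction of a sound refers to the direction of the sounding object in the sound field, that is, the direction of the sound source relative to the receiving end (which may be a person or a device such as a conference site terminal), for example, to the left or to the right. The human ear determines the direction of a sound from the time difference and the sound level difference between the sound signals picked up by the two ears; this is the so-called "binaural effect". "Locating by sound" refers to using the direction information of a sound to identify the position of the speaker.
For example, as shown in FIG. 2, two channels are used to describe how sound direction is produced in a video conference system. Assume that "microphone_left" and "microphone_right" have the same characteristics and the same orientation, and that "speaker_left" and "speaker_right" have the same characteristics, the same volume setting, and both face the "listening position". When a person speaks at "sounding position A", "microphone_left" is closer to the speaker than "microphone_right", so the sound it picks up is louder and has a smaller delay. After playback through "speaker_left" and "speaker_right" respectively, the left channel is louder and is played earlier, so the listener perceives the sound as coming from the left. The sound thus carries direction information.
Similarly, when a person speaks at "sounding position C", the listener perceives the sound as coming from the right.
When a person speaks at "sounding position B", "microphone_left" and "microphone_right" are at about the same distance from the speaker, so the loudness and delay of the sound they pick up are basically the same. After playback through "speaker_left" and "speaker_right" respectively, the two channels have basically the same loudness and delay, so the listener perceives the sound as coming from the middle.
The sound-image matching problem:
Sound-image matching, that is, matching between sound and image, means that the direction of the played sound matches the position at which the sound source is displayed in the image. In a video conference system, in addition to hearing sound, a participant can also see the image of the peer end that is in video communication with the local conference site. If the peer speaker is displayed on the left side of the image on the local display, the sound needs to be played from the left; if the speaker is on the right side of the image, the sound needs to be played from the right. Only then do the sound and the image match.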
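The two-microphone pickup described for FIG. 2 can be illustrated with a small Python sketch. It is only a rough model under stated assumptions, not part of the patent: the coordinates of the microphones and of sounding positions A, B, and C, the speed-of-sound constant, and the 1/distance amplitude law are all chosen here for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)

def pickup_differences(source, mic_left, mic_right):
    """Return (arrival time at left mic minus arrival time at right mic, in seconds,
    amplitude at left mic divided by amplitude at right mic)."""
    d_left = math.dist(source, mic_left)
    d_right = math.dist(source, mic_right)
    time_diff = (d_left - d_right) / SPEED_OF_SOUND
    # Simplified free-field model: amplitude falls off as 1/distance.
    level_ratio = d_right / d_left
    return time_diff, level_ratio

# Hypothetical room layout in metres: microphones on a line, speakers in front of them.
MIC_LEFT, MIC_RIGHT = (-1.0, 0.0), (1.0, 0.0)
POSITIONS = {"A": (-2.0, 2.0), "B": (0.0, 2.0), "C": (2.0, 2.0)}

if __name__ == "__main__":
    for name, pos in POSITIONS.items():
        dt, ratio = pickup_differences(pos, MIC_LEFT, MIC_RIGHT)
        print(f"position {name}: time difference {dt * 1000:.3f} ms, level ratio {ratio:.3f}")
```

A negative time difference together with a level ratio above 1 corresponds to position A (left microphone earlier and louder), zero difference and a ratio of 1 to position B, and the mirrored values to position C.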
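The movable audio pickup device in the embodiments of the present invention may be, for example, a wireless microphone, a long-cable microphone, or another mobile audio pickup device.
It can be understood that the position of the movable audio pickup device may keep changing as the speaker holding it moves.
The embodiments of the present invention aim to provide a solution to the sound-image matching problem in scenarios where a movable audio pickup device is deployed, so as to implement the "locating by sound" function in such scenarios.
The following first provides a description from the perspective of a video conference system.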
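Referring to FIG. 3, a video conference system provided in an embodiment of the present invention may include a first conference site terminal 310 and a second conference site terminal 320. The first conference site terminal 310 and the second conference site terminal 320 may be connected through a communication network, and a movable audio pickup device and an image capture device are deployed at the conference site where the first conference site terminal is located. The communication network, the movable audio pickup device, and the image capture device are not shown in FIG. 3.
The first conference site terminal 310 is configured to: receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the first conference site terminal 310; receive an image signal captured by the image capture device for the area where the movable audio pickup device is currently located; generate a multi-channel audio signal corresponding to the audio signal (with at least two channels); adjust, according to the current direction of the movable audio pickup device relative to the first conference site terminal 310, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the first conference site terminal 310; and send the image signal and the adjusted multi-channel audio signal.
In practical applications, the first conference site terminal 310 and the other conference site terminals may negotiate the number of channels for the conference during conference setup. As a result of the negotiation, the number of channels of the multi-channel audio signal generated by the first conference site terminal 310 equals the number of channels supported by the second conference site terminal 320.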
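In practical applications, the first conference site terminal 310 may obtain the current direction of the movable audio pickup device relative to the first conference site terminal 310 in a variety of ways.
It can be understood that the first conference site terminal 310 is used here as the absolute reference frame for expressing direction. The first conference site terminal 310 may of course also obtain the current direction of the movable audio pickup device relative to another reference object (for example, the conference screen, the image capture device, or another device); given the positional relationship between that reference object and the first conference site terminal 310, this is equivalent to obtaining the current direction of the movable audio pickup device relative to the first conference site terminal 310. The first conference site terminal 310 may also obtain the current position of the movable audio pickup device.
As implementation examples, the current direction of the movable audio pickup device relative to the first conference site terminal 310 may be obtained in the following ways:
(1) The first conference site terminal 310 receives the audio signal picked up by the movable audio pickup device, and determines, by using image recognition technology, the current direction of the movable audio pickup device relative to the first conference site terminal 310 (for example, to the left of, centered on, or to the right of the first conference site terminal 310);
(2) The first conference site terminal 310 receives the audio signal picked up by the movable audio pickup device through at least two receiving modules, and determines the current direction of the movable audio pickup device relative to the first conference site terminal 310 from the differences among the audio signals received by the at least two receiving modules (the differences may include at least one of the time difference, phase difference, and strength difference of the audio signals received by the receiving modules);
(3) The first conference site terminal 310 receives the audio signal picked up by the movable audio pickup device, receives position identification information sent by the movable audio pickup device (the position identification information is any information that can be used to identify the current position or direction of the movable audio pickup device), and determines, by using the position identification information, the current direction of the movable audio pickup device relative to the first conference site terminal 310.
Examples of how the first conference site terminal 310 receives the position identification information of the movable audio pickup device and uses it to determine the current direction of the movable audio pickup device relative to the first conference site terminal 310 are as follows:
1) receiving an infrared signal sent by the movable audio pickup device, and analyzing the sending direction of the infrared signal by using infrared signal image recognition technology, to obtain the current direction of the movable audio pickup device relative to the first conference site terminal 310; or
2) receiving an infrared signal sent by the movable audio pickup device, and calculating the sending direction of the infrared signal by using infrared signal positioning technology, to obtain the current direction of the movable audio pickup device relative to the first conference site terminal 310.
Certainly, the first conference site terminal 310 may also obtain the current direction of the movable audio pickup device relative to the first conference site terminal 310 in other ways. None of the embodiments of the present invention limits this, and the other embodiments may be implemented in a similar manner.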
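Way (2) above, which uses the time difference between the signals at two receiving modules, might be sketched as follows. This is a minimal illustration under assumptions of our own: the receivers are treated as a left/right pair, the function names `best_lag` and `direction_from_lag` and the `tolerance` parameter are hypothetical, and a brute-force cross-correlation stands in for whatever estimator a real terminal would use.

```python
def best_lag(sig_left, sig_right, max_lag):
    """Return the lag (in samples) of sig_right relative to sig_left that maximizes
    their cross-correlation. A positive lag means the right receiver got the
    signal later than the left receiver."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, left_sample in enumerate(sig_left):
            j = i + lag
            if 0 <= j < len(sig_right):
                score += left_sample * sig_right[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def direction_from_lag(lag, tolerance=1):
    """Map a lag to a coarse direction; lags within the tolerance count as center."""
    if lag > tolerance:
        return "left"   # the left receiver heard the signal first
    if lag < -tolerance:
        return "right"  # the right receiver heard the signal first
    return "center"

if __name__ == "__main__":
    pulse = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
    delayed = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # same pulse, three samples later
    print(direction_from_lag(best_lag(pulse, delayed, max_lag=4)))
```

Phase and strength differences, which the text also allows, could be combined with the lag in the same decision step.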
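The second conference site terminal 320 is configured to receive the image signal and the adjusted multi-channel audio signal from the first conference site terminal 310, and play the image signal and the adjusted multi-channel audio signal.
In practical applications, a conference server may receive the image signal and the adjusted audio signal sent by the first conference site terminal 310, perform processing such as audio mixing on them, and send them to the other conference site terminals; the second conference site terminal 320 may then receive, from the conference server, the image signal and the adjusted multi-channel audio signal that originate from the first conference site terminal 310.
As can be seen from the above, the conference site terminal in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the current direction of the movable audio pickup device relative to the conference site terminal; receives an image signal captured by an image capture device for the area where the movable audio pickup device is currently located; generates a multi-channel audio signal corresponding to the audio signal; adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and sends the image signal and the adjusted multi-channel audio signal. Because the adjustment is made at the sending terminal, other conference site terminals that receive the image signal and the adjusted multi-channel audio signal can play them with a sound-image matching effect, which helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.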
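The following provides a description from the perspective of the audio signal sending end of the video conference system.
An embodiment of a conference site terminal audio signal processing method of the present invention includes: receiving, by a conference site terminal, an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal; generating a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels; adjusting, according to the current direction of the movable audio pickup device relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and sending the adjusted multi-channel audio signal.
Referring to FIG. 4, the specific steps may include:
401. The conference site terminal receives an audio signal picked up by a movable audio pickup device, and obtains the current direction of the movable audio pickup device relative to the conference site terminal.
In this embodiment, the audio signal picked up by the movable audio pickup device is a single-channel signal.
In practical applications, the conference site terminal may obtain the current direction of the movable audio pickup device relative to the conference site terminal in a variety of ways. It can be understood that the conference site terminal is used here as the absolute reference frame for expressing direction. The conference site terminal may of course also obtain the current direction of the movable audio pickup device relative to another reference object (for example, the conference screen, the image capture device, or another device); given the positional relationship between that reference object and the conference site terminal, this is equivalent to obtaining the current direction of the movable audio pickup device relative to the conference site terminal. The conference site terminal may also obtain the current position of the movable audio pickup device.
It can be understood that the conference site terminal in this embodiment may obtain the current direction of the movable audio pickup device relative to itself in a manner similar to that used by the first conference site terminal 310 in the foregoing embodiment; details are not repeated here.
402. The conference site terminal generates a multi-channel audio signal corresponding to the received audio signal (with at least two channels), and adjusts, according to the current direction of the movable audio pickup device relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal.
403. The conference site terminal sends the adjusted multi-channel audio signal.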
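Step 402 can be pictured with a small Python sketch that turns a single-channel signal into two channels and then delays and attenuates the channel on the far side. The function name `mono_to_stereo`, the three-valued direction, and the default `delay_samples` and `far_gain` values are illustrative assumptions; the patent does not prescribe specific values or a specific panning law.

```python
def mono_to_stereo(mono, direction, delay_samples=8, far_gain=0.6):
    """Generate a two-channel signal from a mono signal and adjust one channel.

    direction is 'left', 'center' or 'right' (an assumed coarse encoding).
    Returns (left_channel, right_channel), each as long as the input.
    """
    if direction not in ("left", "center", "right"):
        raise ValueError("direction must be 'left', 'center' or 'right'")
    near = list(mono)
    if direction == "center":
        # Both channels identical: the sound is perceived in the middle.
        return near, list(mono)
    # The far channel is attenuated and delayed, then trimmed to the input length.
    far = ([0.0] * delay_samples + [s * far_gain for s in mono])[:len(mono)]
    return (near, far) if direction == "left" else (far, near)

if __name__ == "__main__":
    left, right = mono_to_stereo([1.0, 0.5, -0.5, 0.25], "left", delay_samples=1, far_gain=0.5)
    print(left, right)
```

With `direction="left"` the left channel is louder and earlier, which mirrors the FIG. 2 situation for a speaker at sounding position A.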
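In addition, the conference site terminal may also receive an image signal captured by an image capture device (if present) for an area that includes the current location of the movable audio pickup device, and send the image signal. Correspondingly, a conference server (for example, an MCU) may receive the adjusted multi-channel audio signal (and the image signal) sent by the conference site terminal, perform processing such as audio mixing on it, and forward it to the other conference site terminals, which can then receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound-image matching effect.
As can be seen from the above, the conference site terminal in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the current direction of the movable audio pickup device relative to the conference site terminal; generates a multi-channel audio signal corresponding to the audio signal; and adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal. Because the conference site terminal makes this adjustment, other conference site terminals that receive the adjusted multi-channel audio signal can play the corresponding image signal and the adjusted multi-channel audio signal with a sound-image matching effect, which helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.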
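To facilitate better understanding and implementation of the solutions in the embodiments of the present invention, the following gives a specific and detailed description using, as an example, a scenario in which the conference site terminal is divided into several modules that cooperate to process the audio signal. This embodiment uses an application scenario in which the movable audio pickup device deployed in the video conference system is a wireless microphone; application scenarios with other types of movable audio pickup devices are similar.
FIG. 5 to FIG. 7 show three example implementations. It can be understood that the conference site terminal may also use other module divisions to process the audio signal.
Referring to FIG. 5, the current position of the wireless microphone is identified by increasing the number of receiving modules in the conference site terminal that receive the audio signal picked up by the wireless microphone.
Depending on the required positioning accuracy for the current position of the wireless microphone, the number of receiving modules in the conference site terminal is two or more.
The audio signal processing flow may be as shown in FIG. 5, where solid arrows indicate the direction of data flow and dashed arrows indicate the direction of control flow; this is not explained again in subsequent embodiments.
501. The wireless microphone sends the audio signal picked up by its audio pickup module to the conference site terminal.
The conference site terminal in FIG. 5 may include a direction identification module, an adjustment module, an encoding and sending module, and multiple receiving modules.
502. The multiple receiving modules deployed in the conference site terminal each receive the audio signal sent by the wireless microphone, and each send the received audio signal to the direction identification module for position analysis.
503. The direction identification module uses information such as the time difference, phase difference, and/or strength difference between the signals of the multiple receiving modules to calculate the current direction of the wireless microphone relative to the conference site terminal, for example, to the left of, centered on, or to the right of the conference site terminal.
The direction identification module sends the information about the determined current direction of the wireless microphone relative to the conference site terminal (which can be regarded as the sound source direction) to the adjustment module.
The direction identification module may also select, according to parameters such as signal-to-noise ratio, volume, and continuity, one of the N received audio signals (for example, the one with the best audio quality) and send it to the adjustment module.
504. The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (with at least two channels); adjusts, according to the current direction of the wireless microphone relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the wireless microphone relative to the conference site terminal; and sends the adjusted multi-channel audio signal to the encoding and sending module.
505. The encoding and sending module encodes and sends the multi-channel audio signal.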
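The selection in step 503 of one signal out of the N received signals could look like the following Python sketch. It ranks the signals by an estimated signal-to-noise ratio only; the per-receiver noise estimates (`noise_floors`) and the function names are hypothetical inputs introduced for illustration, and a real implementation might also weigh volume and continuity as the text mentions.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(samples, noise_rms):
    """Estimated signal-to-noise ratio in dB, given an assumed noise RMS level."""
    signal_rms = max(rms(samples), 1e-12)  # avoid log10(0) for silent blocks
    return 20.0 * math.log10(signal_rms / noise_rms)

def select_best(signals, noise_floors):
    """Return the index of the received signal with the highest estimated SNR."""
    scores = [snr_db(s, n) for s, n in zip(signals, noise_floors)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    received = [
        [0.1, -0.1, 0.1, -0.1],   # weak signal, quiet receiver
        [0.5, -0.5, 0.5, -0.5],   # strong signal, quiet receiver
        [0.5, -0.5, 0.5, -0.5],   # strong signal, noisy receiver
    ]
    print(select_best(received, noise_floors=[0.01, 0.01, 0.1]))
```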
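In addition, the conference site terminal shown in FIG. 5 may also receive an image signal captured by an image capture device (if present) for an area that includes the current position of the wireless microphone, and send the image signal. Correspondingly, a conference server (for example, an MCU) receives the adjusted multi-channel audio signal (and the image signal) sent by the conference site terminal, performs processing such as audio mixing on it, and forwards it to the other conference site terminals, which can then receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound-image matching effect.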
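Referring to FIG. 6, the current position of the wireless microphone is identified by adding, to the wireless microphone, a position identification information sending module for sending position identification information (information that can be used to identify the current position of the movable audio pickup device), and by adding a direction identification module to the conference site terminal.
The audio signal processing flow may be as shown in FIG. 6 and may include:
601. The wireless microphone sends the audio signal picked up by its pickup module to the conference site terminal.
602. The position identification information sending module deployed in the wireless microphone sends position identification information to the conference site terminal.
The conference site terminal shown in FIG. 6 may include a receiving module, a direction identification module, an adjustment module, and an encoding and sending module.
603. The receiving module in the conference site terminal receives the audio signal sent by the wireless microphone and sends the received audio signal to the adjustment module.
604. The direction identification module receives the position identification information sent by the wireless microphone, determines from it the current direction of the wireless microphone relative to the conference site terminal, and sends the information about that direction to the adjustment module as the basis for the adjustment.
In this step, the ways in which the direction identification module identifies the position include but are not limited to the following two:
Infrared image recognition: an infrared signal transmitting module (that is, the position identification information sending module) is added to the movable microphone, and the conference site terminal is equipped with an infrared camera. The direction identification module uses image recognition technology to analyze the images captured by the infrared camera and determine the direction of the movable microphone relative to the conference site terminal.
Infrared signal positioning: an infrared signal transmitting module (that is, the position identification information sending module) is added to the movable microphone, and an infrared signal receiver is added to the conference site terminal. The direction identification module uses established infrared signal positioning technology to calculate the current direction of the movable microphone relative to the conference site terminal.
605. The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (with at least two channels); adjusts, according to the current direction of the wireless microphone relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the wireless microphone relative to the conference site terminal; and sends the adjusted multi-channel audio signal to the encoding and sending module.
606. The encoding and sending module encodes and sends the multi-channel audio signal.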
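The infrared image recognition approach in step 604 might be approximated as follows: find the bright spot produced by the infrared emitter in a camera frame and map its horizontal position to a coarse direction. This Python sketch makes several assumptions that are not in the patent: the frame is a plain list of rows of 0 to 255 intensities, the camera faces the room without mirroring, and `threshold` and `center_band` are illustrative tuning values.

```python
def infrared_spot_direction(frame, threshold=200, center_band=0.2):
    """Estimate 'left', 'center' or 'right' from the intensity-weighted centroid
    of all pixels at or above the threshold. Returns None if no emitter is visible."""
    total = 0
    weighted_x = 0
    for row in frame:
        for x, value in enumerate(row):
            if value >= threshold:
                total += value
                weighted_x += x * value
    if total == 0:
        return None
    width = len(frame[0])
    if width < 2:
        return "center"
    half = (width - 1) / 2
    # Normalized horizontal offset in [-1, 1]; 0 is the image center.
    offset = (weighted_x / total - half) / half
    if offset < -center_band:
        return "left"
    if offset > center_band:
        return "right"
    return "center"

if __name__ == "__main__":
    frame = [
        [0, 0, 0, 0, 0],
        [255, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(infrared_spot_direction(frame))
```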
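In addition, the conference site terminal shown in FIG. 6 may also receive an image signal captured by an image capture device (if present) for an area that includes the current position of the wireless microphone, and send the image signal. Correspondingly, a conference server (for example, an MCU) receives the adjusted multi-channel audio signal (and the image signal) sent by the conference site terminal, performs processing such as audio mixing on it, and forwards it to the other conference site terminals, which can then receive and play the adjusted multi-channel audio signal (and the corresponding image signal) to obtain a sound-image matching effect.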
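Referring to FIG. 7, the position of the movable microphone is identified by an image recognition method, so that in this embodiment the audio signal can be processed without adding any hardware device.
The audio signal processing flow may be as shown in FIG. 7 and may include:
701. The wireless microphone sends the audio signal picked up by its pickup module to the conference site terminal.
The conference site terminal shown in FIG. 7 may include a receiving module, a direction identification module, an adjustment module, and an encoding and sending module.
702. The receiving module of the conference site terminal receives the audio signal sent by the wireless microphone and sends the received audio signal to the adjustment module.
703. The direction identification module uses image recognition technology to determine the current direction of the wireless microphone relative to the conference site terminal, and sends the information about that direction to the adjustment module as the basis for the adjustment.
Image recognition technology is a technology for recognizing targets in an image; the common face recognition, for example, is one kind of image recognition technology. It is not described in detail here.
704. The adjustment module generates a multi-channel audio signal corresponding to the received audio signal (with at least two channels); adjusts, according to the current direction of the wireless microphone relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the wireless microphone relative to the conference site terminal; and sends the adjusted multi-channel audio signal to the encoding and sending module.
705. The encoding and sending module encodes and sends the multi-channel audio signal.
In addition, the conference site terminal shown in FIG. 7 may also receive an image signal captured by an image capture device (if present) for an area that includes the current position of the wireless microphone, and send the image signal. Correspondingly, a conference server (for example, an MCU) receives the adjusted audio signal (and the image signal) sent by the conference site terminal, performs corresponding processing on it, and forwards it to the other conference site terminals, which can then receive and play the adjusted audio signal (and the corresponding image signal) to obtain a sound-image matching effect.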
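As can be seen from the above, the conference site terminal in this embodiment receives an audio signal picked up by a movable audio pickup device such as a wireless microphone and obtains the current direction of the movable audio pickup device relative to the conference site terminal; generates a multi-channel audio signal corresponding to the audio signal; and adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal. Because the conference site terminal makes this adjustment, other conference site terminals that receive the adjusted multi-channel audio signal can play the corresponding image signal and the adjusted multi-channel audio signal with a sound-image matching effect, which helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.
It should be noted that the foregoing embodiments mainly use, as an example, the case in which the conference site terminal that sends the audio signal adjusts the audio signal picked up by the movable audio pickup device, so that the sound direction presented when the adjusted audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal. Alternatively, a conference server (for example, an MCU), the conference site terminal that receives the audio signal, or another device may adjust the delay, phase, and/or signal strength of the audio signal picked up by the movable audio pickup device.
The following describes scenarios in which a conference server (for example, an MCU) or the conference site terminal that receives the audio signal adjusts the audio signal picked up by the movable audio pickup device.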
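The following provides a description from the perspective of a video conference system.
Referring to FIG. 8-a, another embodiment of a video conference system of the present invention may include a third conference site terminal 810, a conference server 820, and a fourth conference site terminal 830.
The third conference site terminal 810 is configured to: receive an audio signal picked up by a movable audio pickup device, and obtain the direction of the movable audio pickup device relative to the third conference site terminal 810; receive an image signal captured by an image capture device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the third conference site terminal 810, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the third conference site terminal 810; and send the image signal, the audio signal, and the direction indication information.
The third conference site terminal 810 may obtain the current direction of the movable audio pickup device relative to itself in a manner similar to that described for the first conference site terminal 310 in the foregoing embodiment. For example, the third conference site terminal 810 may generate, according to the current direction of the movable audio pickup device relative to the third conference site terminal 810, a direction identifier indicating the sound direction to be presented when the audio signal is played, add the direction identifier to the header field of the packet carrying the audio signal or to another location, and send it. Alternatively, the third conference site terminal 810 may generate, according to that direction, auxiliary sound-image information corresponding to the audio signal (when the audio signal is adjusted based on the auxiliary sound-image information, the sound direction presented during playback matches the current direction of the movable audio pickup device relative to the third conference site terminal 810), add the auxiliary sound-image information to the to-be-sent bitstream corresponding to the audio signal, and send it.
The conference server 820 is configured to: receive the image signal, the audio signal, and the direction indication information sent by the third conference site terminal 810; generate a multi-channel audio signal corresponding to the audio signal (with at least two channels); adjust, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the third conference site terminal 810; and send the image signal and the adjusted multi-channel audio signal.
The fourth conference site terminal 830 is configured to receive the image signal and the adjusted multi-channel audio signal sent by the conference server 820, and play the image signal and the adjusted multi-channel audio signal.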
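The option of carrying a direction identifier in the header of the packet that carries the audio signal can be sketched in Python with the standard `struct` module. The three-byte header layout below (one byte of direction code, two bytes of payload length, network byte order) and the code values are purely hypothetical; the patent does not define a packet format.

```python
import struct

# Hypothetical direction codes and header layout, chosen only for illustration.
DIRECTION_CODES = {"left": 0, "center": 1, "right": 2}
CODE_TO_DIRECTION = {code: name for name, code in DIRECTION_CODES.items()}
HEADER = struct.Struct("!BH")  # direction code (1 byte), payload length (2 bytes)

def pack_audio(direction, payload):
    """Sending terminal: prepend the direction identifier to the audio payload."""
    return HEADER.pack(DIRECTION_CODES[direction], len(payload)) + payload

def unpack_audio(packet):
    """Conference server or receiving terminal: recover the direction and payload."""
    code, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return CODE_TO_DIRECTION[code], payload

if __name__ == "__main__":
    packet = pack_audio("left", b"\x01\x02")
    print(packet.hex(), unpack_audio(packet))
```

The receiver of such a packet would then apply the delay, phase, and/or strength adjustment itself, as the conference server 820 does in this embodiment.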
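As can be seen from the above, the conference site terminal in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the direction of the movable audio pickup device relative to the conference site terminal; generates, according to that current direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal, a conference server or another conference site terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information, and thus play the audio signal and the corresponding image signal with a sound-image matching effect. This helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.
The following provides a description from the perspective of the conference server of the video conference system.
Another embodiment of a conference site terminal audio signal processing method of the present invention may include: receiving, by a conference server, an image signal, an audio signal, and direction indication information sent by a conference site terminal, where the audio signal is picked up by a movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the conference site terminal and indicates the sound direction to be presented when the audio signal is played; generating a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels; adjusting, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and sending the image signal and the adjusted multi-channel audio signal.
Referring to FIG. 8-b, a conference server provided in an embodiment of the present invention may include a second receiving unit 821, a second adjusting unit 822, and a second sending unit 823.
The second receiving unit 821 is configured to receive an image signal, an audio signal, and direction indication information sent by a conference site terminal, where the audio signal is picked up by a movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the conference site terminal and indicates the sound direction to be presented when the audio signal is played.
The second adjusting unit 822 is configured to generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels, and adjust, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal.
The second sending unit 823 is configured to send the image signal and the multi-channel audio signal adjusted by the second adjusting unit 822.
It can be understood that the conference server may also implement the foregoing functions by deploying several other modules; no further examples are given here.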
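Referring to FIG. 9, still another embodiment of a video conference system of the present invention may include a fifth conference site terminal 910 and a sixth conference site terminal 920.
The fifth conference site terminal 910 is configured to: receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the fifth conference site terminal; receive an image signal captured by an image capture device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the fifth conference site terminal 910, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the fifth conference site terminal 910; and send the image signal, the audio signal, and the direction indication information.
For example, the fifth conference site terminal 910 may generate, according to the current direction of the movable audio pickup device relative to the fifth conference site terminal 910, a direction identifier indicating the sound direction to be presented when the audio signal is played, add the direction identifier to the header field of the packet carrying the audio signal or to another location, and send it. Alternatively, the fifth conference site terminal 910 may generate, according to that direction, auxiliary sound-image information corresponding to the audio signal (when the audio signal is adjusted based on the auxiliary sound-image information, the sound direction presented during playback matches the current direction of the movable audio pickup device relative to the fifth conference site terminal 910), add the auxiliary sound-image information to the to-be-sent bitstream corresponding to the audio signal, and send it.
The sixth conference site terminal 920 is configured to receive, from the fifth conference site terminal 910, the image signal, the audio signal, and the direction indication information corresponding to the audio signal, play the image signal, and play the audio signal according to the direction indication information.
In practical applications, if the direction indication information indicates that the sound direction to be presented when the audio signal is played is the left, the sixth conference site terminal 920 may play the audio signal only on the left loudspeaker. Alternatively, the sixth conference site terminal 920 may play the audio signal through multiple channels but increase the volume of the left loudspeaker and/or lower the volume of the other loudspeakers, or adjust the phase and delay of the other loudspeakers, so that the sound direction presented when the audio signal is played matches the current direction of the movable audio pickup device relative to the fifth conference site terminal 910.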
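The playback choices just described for the sixth conference site terminal 920 (play only on the left loudspeaker, or boost one loudspeaker and lower the others) can be sketched as a per-loudspeaker gain table. The function name `speaker_gains` and the default `boost` and `cut` values are assumptions for illustration; phase and delay adjustment, which the text also permits, is left out.

```python
def speaker_gains(direction, num_speakers=2, boost=1.0, cut=0.3):
    """Return one gain per loudspeaker, indexed from left to right.

    direction is 'left', 'center' or 'right' (an assumed coarse encoding of the
    direction indication information). Setting cut=0.0 plays the signal on a
    single loudspeaker only.
    """
    if num_speakers < 2:
        raise ValueError("at least two loudspeakers are required")
    if direction == "center":
        return [boost] * num_speakers
    gains = [cut] * num_speakers
    if direction == "left":
        gains[0] = boost
    elif direction == "right":
        gains[-1] = boost
    else:
        raise ValueError("direction must be 'left', 'center' or 'right'")
    return gains

if __name__ == "__main__":
    print(speaker_gains("left", cut=0.0))      # left loudspeaker only
    print(speaker_gains("right", num_speakers=3))
```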
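As can be seen from the above, the fifth conference site terminal 910 in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the direction of the movable audio pickup device relative to the conference site terminal; generates, according to that current direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information generated and sent by the fifth conference site terminal 910 matches the current direction of the movable audio pickup device relative to the conference site terminal, a conference server or another conference site terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information, and thus play the audio signal and the corresponding image signal with a sound-image matching effect. This helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.
The following provides a description from the perspective of the conference site terminal that sends the audio signal in the video conference system.
Another embodiment of a conference site terminal audio signal processing method of the present invention may include: receiving, by a conference site terminal, an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal; generating, according to the current direction of the movable audio pickup device relative to the conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played (the direction indication information is, for example, a direction identifier or auxiliary sound-image information), where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal; and sending the audio signal and the direction indication information.
As can be seen from the above, the conference site terminal in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the direction of the movable audio pickup device relative to the conference site terminal; generates, according to that current direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal, a conference server or another conference site terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information, and thus play the audio signal and the corresponding image signal with a sound-image matching effect. This helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.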
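Referring to FIG. 10, an embodiment of the present invention further provides a conference site terminal 1000, including a receiving and determining unit 1010, an adjusting unit 1020, and a sending unit 1030.
The receiving and determining unit 1010 is configured to receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the conference site terminal 1000.
The adjusting unit 1020 is configured to generate a multi-channel audio signal corresponding to the audio signal, and adjust, according to the current direction of the movable audio pickup device relative to the conference site terminal 1000, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal 1000, to obtain an adjusted multi-channel audio signal.
The sending unit 1030 is configured to send the adjusted multi-channel audio signal obtained by the adjusting unit.
In one application scenario, the receiving and determining unit 1010 may include a first position determining submodule and at least two receiving modules,
where each receiving module is configured to receive the audio signal picked up by the movable audio pickup device, and
the first position determining submodule is configured to determine the current direction of the movable audio pickup device relative to the conference site terminal 1000 from the differences among the audio signals received by the at least two receiving modules;
or,
the receiving and determining unit 1010 may include an information receiving module and a second position determining submodule,
where the information receiving module is configured to receive the audio signal picked up by the movable audio pickup device and the position identification information sent by the movable audio pickup device, and
the second position determining submodule is configured to determine, by using the position identification information, the current direction of the movable audio pickup device relative to the conference site terminal 1000;
or,
the receiving and determining unit 1010 may include a receiving module and an image recognition module,
where the receiving module is configured to receive the audio signal picked up by the movable audio pickup device, and
the image recognition module is configured to determine, by using image recognition technology, the current direction of the movable audio pickup device relative to the conference site terminal 1000.
It can be understood that the conference site terminal 1000 in this embodiment may be the conference site terminal in the foregoing method embodiments, and the functions of its functional modules may be implemented according to the methods in the foregoing embodiments; for the specific implementation, refer to the related descriptions in the foregoing embodiments.
As can be seen from the above, the conference site terminal 1000 in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the current direction of the movable audio pickup device relative to the conference site terminal; receives an image signal captured by an image capture device for the area where the movable audio pickup device is currently located; generates a multi-channel audio signal corresponding to the audio signal; adjusts, according to that direction, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and sends the image signal and the adjusted multi-channel audio signal. Because the adjustment is made at the sending terminal, other conference site terminals that receive the image signal and the adjusted multi-channel audio signal can play them with a sound-image matching effect, which helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.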
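Referring to FIG. 11, another conference site terminal 1100 provided in an embodiment of the present invention may include a receiving and determining unit 1110, a generating unit 1120, and a sending unit 1130.
The receiving and determining unit 1110 is configured to receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the conference site terminal 1100.
The generating unit 1120 is configured to generate, according to the current direction of the movable audio pickup device relative to the conference site terminal 1100, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal 1100.
The sending unit 1130 is configured to send the direction indication information and the audio signal received by the receiving and determining unit 1110.
It can be understood that the conference site terminal 1100 in this embodiment may be the conference site terminal in the foregoing method embodiments, and the functions of its functional modules may be implemented according to the methods in the foregoing embodiments; for the specific implementation, refer to the related descriptions in the foregoing embodiments.
As can be seen from the above, the conference site terminal 1100 in this embodiment receives an audio signal picked up by a movable audio pickup device and obtains the direction of the movable audio pickup device relative to the conference site terminal; generates, according to that current direction, direction indication information indicating the sound direction to be presented when the audio signal is played; and sends the audio signal and the direction indication information. Because the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal, a conference server or another conference site terminal that receives the audio signal and the direction indication information can adjust or play the audio signal according to the direction indication information, and thus play the audio signal and the corresponding image signal with a sound-image matching effect. This helps implement the "locating by sound" function of a video conference system in scenarios where a movable audio pickup device is deployed.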
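It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations. However, a person skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions in other embodiments.
A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The conference site terminal audio signal processing method, the conference site terminal, and the video conference system provided in the embodiments of the present invention have been described above, with specific examples used to explain the principles and implementations of the present invention. The description of the foregoing embodiments is only intended to help understand the method and core idea of the present invention. In addition, a person of ordinary skill in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In conclusion, the content of this specification shall not be construed as a limitation on the present invention.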

Claims

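1. A video conference system, characterized by including:
a first conference site terminal and a second conference site terminal, where the first conference site terminal and the second conference site terminal are connected through a network, and a movable audio pickup device and an image capture device are deployed at the conference site where the first conference site terminal is located;
where the first conference site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the first conference site terminal; receive an image signal captured by the image capture device for the area where the movable audio pickup device is currently located; generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels; adjust, according to the current direction of the movable audio pickup device relative to the first conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the first conference site terminal; and send the image signal and the adjusted multi-channel audio signal; and
the second conference site terminal is configured to receive the image signal and the adjusted multi-channel audio signal from the first conference site terminal, and play the image signal and the adjusted multi-channel audio signal.
2. The video conference system according to claim 1, characterized in that:
the number of channels of the multi-channel audio signal generated by the first conference site terminal equals the number of channels supported by the second conference site terminal.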
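3. A conference site terminal audio signal processing method, characterized by including:
receiving, by a conference site terminal, an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal;
generating a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels;
adjusting, according to the current direction of the movable audio pickup device relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and
sending the adjusted multi-channel audio signal.
4. The method according to claim 3, characterized in that
the receiving an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal includes:
receiving the audio signal picked up by the movable audio pickup device, and determining, by using image recognition technology, the current direction of the movable audio pickup device relative to the conference site terminal;
or,
receiving the audio signal picked up by the movable audio pickup device through at least two audio receiving modules, and determining the current direction of the movable audio pickup device relative to the conference site terminal from the differences among the audio signals received by the audio receiving modules;
or,
receiving the audio signal picked up by the movable audio pickup device, receiving position identification information sent by the movable audio pickup device, and determining, by using the position identification information, the current direction of the movable audio pickup device relative to the conference site terminal.
5. The method according to claim 4, characterized in that
the differences among the audio signals received by the audio receiving modules include at least one of the time difference, phase difference, and strength difference of the audio signals received by the audio receiving modules.
6. The method according to claim 4, characterized in that the receiving position identification information sent by the movable audio pickup device, and determining, by using the position identification information, the current direction of the movable audio pickup device relative to the conference site terminal includes:
receiving an infrared signal sent by the movable audio pickup device; and
analyzing the sending direction of the infrared signal by using infrared signal image recognition technology, to obtain the current direction of the movable audio pickup device relative to the conference site terminal; or calculating the sending direction of the infrared signal by using infrared signal positioning technology, to obtain the current direction of the movable audio pickup device relative to the conference site terminal.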
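7. A conference site terminal, characterized by including:
a receiving and determining unit, configured to receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the conference site terminal;
an adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, and adjust, according to the current direction of the movable audio pickup device relative to the conference site terminal, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and
a sending unit, configured to send the multi-channel audio signal adjusted by the adjusting unit.
8. The conference site terminal according to claim 7, characterized in that
the receiving and determining unit includes a first position determining submodule and at least two receiving modules;
the at least two receiving modules are each configured to receive the audio signal picked up by the movable audio pickup device at the conference site where the conference site terminal is located; and
the first position determining submodule is configured to determine the current direction of the movable audio pickup device relative to the conference site terminal from the differences among the audio signals received by the at least two receiving modules;
or,
the receiving and determining unit includes an information receiving module and a second position determining submodule;
where the information receiving module is configured to receive the audio signal picked up by the movable audio pickup device at the conference site where the conference site terminal is located and the position identification information sent by the movable audio pickup device; and
the second position determining submodule is configured to determine, by using the position identification information, the current direction of the movable audio pickup device relative to the conference site terminal;
or,
the receiving and determining unit includes a receiving module and an image recognition module;
where the receiving module is configured to receive the audio signal picked up by the movable audio pickup device at the conference site where the conference site terminal is located; and
the image recognition module is configured to determine, by using image recognition technology, the current direction of the movable audio pickup device relative to the conference site terminal.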
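9. A video conference system, characterized by including:
a third conference site terminal, a fourth conference site terminal, and a conference server, where the third conference site terminal and the fourth conference site terminal are connected to the conference server through a network, and a movable audio pickup device and an image capture device are deployed at the conference site where the third conference site terminal is located;
the third conference site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and obtain the direction of the movable audio pickup device relative to the third conference site terminal; receive an image signal captured by the image capture device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the third conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the third conference site terminal; and send the image signal, the audio signal, and the direction indication information;
the conference server is configured to: receive the image signal, the audio signal, and the direction indication information sent by the third conference site terminal; generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels; adjust, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the third conference site terminal; and send the image signal and the adjusted multi-channel audio signal; and
the fourth conference site terminal is configured to receive the image signal and the adjusted multi-channel audio signal sent by the conference server, and play the image signal and the adjusted multi-channel audio signal.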
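10. A video conference system, characterized by including:
a fifth conference site terminal and a sixth conference site terminal, where the fifth conference site terminal and the sixth conference site terminal are connected through a network, and a movable audio pickup device and an image capture device are deployed at the conference site where the fifth conference site terminal is located;
the fifth conference site terminal is configured to: receive an audio signal picked up by the movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the fifth conference site terminal; receive an image signal captured by the image capture device for the area where the movable audio pickup device is currently located; generate, according to the current direction of the movable audio pickup device relative to the fifth conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the fifth conference site terminal; and send the image signal, the audio signal, and the direction indication information; and
the sixth conference site terminal is configured to receive, from the fifth conference site terminal, the image signal, the audio signal, and the direction indication information corresponding to the audio signal, play the image signal, and play the audio signal according to the direction indication information.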
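11. A conference site terminal audio signal processing method, characterized by including:
receiving, by a conference site terminal, an audio signal picked up by a movable audio pickup device, and obtaining the current direction of the movable audio pickup device relative to the conference site terminal;
generating, according to the current direction of the movable audio pickup device relative to the conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal; and
sending the audio signal and the direction indication information.
12. A conference site terminal, characterized by including:
a receiving and determining unit, configured to receive an audio signal picked up by a movable audio pickup device, and obtain the current direction of the movable audio pickup device relative to the conference site terminal;
a generating unit, configured to generate, according to the current direction of the movable audio pickup device relative to the conference site terminal, direction indication information indicating the sound direction to be presented when the audio signal is played, where the sound direction indicated by the direction indication information matches the current direction of the movable audio pickup device relative to the conference site terminal; and
a sending unit, configured to send the audio signal and the direction indication information.
13. A conference server, characterized by including:
a second receiving unit, configured to receive an image signal, an audio signal, and direction indication information sent by a conference site terminal, where the audio signal is picked up by a movable audio pickup device, and the direction indication information is generated according to the current direction of the movable audio pickup device relative to the conference site terminal and indicates the sound direction to be presented when the audio signal is played;
a second adjusting unit, configured to generate a multi-channel audio signal corresponding to the audio signal, where the multi-channel audio signal has at least two channels, and adjust, according to the direction indication information, the delay, phase, and/or signal strength of at least one channel of the multi-channel audio signal, so that the sound direction presented when the adjusted multi-channel audio signal is played matches the current direction of the movable audio pickup device relative to the conference site terminal; and
a second sending unit, configured to send the image signal and the multi-channel audio signal adjusted by the second adjusting unit.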
PCT/CN2012/074534 2011-04-22 2012-04-23 Conference site terminal audio signal processing method, conference site terminal, and video conference system WO2012142975A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 201110101877 CN102186049B (zh) 2011-04-22 2011-04-22 Conference site terminal audio signal processing method, conference site terminal, and video conference system
CN201110101877.6 2011-04-22

Publications (1)

Publication Number Publication Date
WO2012142975A1 true WO2012142975A1 (zh) 2012-10-26

Family

ID=44572110

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/074534 WO2012142975A1 (zh) 2011-04-22 2012-04-23 Conference site terminal audio signal processing method, conference site terminal, and video conference system

Country Status (2)

Country Link
CN (1) CN102186049B (zh)
WO (1) WO2012142975A1 (zh)

Also Published As

Publication number Publication date
CN102186049B (zh) 2013-03-20
CN102186049A (zh) 2011-09-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12774556

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12774556

Country of ref document: EP

Kind code of ref document: A1