WO2011120407A1 - Realization method and apparatus for video communication - Google Patents

Realization method and apparatus for video communication

Info

Publication number
WO2011120407A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
remote user
user
remote
playback
Prior art date
Application number
PCT/CN2011/072198
Other languages
French (fr)
Chinese (zh)
Inventor
岳中辉
Original Assignee
华为终端有限公司 (Huawei Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司 (Huawei Device Co., Ltd.)
Publication of WO2011120407A1 publication Critical patent/WO2011120407A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • The present invention relates to the field of communications, and in particular, to a method and an apparatus for implementing video communication.
  • The video conferencing service uses multimedia communication technology to hold conferences over audio and video input and output devices and communication networks, and can simultaneously realize image, voice, and data interaction between two or more sites.
  • The method for implementing video communication provided by the prior art is: receive the image and sound data sent by the video conference terminal of another site that communicates with the video conference terminal of the local site, and process the sound data with a two-channel stereo coding and decoding scheme. The local site obtains the left-channel sound data sent by the other site and plays it from the speaker on the left side of the local site, and obtains the right-channel sound data sent by the other site and plays it from the speaker on the right side of the local site.
  • The prior art solution uses a two-channel stereo codec to process the sound data: the sound picked up by the left channel is played from the left speaker and the sound picked up by the right channel is played from the right speaker, forming a two-channel listening area. The central sound image of the two channels is unstable, sometimes drifting left or right, and can deviate considerably from the image, so the user can only roughly distinguish the left, middle, and right directions; the sound localization is difficult to make accurate and fine.
  • Embodiments of the present invention provide a method and an apparatus for implementing video communication, which enable the orientation from which the local user in a video communication hears the remote user's voice to be basically consistent with the orientation of the remote user's image seen by the local user, enhancing the user's sense of presence.
  • An embodiment of the present invention provides a method for implementing video communication, where the method includes:
  • After the local device establishes a connection with the remote device, obtain the head position information of the remote user; determine, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user; and, when a remote user speaks, perform playback according to the speaker playback mode corresponding to the speaker.
  • The embodiment of the present invention further provides an apparatus for implementing video communication, where the apparatus includes: an acquiring unit, configured to acquire, after the local device establishes a connection with the remote device, the head position information of the remote user; and a playback control unit, configured to determine, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user and, when the remote user speaks, to perform playback according to the speaker playback mode corresponding to the speaker.
  • The present invention further provides a system for implementing video communication, the system comprising a remote device, a local device, and a media server. The remote device is configured to collect the video and audio data of the remote user and send them to the media server; the media server is configured to exchange the video and audio data of the remote device and the local device; and the local device is configured to, after the local user establishes a connection with the remote user, determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user and, when the remote user speaks, perform playback according to the speaker playback mode corresponding to the speaker.
  • The present invention further provides a video communication system, the system comprising a remote device, a local device, and a multipoint control unit media server. The remote device is configured to collect the video and audio data of the remote user and send them to the media server. The media server is configured to exchange the video and audio data of the remote device and the local device and, after the local user establishes a connection with the remote user, to determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user; when the remote user speaks, it sends a playback command to the local device according to the speaker playback mode corresponding to the speaker. The local device is configured to control the local playback apparatus to play according to the playback command.
  • The technical solutions of the embodiments of the present invention obtain the head information of the remote user after the local device establishes a connection with the remote device, establish the corresponding speaker playback mode according to that head information, and control the playback of the speakers with that playback mode, so that the orientation from which the local user hears the remote user's voice is basically consistent with the orientation of the remote user's image seen by the local user, enhancing the user's sense of presence.
  • FIG. 1 is a flowchart of a method for implementing video communication according to the present invention
  • FIG. 2 is a diagram of a flat panel speaker array according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a flat panel speaker array according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for implementing video communication according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method for implementing video communication according to another embodiment of the present invention;
  • FIG. 6 is a flowchart of a method for implementing video communication according to yet another embodiment of the present invention;
  • FIG. 7 is a structural diagram of an apparatus for implementing video communication according to the present invention;
  • FIG. 8 is a structural diagram of a system for implementing video communication according to the present invention.
  • FIG. 9 is a technical scenario diagram of a method according to Embodiment 1 of the present invention.
  • FIG. 10 is a schematic diagram of the upper and lower arrangement of the speaker provided by the present invention.
  • FIG. 11 is a schematic diagram of the left-and-right arrangement of the speakers provided by the present invention.
  • An embodiment of the present invention provides a method for implementing video communication. The method, as shown in FIG. 1, includes the following steps:
  • S11. After the local device establishes a connection with the remote device, obtain the head position information of the remote user.
  • The video communication device of the local site and the video communication device of the remote site establish a connection through the network. The local site is referred to as the "local end" and the remote site is referred to as the "remote end".
  • The head position information of the remote user may be obtained by an image processing method, for example face recognition technology, or it may be obtained manually, that is, by assigning a fixed position to each far-end participant, so that the area information of the head position is itself determined.
  • S12. Determine, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user.
  • S13. When the remote user speaks, play the sound according to the speaker playback mode corresponding to the speaker.
  • Optionally, the speaking remote user may be determined, for example, by applying face recognition technology to the image of the remote users to identify the speaker, or by having the media server (a Multipoint Control Unit (MCU) is taken as an example) determine the speaker among the remote users from the audio streams transmitted by the remote microphones.
  • The specific method for the media server to determine the speaker among the remote users from the audio streams transmitted by the remote microphones may be as follows. Take three remote users as an example (in practice the number of users may be different). The remote site assigns one microphone to each of the three participants, for example microphone 1 to user A, microphone 2 to user B, and microphone 3 to user C. If the media server receives the audio stream transmitted by microphone 1, it confirms that user A is speaking; similarly, when it receives the stream of microphone 2 it confirms that user B is speaking, and when it receives the stream of microphone 3 it confirms that user C is speaking. The speaker is thus determined through the correspondence between the microphones and the participants.
  • the manner of confirming the user's speech in the above example is only an example for implementing the present invention. In practical applications, the present invention does not limit the specific method of confirming the user's speech, as long as it can confirm the user's speech.
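As an illustration only, the microphone-to-participant correspondence described above can be kept as a simple lookup on the media server side. The sketch below is a hypothetical example (the names MIC_TO_USER and identify_speaker are not taken from the patent) of confirming the speaking user from the identifier of the microphone whose audio stream is currently being received.

```python
# Minimal sketch of the microphone-to-participant correspondence described above.
# MIC_TO_USER and identify_speaker are illustrative names, not part of the patent.

MIC_TO_USER = {1: "A", 2: "B", 3: "C"}  # microphone id -> remote participant

def identify_speaker(active_mic_id: int) -> str:
    """Return the remote participant assigned to the microphone whose audio
    stream the media server is currently receiving."""
    try:
        return MIC_TO_USER[active_mic_id]
    except KeyError:
        raise ValueError(f"no participant is assigned to microphone {active_mic_id}")

# Example: an audio stream arrives from microphone 2, so user B is the speaker.
assert identify_speaker(2) == "B"
```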
  • Optionally, playback according to the speaker playback mode corresponding to the speaker may be implemented as follows: the local device controls the playback device corresponding to the speaker to play the sound according to the speaker playback mode corresponding to the speaker; alternatively, the media server sends a playback command to the local device according to the speaker playback mode corresponding to the speaker, and the local device controls the playback device corresponding to the speaker to play the sound according to that command.
  • Optionally, when the speakers form a flat panel speaker array, the method for implementing S12 and S13 may specifically be: confirm the corresponding speaker in the flat panel speaker array according to the head position information of the remote user and, when that remote user speaks, activate the speaker corresponding to the speaking user to play the sound.
  • Optionally, when the speakers are arranged one above the other, the method for implementing S12 and S13 may specifically be: display the images of the remote users one above another, calculate the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of that vertical distance to the total height of the displayed image.
  • When the volume difference between the upper and lower speakers is 0, the output effect is that the sound is heard from the middle of the upper and lower speakers.
  • According to binaural stereo theory, when the difference between the upper and lower speaker volumes is greater than or equal to 15 dB, the sound heard by the user appears to come from the upper speaker; when the difference is less than or equal to −15 dB, that is, the lower speaker is more than 15 dB louder than the upper one, the sound appears to come from the lower speaker; and when the difference is between −15 dB and +15 dB, the sound is heard from some height between the upper and lower speakers. The position corresponding to the combined output of the upper and lower speakers can be regarded as a virtual sound source.
  • Specifically, the relationship between the volume difference of the upper and lower speakers and the position of the virtual sound source can be roughly evaluated by the following formula:
  • difference between the volumes of the upper and lower speakers = 8X × (0.5 − ratio of the vertical distance to the total height of the displayed image) dB (Formula 1)
  • In this formula, the 8 means that the height corresponding to the entire display device is divided into 8 equal parts, and the virtual sound source falls into one of these 8 intervals; because of the hearing characteristics of the human ear, a finer division is difficult to perceive, and a person skilled in the art may adopt a different division according to the height of the display device and the characteristics of the sound source. The term 8 × (0.5 − ratio of the vertical distance to the total height of the displayed image) is the distance, in such parts, of the virtual sound source from the upper speaker; for example, if the total height of the display device is 100 cm and the user's head is at 75 cm, the vertical distance is 25 cm and 8 × (0.5 − 25/100) = 2, meaning the upper speaker should be louder than the lower speaker by 2 parts. The parameter X in Formula 1 indicates the number of dB needed to shift the virtual sound source by one part away from the midpoint between the two speakers; its value is related to the height of the display device and the distance between the user and the display device, and it is difficult to give a specific formula, so only a range is given for the user to adjust, with X in [0, 15 dB].
  • The volumes of the upper and lower speakers are then adjusted according to the difference and the sound is played. A specific example illustrates the adjustment: assume the difference between the upper and lower speakers is 3 dB; then the volume of the upper speaker is controlled to be 73 dB and the volume of the lower speaker to be 70 dB, where the volume of the upper speaker is the reference volume. The reference volume can be set by the user, for example the 73 dB above, or 63 dB, 53 dB, 60 dB, and so on. The difference can of course also be −3 dB, in which case the control method can be: control the volume of the upper speaker to be 70 dB and the volume of the lower speaker to be 73 dB; here the volume of the upper speaker is again the reference volume, and the specific volume value can also be set by the user. The X above is the sound coefficient set by the user.
  • The center and total height of the displayed image are defined differently depending on how the image is displayed: when the image is displayed by projection, they are the center of the projected image and the total height of the projected image; when the image is displayed on a display, they are the center of the display panel and the height of the display panel.
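To make Formula 1 concrete, the following sketch computes the upper and lower speaker volumes for a given head position. It is only an illustration under two assumptions not stated explicitly above: the head position is measured from the top edge of the displayed image (so that a head at the image center gives a ratio of 0.5 and a 0 dB difference), and the reference volume is applied to the louder of the two speakers, as in the 73 dB / 70 dB example. The function and parameter names are invented for the sketch.

```python
# Sketch of applying Formula 1 for the upper/lower speaker arrangement.
# Assumption: head_y_cm is measured downward from the top edge of the displayed
# image, so head_y_cm / image_height_cm is 0.5 at the image center, where the
# volume difference becomes 0 dB.

def vertical_speaker_volumes(head_y_cm: float, image_height_cm: float,
                             x_db: float = 3.0, reference_db: float = 73.0):
    """Return (upper_volume_db, lower_volume_db) for the speaking user."""
    ratio = head_y_cm / image_height_cm            # 0.0 at the top, 1.0 at the bottom
    diff_db = 8.0 * x_db * (0.5 - ratio)           # Formula 1
    diff_db = max(-15.0, min(15.0, diff_db))       # binaural range noted above
    if diff_db >= 0:                               # upper speaker is the louder one
        return reference_db, reference_db - diff_db
    return reference_db + diff_db, reference_db    # lower speaker is the louder one

# Worked example from the text: 100 cm high image, head center 25 cm from the
# top; with X = 1.5 dB the difference is 8 * 1.5 * (0.5 - 0.25) = 3 dB, i.e. the
# 73 dB / 70 dB case described above.
print(vertical_speaker_volumes(25.0, 100.0, x_db=1.5))   # (73.0, 70.0)
```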
  • Optionally, when the speakers are arranged left and right, the method for implementing S12 and S13 may specifically be: display the images of the remote users side by side, calculate the horizontal distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of that horizontal distance to the total width of the displayed image. The difference between the volumes of the left and right speakers is then obtained from Formula 2, whose parameters are defined analogously to those of Formula 1 (with the horizontal distance and the total width in place of the vertical distance and the total height), and is not described again here. The volumes of the left and right speakers are adjusted according to the difference and then played; the following example illustrates the specific adjustment.
  • Assume the difference between the left and right speakers is 4 dB; then the volume of the left speaker is controlled to be 44 dB and the volume of the right speaker to be 40 dB, where the volume of the left speaker is the reference volume. The reference volume can be set by the user, for example the 44 dB above, or 54 dB, 60 dB, and so on. The difference can of course also be −4 dB, in which case the control method can be: control the volume of the left speaker to be 40 dB and the volume of the right speaker to be 44 dB; here the volume of the left speaker is again the reference volume, and the specific volume value can also be set by the user. The X above is the sound coefficient set by the user.
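Formula 2 is not written out above; it is only said to mirror Formula 1 with the horizontal distance and the total width of the displayed image in place of the vertical distance and the total height. Under that assumption, a minimal sketch of the left/right case would be (names invented for illustration):

```python
# Sketch of the left/right case, assuming Formula 2 mirrors Formula 1 with the
# horizontal distance and the total image width in place of the vertical
# distance and the total height.  head_x_cm is measured from the left edge.

def horizontal_speaker_volumes(head_x_cm: float, image_width_cm: float,
                               x_db: float = 4.0, reference_db: float = 44.0):
    """Return (left_volume_db, right_volume_db) for the speaking user."""
    diff_db = 8.0 * x_db * (0.5 - head_x_cm / image_width_cm)  # assumed Formula 2
    diff_db = max(-15.0, min(15.0, diff_db))
    if diff_db >= 0:                               # left speaker is the louder one
        return reference_db, reference_db - diff_db
    return reference_db + diff_db, reference_db    # right speaker is the louder one

# Example: head center 60 cm from the left edge of a 160 cm wide image with
# X = 4 dB gives a 4 dB difference, i.e. the 44 dB / 40 dB case described above.
print(horizontal_speaker_volumes(60.0, 160.0))   # (44.0, 40.0)
```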
  • The method provided by the present invention determines the corresponding speaker playback mode according to the head position information of the remote user; when the remote user speaks, the speaker corresponding to the speaking user plays the sound, so that the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, which enhances the user's sense of presence.
  • Embodiment 1 provides a method for implementing video communication. The technical scenario is a system consisting of the local device, the media server, and the remote device; the specific implementation scenario is shown in FIG. 9.
  • The video and audio collection devices A, B, C, D, and E are responsible for collecting the video and audio data of remote users A, B, C, and D and of local user E, respectively. The media server (corresponding to the MCU in FIG. 9) completes the exchange of video data and audio data between the remote device and the local device, and the remote device collects the remote users' video data and audio data and sends them to the media server (MCU); there may be one or more remote devices. The local user and the remote users perform video communication through audio and video input and output devices, where the audio input device is a microphone or a microphone array, the audio output device is a speaker or a speaker array, the video input device is a camera or a camera array, and the video output device is a display or a display array.
  • The display device in this embodiment takes a projector as an example, and a flat panel speaker array is set on the projection plane (as shown in FIG. 2 or FIG. 3).
  • After the connection between the local site and the remote site is established, the remote device starts face recognition technology to determine the head position information of each of users A, B, C, and D.
  • The above method of determining the head position information of A, B, C, and D is described by taking face recognition technology as an example; in practical applications, other methods can be used, such as manually confirming the head position information of A, B, C, and D or using other identification technologies (e.g., iris detection technology). The present invention does not limit the specific method of determining the head position information of A, B, C, and D.
  • The preferred method is to directly collect the participant image information of the remote site at the remote end and use face recognition technology to determine the position information of each participant.
  • The specific method for implementing S42 may be: as shown in FIG. 2, the flat panel speaker array is divided into 36 regions according to the number of speakers, and the head position information of A is determined by face recognition technology to be located in region 11 (as shown in FIG. 3); it is then confirmed that the speaker corresponding to A's head is speaker 11. Similarly, the speakers corresponding to the heads of B, C, and D are determined to be speakers 13, 15, and 17, respectively.
  • In an actual case, the head position information of A determined by face recognition may span several of the regions shown in FIG. 2, for example regions 10 and 11. In this case, the speakers corresponding to A's head are the speakers of all the regions covered by A's head position information: for example, when the head covers regions 10 and 11, the corresponding speakers are determined to be speakers 10 and 11, and when the head covers regions 21, 22, and 23, the corresponding speakers are determined to be speakers 21, 22, and 23.
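As an illustration of the region-to-speaker mapping just described, the sketch below assumes the 36 regions form a 6 × 6 grid numbered row by row from 1 to 36 and that the head position is available as a bounding box normalised to the displayed image; these assumptions and the function name are not taken from the patent.

```python
# Sketch: map a head bounding box (normalised to the displayed image) to the
# speakers of the flat panel array regions it overlaps.  A 6 x 6 row-major grid
# numbered 1..36 is assumed purely for illustration.

ROWS, COLS = 6, 6

def speakers_for_head(x0: float, y0: float, x1: float, y1: float) -> list:
    """Return the indices of the panel speakers whose regions overlap the head
    bounding box (x0, y0)-(x1, y1), with coordinates in [0, 1] and the origin
    at the top-left corner of the displayed image."""
    col0, col1 = int(x0 * COLS), min(int(x1 * COLS), COLS - 1)
    row0, row1 = int(y0 * ROWS), min(int(y1 * ROWS), ROWS - 1)
    return [row * COLS + col + 1
            for row in range(row0, row1 + 1)
            for col in range(col0, col1 + 1)]

# A head entirely inside one region activates a single speaker, while a head
# straddling two neighbouring regions activates both, as described above.
print(speakers_for_head(0.70, 0.20, 0.78, 0.30))   # [11]
print(speakers_for_head(0.62, 0.20, 0.78, 0.30))   # [10, 11]
```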
  • There are various methods for knowing that a remote user is speaking: for example, the change in the mouth shape of the face can be detected from the image, or the speaking remote user can be detected through audio collection.
  • The foregoing control of the local sound output devices may be performed by the local conference device, or may be performed by the media server (corresponding to the MCU in FIG. 9).
  • When the control is performed by the local conference device, the remote device sends the user image information of the remote site to the local conference device through the media server, and the local device establishes the correspondence between the participant information of the remote site and the local sound output devices. When a participant at the remote site speaks, the speaking participant is determined locally by face recognition, and the local sound output devices are then controlled by the local device; in this embodiment, the speaker of the local speaker array corresponding to the remote speaking participant is driven to emit sound, realizing control of the speaker array, so that the orientation from which the local user hears the remote user's voice is consistent with the orientation of the remote user's image seen by the local user, thereby achieving the effect of increasing the user's sense of presence.
  • When the control is performed by the media server, the media server determines the information of the sound output devices of the local conference terminal, which may include the type, number, and arrangement of the sound output devices; it obtains the image information of the users at the remote site, derives the head information of the remote users from that image information, and establishes, for the local site, the correspondence between the head information of the remote users and the sound output devices of the local end. The media server detects the location of the sound source sent by the remote site and then, according to the correspondence between the head information of the remote users and the sound output devices of the local end, determines the output of the corresponding speaker among the sound output devices of the local end. In this way the corresponding processing and control functions are implemented in the media server, the orientation from which the local user hears the remote user's voice is consistent with the orientation of the remote user's image seen by the local user, which increases the user's sense of presence, and the complexity required of the local device to implement this solution is also reduced.
  • The method provided in this embodiment determines the speaker corresponding to each piece of head position information of A, B, C, and D; when one of them speaks, the corresponding speaker is activated for playback, which achieves the purpose that the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, increasing the user's sense of presence.
  • Another embodiment of the present invention provides a video communication implementation method, which is implemented in the following manner:
  • The method provided in this embodiment is implemented in a system consisting of a local device, a media server, and a remote device, where the media server exchanges the video and audio data of the remote device and the local device, and the remote device collects the video and audio data of the remote users and sends them to the media server.
  • The local user and the remote users perform video communication through a display device, which can be a CRT display, an LCD, a plasma display, and so on.
  • One speaker is set at each of the upper and lower center positions of the display device (as shown in FIG. 10). The speakers may also deviate from the center line of the display device; for example, the upper speaker may be set at a position toward the left of the display device and the lower speaker at a position toward the right of the LCD television display device. When the speakers are arranged above and below, the present invention does not limit their left-right positions; it is only necessary to ensure that one speaker is arranged above and one below the display device.
  • Assume the remote users are 4 people, denoted A, B, C, and D, and the local user is denoted E; assume the avatars of A, B, C, and D are arranged from top to bottom in the order A, B, C, D. The avatar position in this embodiment refers to the center position of the avatar's mouth.
  • The above method can be as shown in FIG. 5 and includes the following steps:
  • Step 51: After the local site establishes a connection with the remote site, the remote device determines the head position information of remote users A, B, C, and D by face recognition.
  • Step 52: According to the positions of the A, B, C, and D avatars, calculate the vertical distance from the center of each head position to the center of the displayed image (the center of the display device), and calculate the ratio of that vertical distance to the total height of the displayed image (i.e., the total height of the image displayed by the display device).
  • Step 53: When a remote user speaks, adjust the volumes of the upper and lower speakers according to the ratio corresponding to the speaker, and play the sound at the adjusted volumes.
  • The volume of the upper speaker is set to the reference volume value, which may be 40 dB but can also be another volume value; that is, the volume of the upper speaker is controlled to be 40 dB, and the volume of the lower speaker is then obtained from the calculated difference.
  • The technical effects of this embodiment are explained below by the principle on which it is based. Experiments show that when the human ear hears two sound sources (for example, one above the other), what is actually perceived is sound coming from a single location, generally called a virtual sound source. For example, when the volumes of the two sound sources are the same, the synthesized virtual sound source is at the center position between the two sound sources; if the sound sources are arranged one above the other and the upper source is louder, the synthesized virtual sound source is close to the position of the upper source; similarly, if the lower source is louder, the position of the synthesized virtual sound source is close to the position of the lower source.
  • Therefore, the position of the synthesized virtual sound source can be adjusted by controlling the volumes of the upper and lower sound sources (the speakers in this embodiment). When the position of the virtual sound source is adjusted to the image position of the speaking user, the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, thereby increasing the user's sense of presence.
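To make the panning principle concrete, the sketch below inverts Formula 1: given the volumes of the upper and lower speakers and the user-set coefficient X, it estimates where on the display (as a fraction of its height, measured from the top) the virtual sound source will be perceived. The function name and the clipping behaviour are assumptions made for the illustration.

```python
# Sketch: estimate the perceived virtual sound source position by inverting
# Formula 1.  The result is a fraction of the display height measured from the
# top (0.0 = at the upper speaker, 0.5 = centre, 1.0 = at the lower speaker).

def virtual_source_position(upper_db: float, lower_db: float, x_db: float) -> float:
    if x_db <= 0:
        return 0.5                               # no panning range configured
    diff_db = upper_db - lower_db
    ratio = 0.5 - diff_db / (8.0 * x_db)         # Formula 1 solved for the ratio
    return min(1.0, max(0.0, ratio))             # keep the result on the display

# Equal volumes: the source is perceived at the centre of the display.
print(virtual_source_position(70.0, 70.0, x_db=3.0))   # 0.5
# Upper speaker 6 dB louder with X = 3 dB: the source sits a quarter of the way
# down from the top, matching a head displayed in the upper half of the image.
print(virtual_source_position(73.0, 67.0, x_db=3.0))   # 0.25
```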
  • The method provided by this embodiment calculates, from the head position information of A, B, C, and D, the ratio of the vertical distance from each avatar to the center of the displayed image to the total height of the displayed image, and controls the volumes of the upper and lower speakers according to this ratio before playback, achieving the purpose that the orientation from which the local user hears the remote user's voice is consistent with the orientation of the remote user's image seen by the local user, which increases the user's sense of presence.
  • When the speakers of the display device are arranged horizontally, the avatars may be displayed side by side; the ratio is then changed to the ratio of the horizontal distance from the avatar to the center of the displayed image to the total width of the displayed image, and the volume difference is calculated according to Formula 2.
  • The horizontal arrangement may place one speaker at each of the left and right center-line positions of the display device (as shown in FIG. 11). The speakers may also deviate from the center line of the display device, for example with the left speaker set at an upper position of the display device and the right speaker at a lower position. For the horizontal arrangement, the present invention does not limit the specific up-down positions of the speakers; it is only necessary to set one speaker on the left and one on the right of the display device.
  • When the volumes of the two sound sources are the same, the synthesized virtual sound source is at the center position between the two sound sources; if the sound sources are arranged left and right and the left source is louder, the synthesized virtual sound source is close to the left source; similarly, if the right source is louder, the position of the synthesized virtual sound source is close to the position of the right source.
  • Therefore, the position of the synthesized virtual sound source can be adjusted by controlling the volumes of the left and right sound sources (the speakers in this embodiment). When the position of the virtual sound source is adjusted to the image position of the speaking user, the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, thereby increasing the user's sense of presence.
  • The present invention provides a further embodiment, implemented in a system consisting of a local device, a media server, and a remote device. The media server completes the exchange of video and audio data between the remote device and the local device, and the remote device collects the video and audio data of the remote users and sends them to the media server. The local user and the remote users perform video communication through projection, and a flat panel speaker array is set on the projection plane (as shown in FIG. 2). Assume the remote users are 4 people, denoted A, B, C, and D, and the remote device assigns microphones 1, 2, 3, and 4 to A, B, C, and D respectively; the local user is denoted E. The method can then be as shown in FIG. 6 and includes:
  • S61: The remote device uses face recognition to determine the head position information of users A, B, C, and D.
  • The method for implementing S61 may specifically be as follows: the above determination of the head position information of A, B, C, and D is described by taking face recognition as an example; in practical applications other methods may be used, such as manually confirming the position information of A, B, C, and D, i.e., determining the position information of the participants at the site by assigning positions to them. The present invention does not limit the specific method of determining the head position information of users A, B, C, and D. The preferred method is to directly collect the participant image information of the remote site at the remote end and use face recognition technology to determine the position information of each participant.
  • S62: The local device determines, according to the head position information of A, B, C, and D, the positions of the speakers in the flat panel speaker array corresponding to the heads of A, B, C, and D respectively.
  • The specific method for implementing S62 may be: as shown in FIG. 2, the flat panel speaker array is divided into 36 regions according to the number of speakers, and the head position information of A is determined by face recognition to be located in region 11; it is then confirmed that the speaker corresponding to A's head is speaker 11. Similarly, the speakers corresponding to the heads of B, C, and D are determined to be speakers 13, 15, and 17, respectively. In an actual case, the head position information of A determined by face recognition may be located in several of the regions shown in FIG. 2, such as regions 10 and 11; in this case, the speakers corresponding to A's head are the speakers of all the regions covered by A's head position information: for example, when the head covers regions 10 and 11 the speakers are determined to be speakers 10 and 11, and when the head covers regions 21, 22, and 23 the speakers are determined to be speakers 21, 22, and 23.
  • S63: The media server determines, according to the audio code stream sent by microphone 1, that A is the speaker, and sends the audio code stream from microphone 1 and the information identifying A as the speaker to the local device.
  • The actual method for implementing S63 may be as follows: since remote users A, B, C, and D are assigned microphones 1, 2, 3, and 4 respectively, the media server establishes the correspondence between microphone 1 and user A and, similarly, the correspondences between microphone 2 and user B, microphone 3 and user C, and microphone 4 and user D. When the media server detects the audio stream sent by microphone 1, it determines, according to the correspondence between microphone 1 and user A, that user A is speaking, and sends the audio code stream sent by microphone 1 together with the information determining that A is the speaker to the local device.
  • The local device activates the speaker corresponding to A to play the audio code stream sent by microphone 1.
  • The steps performed by the foregoing local device may also be performed by the media server controlling the local device.
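The flow of steps S61-S63 and the local playback step above can be summarised schematically. The sketch below is only an outline under assumed names and interfaces (the mapping tables and the helpers send_to_local and play_on are placeholders, not defined in the patent); as noted above, the local step could equally be driven by the media server.

```python
# Schematic outline of the S61-S63 flow plus the local playback step.
# The mapping tables and the helpers send_to_local / play_on are placeholders.

MIC_TO_USER = {1: "A", 2: "B", 3: "C", 4: "D"}                    # held by the media server
USER_TO_SPEAKERS = {"A": [11], "B": [13], "C": [15], "D": [17]}   # built by the local device (S62)

def media_server_step(active_mic_id, audio_stream, send_to_local):
    """S63: identify the speaking user from the active microphone and forward
    the audio stream plus the speaker identity to the local device."""
    user = MIC_TO_USER[active_mic_id]
    send_to_local({"speaker": user, "audio": audio_stream})

def local_playback_step(message, play_on):
    """Local playback: activate the panel speakers mapped to the speaking
    user's head position and play the forwarded audio stream through them."""
    for speaker_id in USER_TO_SPEAKERS[message["speaker"]]:
        play_on(speaker_id, message["audio"])

# Tiny demonstration with stub transports:
inbox = []
media_server_step(1, b"<audio frames>", inbox.append)
local_playback_step(inbox[0], lambda sid, audio: print(f"play on panel speaker {sid}"))
```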
  • In the method provided by this embodiment, the local device determines the speakers corresponding to the head position information of A, B, C, and D, the media server determines the speaking user, and the local device activates the speaker corresponding to the speaking user to play the sound, which achieves the purpose that the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, increasing the user's sense of presence.
  • The present invention also provides an apparatus for implementing video communication, which is shown in FIG. 7, where the dotted-line modules represent optional modules. The apparatus specifically includes:
  • an obtaining unit 71, configured to acquire, after the local user establishes a connection with the remote user, the head position information of the remote user; and
  • a playback control unit 72, configured to determine, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user and, when the remote user speaks, to play the sound according to the speaker playback mode corresponding to the speaker.
  • Optionally, the playback control unit 72 includes: an array module 721, configured to confirm the corresponding speaker in the flat panel speaker array according to the head position information of the remote user; and a playing module 722, configured to, when the remote user speaks, activate the speaker corresponding to the speaking user for playback.
  • Optionally, the playback control unit 72 includes: a height calculation module 723, configured to display the images of the remote users one above another, calculate the vertical distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of that vertical distance to the total height of the displayed image; and a vertical playback module 724, configured to adjust the volumes of the upper and lower speakers according to the volume difference between the upper and lower speakers and then play; the method for calculating the difference between the upper and lower speakers is described in Formula 1.
  • Optionally, the playback control unit 72 includes: a width calculation module 725, configured to display the images of the remote users side by side, calculate the horizontal distance from the center of the remote user's head position to the center of the displayed image, and calculate the ratio of that horizontal distance to the total width of the displayed image; and a horizontal playback module 726, configured to adjust the volumes of the left and right speakers according to the volume difference between the left and right speakers and then play; the difference between the volumes of the left and right speakers is described in Formula 2.
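As a structural sketch only, the units and optional modules of FIG. 7 could be organised as below. The class and method names are invented for illustration, the face detector is left abstract, and the volume-difference helpers follow Formula 1 and its assumed horizontal analogue (Formula 2).

```python
# Structural sketch of the apparatus of FIG. 7: an obtaining unit 71 plus a
# playback control unit 72 whose optional modules mirror 721-726.  All names
# are illustrative.

class ObtainingUnit:
    """Unit 71: acquires the remote user's head position information."""
    def head_positions(self, remote_frame):
        raise NotImplementedError  # e.g. face recognition; outside this sketch

class PlaybackControlUnit:
    """Unit 72: derives the playback mode and drives the speakers."""
    def __init__(self, x_db: float = 3.0):
        self.x_db = x_db                           # user-set sound coefficient X

    # array module 721 / playing module 722
    def panel_speakers(self, head_box, grid=(6, 6)):
        rows, cols = grid
        x0, y0, x1, y1 = head_box
        return [r * cols + c + 1
                for r in range(int(y0 * rows), min(int(y1 * rows), rows - 1) + 1)
                for c in range(int(x0 * cols), min(int(x1 * cols), cols - 1) + 1)]

    # height calculation module 723 / vertical playback module 724
    def vertical_difference_db(self, head_y, image_height):
        return 8.0 * self.x_db * (0.5 - head_y / image_height)    # Formula 1

    # width calculation module 725 / horizontal playback module 726
    def horizontal_difference_db(self, head_x, image_width):
        return 8.0 * self.x_db * (0.5 - head_x / image_width)     # assumed Formula 2
```

Such a unit could run either in the local device or in the media server, matching the deployment options listed below.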
  • The device may be a separately existing device; it may also be installed in the local device or in the media server.
  • The device provided by the present invention determines the corresponding speaker playback mode according to the head position information of the remote user; when the remote user speaks, the speaker corresponding to the speaking user plays the sound, so that the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, which increases the user's sense of presence.
  • The present invention also provides a system for implementing video communication. As shown in FIG. 8, the system includes a remote device 81, a local device 82, and a media server 83.
  • The remote device 81 is configured to collect the video and audio data of the remote user and send them to the media server.
  • The media server 83 is configured to exchange the video and audio data of the remote device 81 and the local device 82.
  • The local device 82 is configured to, after the local user establishes a connection with the remote user, determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user and, when the remote user speaks, perform playback according to the speaker playback mode corresponding to the speaker.
  • The local device 82 in the system provided by the present invention can determine the corresponding speaker playback mode according to the head position information of the remote user; when the remote user speaks, the speaker corresponding to the speaking user plays the sound, so that the orientation from which the local user hears the remote user's voice is substantially consistent with the orientation of the remote user's image seen by the local user, which increases the user's sense of presence.
  • The present invention also provides another video communication system, the system comprising a remote device, a local device, and a media server.
  • The remote device is configured to collect the video and audio data of the remote user and send them to the media server.
  • The media server is configured to exchange the video and audio data of the remote device and the local device and, after the local user establishes a connection with the remote user, to determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user; when the remote user speaks, it sends a playback command to the local device according to the speaker playback mode corresponding to the speaking user.
  • The local device is configured to control the local playback apparatus to play according to the playback command.
  • The media server in the system provided by the present invention can determine the corresponding speaker playback mode according to the head position information of the remote user; when the remote user speaks, the speaker corresponding to the speaking user plays the sound, achieving the purpose that the orientation from which the local user hears the remote user's voice is basically the same as the orientation of the remote user's image seen by the local user, which increases the user's sense of presence.
  • The technical solutions provided by the specific embodiments of the present invention have the advantage that the orientation from which the local user hears the remote user's voice is the same as the orientation of the remote user's image seen by the local user, which increases the user's sense of presence.

Abstract

Embodiments of the present invention disclose a realization method and apparatus for video communication, which relate to the field of communication technologies. The method includes: after a local user has established a connection with a remote user, obtaining the head position information of the remote user; determining the loudspeaker playback manner corresponding to the remote user according to the head position information of the remote user; and, when the remote user is speaking, performing playback according to the loudspeaker playback manner corresponding to the speaker. The method and apparatus above enable the direction from which the local user hears the remote user's sound to be consistent with the direction of the remote user's image watched by the local user, thus improving the user's telepresence.

Description

Realization method and apparatus for video communication. This application claims priority to Chinese Patent Application No. 201010137021.X, filed with the Chinese Patent Office on March 30, 2010 and entitled "Realization method and apparatus for video communication", the entire contents of which are incorporated herein by reference.

Technical Field

The present invention relates to the field of communications, and in particular, to a method and an apparatus for implementing video communication.

Background

The video conferencing service uses multimedia communication technology to hold conferences over audio and video input and output devices and communication networks, and can simultaneously realize image, voice, and data interaction between two or more sites. The method for implementing video communication provided by the prior art is: receive the image and sound data sent by the video conference terminal of another site that communicates with the video conference terminal of the local site, and process the sound data with a two-channel stereo coding and decoding scheme; the local site obtains the left-channel sound data sent by the other site and plays it from the speaker on the left side of the local site, and obtains the right-channel sound data sent by the other site and plays it from the speaker on the right side of the local site.

In the process of implementing the present invention, the inventors have found that the prior art has the following problems:

The prior art solution uses a two-channel stereo codec to process the sound data: the sound picked up by the left channel is played from the left speaker and the sound picked up by the right channel is played from the right speaker, forming a two-channel listening area. The central sound image of the two channels is unstable, sometimes drifting left or right, and can deviate considerably from the image, so the user can only roughly distinguish the left, middle, and right directions; the sound localization is difficult to make accurate and fine.

Summary of the Invention
Embodiments of the present invention provide a method and an apparatus for implementing video communication, which enable the orientation from which the local user in a video communication hears the remote user's voice to be basically consistent with the orientation of the remote user's image seen by the local user, enhancing the user's sense of presence.

An embodiment of the present invention provides a method for implementing video communication, where the method includes: after the local device establishes a connection with the remote device, obtaining the head position information of the remote user; determining, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user; and, when a remote user speaks, performing playback according to the speaker playback mode corresponding to the speaker.

An embodiment of the present invention further provides an apparatus for implementing video communication, where the apparatus includes: an acquiring unit, configured to acquire, after the local device establishes a connection with the remote device, the head position information of the remote user; and a playback control unit, configured to determine, according to the head position information of the remote user, the speaker playback mode corresponding to the remote user and, when the remote user speaks, to perform playback according to the speaker playback mode corresponding to the speaker.

The present invention further provides a system for implementing video communication, the system comprising a remote device, a local device, and a media server. The remote device is configured to collect the video and audio data of the remote user and send them to the media server; the media server is configured to exchange the video and audio data of the remote device and the local device; and the local device is configured to, after the local user establishes a connection with the remote user, determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user and, when the remote user speaks, perform playback according to the speaker playback mode corresponding to the speaker.

The present invention further provides a video communication system, the system comprising a remote device, a local device, and a multipoint control unit media server. The remote device is configured to collect the video and audio data of the remote user and send them to the media server. The media server is configured to exchange the video and audio data of the remote device and the local device and, after the local user establishes a connection with the remote user, to determine the speaker playback mode corresponding to the remote user according to the acquired head position information of the remote user; when the remote user speaks, it sends a playback command to the local device according to the speaker playback mode corresponding to the speaker. The local device is configured to control the local playback apparatus to play according to the playback command.

It can be seen from the technical solutions provided above that the technical solutions of the embodiments of the present invention obtain the head information of the remote user after the local device establishes a connection with the remote device, establish the corresponding speaker playback mode according to that head information, and control the playback of the speakers with that playback mode, so that the orientation from which the local user hears the remote user's voice is basically consistent with the orientation of the remote user's image seen by the local user, enhancing the user's sense of presence.

Brief Description of the Drawings
FIG. 1 is a flowchart of a method for implementing video communication according to the present invention;
FIG. 2 is a diagram of a flat panel speaker array according to an embodiment of the present invention;
FIG. 3 is a diagram of a flat panel speaker array according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for implementing video communication according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for implementing video communication according to another embodiment of the present invention;
FIG. 6 is a flowchart of a method for implementing video communication according to yet another embodiment of the present invention;
FIG. 7 is a structural diagram of an apparatus for implementing video communication according to the present invention;
FIG. 8 is a structural diagram of a system for implementing video communication according to the present invention;
FIG. 9 is a diagram of the technical scenario in which the method of Embodiment 1 of the present invention is implemented;
FIG. 10 is a schematic diagram of the above-and-below arrangement of the speakers provided by the present invention;
FIG. 11 is a schematic diagram of the left-and-right arrangement of the speakers provided by the present invention.

Detailed Description
本发明实施方式提供了一种视频通信的实现方法, 该方法如图 1 所示, 包括如下步骤:  An embodiment of the present invention provides a method for implementing video communication. The method is as shown in FIG. 1 and includes the following steps:
Sll、在本地设备与远端设备建立连接后,获取远端用户的头部位置信息; 其中, 本地会场的视频通信设备与远端会场的视频通信设备通过网络建 立连接。 本地会场简称为 "本端", 远端会场简称为 "远端"。 上述获取远端用户的头部位置信息的具体方法可以为, 通过图像处理的 方法, 譬如: 人脸识别技术, 来获取远端用户的头部位置信息; 或者通过人 工方法获取远端用户的头部位置信息, 即通过为远端与会者分配固定的位置, 进而其头部位置的区域信息本身就是确定的。 Sll. After the local device establishes a connection with the remote device, the location information of the remote user is obtained. The video communication device of the local site and the video communication device of the remote site establish a connection through the network. The local site is referred to as "the local end" and the remote site is referred to as the "remote". The method for obtaining the location information of the remote user's head may be obtained by using an image processing method, such as: face recognition technology, to obtain the location information of the remote user's head; or manually obtaining the head of the remote user. Part location information, that is, by assigning a fixed location to the far-end participant, and thus the area information of the head position itself is determined.
512、根据所述远端用户的头部位置信息确定所述远端用户对应的扬声器 播放方式;  512. Determine, according to the location information of the remote user, a speaker playing manner corresponding to the remote user.
513、 当远端用户发言时, 根据发言者对应的扬声器播放方式进行放音。 可选的, 上述确定远端用户发言的具体方法可以采用以下方法, 例如, 针对远端用户的图像采用人脸识别技术来确定远端用户中的发言者, 还可以 由媒体服务器(以多点控制单元 ( Multipoint Control Unit , MCU )为例)通 过远端麦克传输来的音频码流来判断远端用户的发言者。  513. When the remote user speaks, play the sound according to the speaker playing mode corresponding to the speaker. Optionally, the foregoing specific method for determining the speaking of the remote user may be performed by, for example, using a face recognition technology to determine a speaker in the remote user for the image of the remote user, or by using a media server. The Multipoint Control Unit (MCU) is used as an example to determine the speaker of the remote user through the audio stream transmitted by the remote microphone.
上述媒体服务器通过远端麦克传输来的音频码流来判断远端用户的发言 者的具体方法可以为: 这里以远端用户为 3人为例, 当然实际情况下用户的 人数也可以为其他的数目, 在用户为 3人时, 远端会场为 3个与会者分别设 置一个麦克, 例如对用户 A分配麦克 1, 对用户 B分配麦克 2, 对用户 C分 配麦克 3; 若媒体服务器接收到麦克 1传送的音频码流时, 则确认用户 A发 言, 同理, 当媒体服务器接收到麦克 2的码流时, 确认用户 B发言, 媒体服 务器接收到麦克 3的码流时, 确认用户 C发言, 通过这种麦克风与与会者的 对应关系, 确定讲话的发言者。  The specific method for the media server to determine the speaker of the remote user by using the audio stream transmitted by the remote microphone may be as follows: Here, the remote user is three people, for example, the number of users in the actual situation may also be other numbers. When the user is 3 people, the remote site sets up a microphone for each of the 3 participants, for example, assigning a microphone 1 to user A, assigning a microphone 2 to user B, and assigning a microphone 3 to user C; if the media server receives the microphone 1 When the audio stream is transmitted, it is confirmed that the user A speaks. Similarly, when the media server receives the code stream of the microphone 2, it confirms that the user B speaks, and when the media server receives the code stream of the microphone 3, it confirms that the user C speaks and passes. The correspondence between the microphone and the participant determines the speaker of the speech.
上述举例中的确认用户发言的方式仅为实现本发明而进行的举例, 在实 际应用中, 本发明并不限制确认用户发言的具体方法, 只要其能够确认用户 发言即可。  The manner of confirming the user's speech in the above example is only an example for implementing the present invention. In practical applications, the present invention does not limit the specific method of confirming the user's speech, as long as it can confirm the user's speech.
可选的, 上述根据发言者对应的扬声器播放方式进行放音实现的方法可 以为, 本地设备根据发言者对应的扬声器播放方式控制该发言者对应的放音 设备进行放音; 该方法还可以为, 媒体服务器根据发言者对应的扬声器播放 方式向本地设备发送放音命令, 本地设备根据该放音命令控制该发言者对应 的放音设备进行放音。 Optionally, the method for performing the sound reproduction according to the speaker playing mode corresponding to the speaker may be: the local device controls the sounding device corresponding to the speaker to play according to the speaker playing mode corresponding to the speaker; the method may also be The media server sends a playback command to the local device according to the speaker playing mode corresponding to the speaker, and the local device controls the speaker corresponding according to the playback command. The playback device plays the sound.
可选的, 当扬声器为平板扬声器阵列时, 实现 S12、 S13的方法具体可以 为:  Optionally, when the speaker is a flat panel speaker array, the method for implementing S12 and S13 may specifically be:
根据该远端用户的头部位置信息确认其对应的该平板扬声器阵列中的扬 声器, 当该远端用户发言时, 启动发言者对应的扬声器进行放音。  The speaker in the corresponding flat panel speaker array is confirmed according to the head position information of the remote user, and when the remote user speaks, the speaker corresponding to the speaker is activated to play the sound.
可选的, 当扬声器为上、 下设置时, 实现 S12、 S 13的方法具体可以为: 将远端用户的图像上下显示, 并计算远端用户头部位置中心到显示图像中心 的垂直距离, 计算出该垂直距离与所述显示图像总高度的比值;  Optionally, when the speaker is set up and down, the method for implementing the S12 and the S13 may be: displaying the image of the remote user up and down, and calculating the vertical distance from the center of the remote user's head position to the center of the display image. Calculating a ratio of the vertical distance to the total height of the displayed image;
当上下扬声器音量差值为 0 时, 使得扬声器的输出效果为声音从上下扬 声器的中间方位输出;  When the upper and lower speaker volume difference is 0, the output effect of the speaker is that the sound is output from the middle direction of the up and down speakers;
根据双耳立体声理论, 当上下扬声器音量的差值大于等于 15dB时, 用户 所听到的声音是从上面扬声器输出; 当上下的扬声器音量差值小于等于 -15dB 时, 即下面的扬声器音量大于上面的扬声器 15dB时, 用户所听到的声音是从 下面扬声器输出; 当上下扬声器音量的差值在 (-15〜+15 之间时, 听到的声音 从上下扬声器的中间的某一高度输出。 其中, 通过上下扬声器的共同的输出 所对应的位置可等效为一个虚拟声源。  According to the binaural stereo theory, when the difference between the upper and lower speaker volume is greater than or equal to 15 dB, the sound heard by the user is output from the upper speaker; when the upper and lower speaker volume difference is less than or equal to -15 dB, the lower speaker volume is greater than the above When the speaker is 15dB, the sound heard by the user is output from the lower speaker; when the difference between the upper and lower speaker volume is between (-15~+15), the heard sound is output from a certain height in the middle of the upper and lower speakers. The position corresponding to the common output of the upper and lower speakers can be equivalent to a virtual sound source.
Specifically, the relationship between the upper/lower volume difference and the position of the virtual sound source can be estimated approximately with the following formula:
difference between the upper-loudspeaker volume and the lower-loudspeaker volume (upper minus lower) = 8X × (0.5 − ratio of the vertical distance to the total height of the displayed image) dB (Formula 1);
Explanation of the formula: the factor 8 means that the height of the display device is divided into 8 equal parts, and the virtual sound source falls into one of these 8 intervals. Because of the auditory resolution of the human ear, a much finer division would not be perceptible, so the height of the display device is divided into 8 equal parts. It will be appreciated that a person skilled in the art may use a different division depending on the height of the display device and the characteristics of the sound source.
In the formula, 8 × (0.5 − ratio of the vertical distance to the total height of the displayed image) is the distance of the virtual sound source from the upper loudspeaker, expressed in these divisions. For example, if the total height of the display device is 100 cm and the user's head is at 75 cm, the vertical distance is 25 cm and 8 × (0.5 − 25/100) = 8 × 2/8 = 2, meaning that the upper loudspeaker should be louder than the lower one by 2 divisions. The parameter X in Formula 1 is the number of dB by which the volumes must be adjusted to shift the virtual sound source by one division between the two loudspeakers; it depends on the height of the display device and on the distance between the user and the display, so no exact formula can be given and only a range is provided for the user to adjust, X taking values in [0, 15 dB].
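A minimal sketch of Formula 1 as literally stated above (hypothetical function and parameter names; the difference is taken as upper minus lower, so a positive value drives the upper loudspeaker louder):

```python
# Hypothetical sketch of Formula 1. "x_db_per_division" is the user-set
# coefficient X in [0, 15] dB; the display height is treated as 8 divisions.

def upper_lower_volume_difference_db(vertical_distance: float,
                                     image_height: float,
                                     x_db_per_division: float) -> float:
    """Volume difference (upper loudspeaker minus lower loudspeaker) in dB."""
    if not 0.0 <= x_db_per_division <= 15.0:
        raise ValueError("X is expected to lie in [0, 15] dB")
    ratio = vertical_distance / image_height
    return 8.0 * x_db_per_division * (0.5 - ratio)

# Worked example from the text: display height 100 cm, head 25 cm from the
# image centre -> 8 * (0.5 - 0.25) = 2 divisions, so with X = 3 dB per
# division the upper loudspeaker is driven 6 dB louder than the lower one.
diff = upper_lower_volume_difference_db(vertical_distance=25, image_height=100,
                                        x_db_per_division=3)
assert abs(diff - 6.0) < 1e-9
```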
The volumes of the upper and lower loudspeakers are then adjusted according to this difference before playback. A concrete example of the adjustment: assuming a difference of 3 dB between the upper and lower loudspeakers, the upper loudspeaker is set to 73 dB and the lower loudspeaker to 70 dB, the upper-loudspeaker volume serving as the reference volume. This reference volume can be set by the user, for example to the 73 dB above, or of course to 53 dB, 60 dB, and so on. The difference may also be −3 dB, in which case the upper loudspeaker is set to 70 dB and the lower loudspeaker to 73 dB; here again the upper-loudspeaker volume is the reference volume, and the user may set its specific value.
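Purely to illustrate the adjustment just described (hypothetical helper name; the upper loudspeaker is taken as the reference, as in the example above):

```python
# Hypothetical sketch: apply a signed upper-minus-lower dB difference around a
# user-set reference volume assigned to the upper loudspeaker.

def split_volumes(reference_db: float, upper_minus_lower_db: float):
    """Return (upper_db, lower_db) given the reference volume of the upper loudspeaker."""
    upper_db = reference_db
    lower_db = reference_db - upper_minus_lower_db
    return upper_db, lower_db

# Difference +3 dB with a 73 dB reference -> upper 73 dB, lower 70 dB.
assert split_volumes(73, 3) == (73, 70)
# Difference -3 dB with a 70 dB reference -> upper 70 dB, lower 73 dB.
assert split_volumes(70, -3) == (70, 73)
```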
X above is a sound coefficient set by the user.
The centre and the total height of the displayed image depend on how the image is displayed. When projection is used, they are the centre and the total height of the projected image; when a monitor is used, they are the centre of the display panel and the height of the display panel.
Optionally, when the local loudspeakers are arranged to the left and right, S12 and S13 may be implemented as follows:
the images of the remote users are displayed side by side, the horizontal distance from the centre of a remote user's head position to the centre of the displayed image is calculated, and the ratio of this horizontal distance to the total width of the displayed image is computed;
difference between the left-loudspeaker volume and the right-loudspeaker volume = 8X × (0.5 − ratio of the horizontal distance to the total width of the displayed image) dB (Formula 2);
the parameters in Formula 2 are defined in the same way as in Formula 1 and are not described again here.
The volumes of the left and right loudspeakers are adjusted according to this difference before playback. As an example, assuming a difference of 4 dB between the left and right loudspeakers, the left loudspeaker is set to 44 dB and the right loudspeaker to 40 dB, the left-loudspeaker volume serving as the reference volume. This reference volume can be set by the user, for example to the 44 dB above, or of course to 54 dB, 60 dB, and so on. The difference may also be −4 dB, in which case the left loudspeaker is set to 40 dB and the right loudspeaker to 44 dB; here again the left-loudspeaker volume is the reference volume, and the user may set its specific value.
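For completeness, a minimal sketch of Formula 2 under the same assumptions as the Formula 1 sketch above (hypothetical names):

```python
# Hypothetical sketch of Formula 2: left-minus-right volume difference for a
# left/right loudspeaker arrangement; X is again the user-set coefficient in [0, 15] dB.

def left_right_volume_difference_db(horizontal_distance: float,
                                    image_width: float,
                                    x_db_per_division: float) -> float:
    """Formula 2: difference = 8 * X * (0.5 - horizontal_distance / image_width) dB."""
    if not 0.0 <= x_db_per_division <= 15.0:
        raise ValueError("X is expected to lie in [0, 15] dB")
    return 8.0 * x_db_per_division * (0.5 - horizontal_distance / image_width)

# For instance, a head at ratio 0.375 with X = 4 dB gives a +4 dB difference,
# i.e. the left loudspeaker 4 dB louder than the right one (as in the
# 44 dB / 40 dB example above).
assert abs(left_right_volume_difference_db(0.375, 1.0, 4) - 4.0) < 1e-9
```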
X above is a sound coefficient set by the user.
The method provided by the present invention determines the loudspeaker playback mode corresponding to a remote user from that user's head position information and, when the remote user speaks, activates the playback mode corresponding to the speaker. The direction from which the local user hears the remote user's voice thus essentially matches the position of the remote user's image as seen by the local user, which enhances the user's sense of presence.
To describe the implementation of the present invention more clearly, specific embodiments are given below. Embodiment 1: this embodiment provides a method for implementing video communication in a system consisting of a local device, a media server and remote devices (the scenario is shown in Figure 9, in which the audio/video capture devices A, B, C, D and E capture the audio and video data of remote users A, B, C and D and of local user E respectively). The media server (the MCU in Figure 9) exchanges the video and audio data between the remote devices and the local device; a remote device captures the remote users' video and audio data and sends it to the media server (MCU); there may be one remote device or several. The local user and the remote users communicate through audio/video input and output devices, where the audio input device is a microphone or a microphone array, the audio output device is a loudspeaker or a loudspeaker array, the video input device is a camera or a camera array, and the video output device is a display or a display array. In this embodiment the display device is a projector, and a flat-panel loudspeaker array is placed on the projection plane (as shown in Figure 2 or Figure 3; in Figure 2 the numbers 1 to 36 denote the regions of the array and the corresponding loudspeaker numbers, and in Figure 3 the numbers 1 to 9 denote the regions of the array and the corresponding loudspeaker numbers). In Figure 9 it is assumed that there are four remote users, denoted A, B, C and D, and that the local user is denoted E. The method is shown in Figure 4 and is described here using only the flat-panel loudspeaker array of Figure 2 as an example; it comprises the following steps:
S41: after the local site and the remote site establish a connection, the remote device uses face recognition to determine the head position information of each of the users A, B, C and D.
Face recognition is used here only as an example of determining the head position information of A, B, C and D. In practice other approaches may be used, such as manually entering the head position information of A, B, C and D or using other recognition techniques (for example iris detection), or determining the participants' positions in the room from ergonomic considerations; the present invention does not limit the specific method used to determine the head position information of A, B, C and D.
Optionally, a preferred way of carrying out this step is to capture the participant images of the remote site directly at the remote end and to determine each participant's position from them using face recognition.
S42: the position of the loudspeaker in the flat-panel loudspeaker array corresponding to each of the heads of A, B, C and D is determined from their respective head position information.
S42 may be implemented as follows. As shown in Figure 2, the flat-panel loudspeaker array is divided into 36 regions according to the number of loudspeakers. If face recognition determines that the head position of A lies in region 11 of Figure 2, the loudspeaker corresponding to A's head is loudspeaker 11; likewise the loudspeakers corresponding to the heads of B, C and D are determined to be loudspeakers 13, 15 and 17 respectively. In practice a user's head may span several regions, so face recognition may locate A's head in several regions of Figure 2, for example regions 10 and 11, or regions 21, 22 and 23. In that case the loudspeakers corresponding to A's head are all the loudspeakers of the regions covered by A's head position information: for regions 10 and 11 they are loudspeakers 10 and 11, and for regions 21, 22 and 23 they are loudspeakers 21, 22 and 23.
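A rough illustration of the region lookup described above, assuming a 6 × 6 array numbered 1 to 36 row by row from the top-left (this numbering is an assumption about Figure 2) and head positions normalised to fractions of the display size; all names are hypothetical:

```python
# Hypothetical sketch: map a normalised head position to the loudspeakers of a
# flat-panel array divided into rows x cols regions numbered 1..rows*cols,
# left to right and top to bottom (6 x 6 = 36 regions).

def speakers_for_head(x: float, y: float, head_width: float = 0.0,
                      rows: int = 6, cols: int = 6) -> list[int]:
    """Return the region/loudspeaker numbers covered by a head centred at (x, y).

    x, y and head_width are fractions of the display width/height; a non-zero
    head_width lets a head that straddles a region boundary map to several
    loudspeakers, as in the example with regions 10 and 11.
    """
    col_lo = min(cols - 1, max(0, int((x - head_width / 2) * cols)))
    col_hi = min(cols - 1, max(0, int((x + head_width / 2) * cols)))
    row = min(rows - 1, max(0, int(y * rows)))
    return [row * cols + c + 1 for c in range(col_lo, col_hi + 1)]

# A head centred in the second row, fifth column maps to loudspeaker 11.
print(speakers_for_head(x=0.75, y=0.25))                   # -> [11]
# A wider head straddling two columns maps to loudspeakers 10 and 11.
print(speakers_for_head(x=0.66, y=0.25, head_width=0.2))   # -> [10, 11]
```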
S43: when a remote user speaks, the loudspeaker corresponding to the speaker is activated for playback; for example, when A speaks, the loudspeaker corresponding to A is used for playback.
There are several ways of learning that a remote user is speaking: changes in the mouth shape of the face can be detected by recognition techniques, or audio capture can be used to detect whether the remote user is speaking.
Optionally, the control of the local sound output devices described above may be performed by the local conferencing device or by the media server (the MCU in Figure 9).
When it is performed by the local device, the remote device sends the user image information of the remote site to the local conferencing device through the media server, and the local device establishes the correspondence between the remote-site participants and the local sound output devices. When a participant at the remote site speaks, the speaker is identified by face recognition at the local end, and the local device then controls the local sound output devices; in this embodiment the loudspeaker array is controlled by driving the loudspeakers of the local array that correspond to the remote speaker, so that the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, increasing the user's sense of presence.
When it is performed by the media server, the media server determines the information of the sound output devices of the local conference terminal, which may include their type, number and arrangement. After obtaining the image information of the remote-site users, it derives the remote users' head information from that image information and establishes, for the local site, the correspondence between the remote users' head information and the local sound output devices. Then, when a user at the remote site speaks, the media server detects the sound source position reported from the remote site and, using the correspondence between the remote users' head information and the local sound output devices, determines the corresponding loudspeaker among the local sound output devices to produce the sound. In this way the processing and control functions are implemented in the media server, the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, the user's sense of presence is increased, and the complexity of implementing this solution on the local device is reduced.
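Purely as an illustrative sketch of the media-server-controlled variant (all class, method and parameter names are hypothetical, and the remote site is assumed to report which participant is speaking):

```python
# Hypothetical sketch: the media server keeps a mapping from remote
# participants to local loudspeakers and, when a remote participant speaks,
# sends the local device a playback command naming the loudspeakers to drive.

class LocalDeviceStub:
    """Stand-in for the local device: just records the playback commands."""
    def play(self, audio_frame: bytes, speaker_ids: list[int]) -> None:
        print(f"playing {len(audio_frame)} bytes on loudspeakers {speaker_ids}")

class MediaServerController:
    def __init__(self) -> None:
        self.user_to_speakers: dict[str, list[int]] = {}

    def register_remote_user(self, user_id: str, speaker_ids: list[int]) -> None:
        """Bind a remote user (via their head position) to local loudspeaker numbers."""
        self.user_to_speakers[user_id] = speaker_ids

    def on_remote_speech(self, user_id: str, audio_frame: bytes,
                         local_device: LocalDeviceStub) -> None:
        """Forward the audio together with the loudspeakers to drive."""
        local_device.play(audio_frame, self.user_to_speakers.get(user_id, []))

server = MediaServerController()
server.register_remote_user("A", [11])     # A's head lies in region 11
server.on_remote_speech("A", b"\x00" * 160, LocalDeviceStub())
```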
In the method provided by this embodiment, the loudspeakers corresponding to the head positions of A, B, C and D are determined from their respective head position information, and when a remote user speaks, the loudspeaker corresponding to the speaker is activated for playback. The direction from which the local user hears the remote user's voice thus essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
Another embodiment: this embodiment provides a method for implementing video communication in a system consisting of a local device, a media server and a remote device, where the media server exchanges the video and audio data between the remote device and the local device, and the remote device captures the remote users' video and audio data and sends it to the media server. The local user and the remote users communicate through a display device, which may be a CRT monitor, a liquid-crystal display, a plasma display and so on. Assume that one loudspeaker is placed at the midpoint of the upper edge of the display device and one at the midpoint of the lower edge (as shown in Figure 10). The loudspeakers may of course be offset from the centre line of the display device, for example the upper loudspeaker to the left of the centre line and the lower loudspeaker to the right of the centre line of the LCD television's display; with an upper/lower arrangement the present invention does not restrict the exact left/right positions of the loudspeakers, it only requires one loudspeaker above and one below the display device. Assume there are four remote users, denoted A, B, C and D, and that the local user is denoted E; assume also that the head images of A, B, C and D are arranged from top to bottom in the order A, B, C, D. The head-image positions in this embodiment all refer to the centre of the mouth of the head image. The method, shown in Figure 5, comprises the following steps:
Step 51: after the local site and the remote site establish a connection, the remote device determines the head position information of the remote users A, B, C and D by face recognition.
Step 52: from the positions of the head images of A, B, C and D, the vertical distance from the centre of each head position to the centre of the displayed image (the centre of the display device) is calculated, together with the ratio of that vertical distance to the total height of the displayed image (the total height of the image shown on the display device).
Step 53: when a remote user speaks, the volumes of the upper and lower loudspeakers are adjusted according to the ratio corresponding to that speaker, and playback is performed at the adjusted volumes.
A specific way of making the adjustment is as follows. Assume the ratios corresponding to A, B, C and D are 0.125, 0.375, 0.625 and 0.875 respectively; then the upper/lower volume differences calculated with Formula 1 (taking X = 3) are 9 dB, 3 dB, −3 dB and −9 dB respectively. Of course, with other values of X in Formula 1 the differences take other values; for example with X = 2 they are 6 dB, 2 dB, −2 dB and −6 dB, and in practice X may take still other values, the specific value of X being set by the user.
After the user sets a reference volume value, for example taking the upper-loudspeaker volume as the reference and setting it to 40 dB, the upper loudspeaker is driven at 40 dB and the lower loudspeaker at 43 dB (with X = 3 and a ratio of 0.625) or 38 dB (with X = 2 and a ratio of 0.375). Other volume values are of course possible in practice.
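The numbers above can be checked with a small sketch of Formula 1 (hypothetical function name, shown only to reproduce the worked example):

```python
# Hypothetical check of the worked example: ratios of users A-D and the
# resulting upper-minus-lower differences for X = 3 and X = 2 (Formula 1).

def formula1_diff_db(ratio: float, x_db: float) -> float:
    return 8.0 * x_db * (0.5 - ratio)

ratios = {"A": 0.125, "B": 0.375, "C": 0.625, "D": 0.875}

for x in (3, 2):
    diffs = {user: formula1_diff_db(r, x) for user, r in ratios.items()}
    print(x, diffs)
    # X = 3 -> {'A': 9.0, 'B': 3.0, 'C': -3.0, 'D': -9.0}
    # X = 2 -> {'A': 6.0, 'B': 2.0, 'C': -2.0, 'D': -6.0}

# With a 40 dB reference on the upper loudspeaker, user C (ratio 0.625, X = 3)
# gives upper 40 dB and lower 40 - (-3) = 43 dB, matching the text.
assert 40 - formula1_diff_db(0.625, 3) == 43
```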
The technical effect of this embodiment follows from the principle on which it is based. Experiments show that when the human ear hears sound from two sources (for example one above the other), the perceived sound appears to come from a single location, usually called a virtual sound source. For example, when the two sources have the same volume, the synthesised virtual source lies midway between them; with an upper/lower arrangement, if the upper source is louder the virtual source is closer to the upper source, and likewise if the lower source is louder the virtual source is closer to the lower source. In the situation of this embodiment, therefore, when a speaker talks, the position of the synthesised virtual sound source can be adjusted by controlling the volumes of the upper and lower sources (loudspeakers in this embodiment); when the virtual source is steered to the position of the speaker's image, the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, increasing the user's sense of presence.
In the method provided by this embodiment, the ratio of the vertical distance from each head image to the centre of the displayed image to the total height of the displayed image is calculated from the head position information of A, B, C and D, and the volumes of the upper and lower loudspeakers are controlled according to this ratio for playback. The direction from which the local user hears the remote user's voice thus essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
In the further embodiment above, when the loudspeakers of the display device are arranged horizontally, the head images can be displayed side by side and the ratio changed to the ratio of the horizontal distance from the head image to the centre of the displayed image to the total width of the displayed image, after which the volume difference is calculated with Formula 2. A horizontal arrangement may place one loudspeaker at the midpoint of the left edge of the display device and one at the midpoint of the right edge (as shown in Figure 11); the loudspeakers may of course be offset from the centre line, for example the left loudspeaker above the centre line of the display device and the right loudspeaker below it. With a horizontal arrangement the present invention does not restrict the exact vertical positions of the loudspeakers, it only requires one loudspeaker to the left and one to the right of the display device.
When the human ear hears sound from two sources (for example to the left and to the right), the perceived sound appears to come from a single location, generally called a virtual sound source. For example, when the two sources have the same volume, the synthesised virtual source lies midway between them; with a left/right arrangement, if the left source is louder the virtual source is closer to the left source, and likewise if the right source is louder the virtual source is closer to the right source. In the situation of this embodiment, therefore, when a speaker talks, the position of the synthesised virtual sound source can be adjusted by controlling the volumes of the left and right sources (loudspeakers in this embodiment); when the virtual source is steered to the position of the speaker's image, the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, increasing the user's sense of presence.
The present invention provides a further embodiment, carried out in a system consisting of a local device, a media server and a remote device, where the media server exchanges the video and audio data between the remote device and the local device, and the remote device captures the remote users' video and audio data and sends it to the media server. The local user and the remote users communicate through a projection, and a flat-panel loudspeaker array is placed on the projection plane (as in Figure 2). Assume there are four remote users, denoted A, B, C and D, to whom the remote device assigns microphones 1, 2, 3 and 4 respectively, and that the local user is denoted E. The method, shown in Figure 6, comprises:
S61: after the local site and the remote site establish a connection, the remote device determines the head position information of users A, B, C and D using face recognition.
S61 may be implemented as follows. Face recognition is used here only as an example of determining the head position information of A, B, C and D; in practice other approaches may be used, such as manually entering the head position information of A, B, C and D or using other recognition techniques, for example determining the participants' positions in the room from ergonomic considerations. The present invention does not limit the specific method used to determine the head position information of users A, B, C and D.
Optionally, a preferred way of carrying out this step is to capture the participant images of the remote site directly at the remote end and to determine each participant's position from them using face recognition.
S62: the local device determines, from the head position information of A, B, C and D, the positions of the loudspeakers in the flat-panel loudspeaker array corresponding to their heads.
S62 may be implemented as follows. As shown in Figure 2, the flat-panel loudspeaker array is divided into 36 regions according to the number of loudspeakers. If face recognition determines that the head position of A lies in region 11 of Figure 2, the loudspeaker corresponding to A's head is loudspeaker 11; likewise the loudspeakers corresponding to the heads of B, C and D are loudspeakers 13, 15 and 17 respectively. In practice face recognition may also locate A's head in several regions of Figure 2, for example regions 10 and 11, or regions 21, 22 and 23; in that case the loudspeakers corresponding to A's head are all the loudspeakers of the regions covered by A's head position information: for regions 10 and 11 they are loudspeakers 10 and 11, and for regions 21, 22 and 23 they are loudspeakers 21, 22 and 23.
S63: when the media server determines from the audio stream sent by microphone 1 that A is speaking, it sends the audio stream from microphone 1 and the information identifying A as the speaker to the local device.
S63 may be implemented as follows. Since the remote users A, B, C and D are assigned microphones 1, 2, 3 and 4 respectively, the media server establishes the correspondence between microphone 1 and user A and, likewise, between microphone 2 and user B, microphone 3 and user C, and microphone 4 and user D. When the media server detects the audio stream sent by microphone 1, it determines from the correspondence between microphone 1 and user A that user A is speaking, and sends the audio stream from microphone 1 and the information identifying A as the speaker to the local device.
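Under the same hypothetical naming as the earlier sketches, the routing step of S63 might look roughly like this; the assumption that each audio packet arrives tagged with its microphone number is illustrative:

```python
# Hypothetical sketch of S63: the media server maps an incoming microphone
# stream to the speaking user and forwards both the audio and the speaker
# identity to the local device.

MIC_TO_USER = {1: "A", 2: "B", 3: "C", 4: "D"}

def route_audio(mic_id: int, audio_frame: bytes, send_to_local_device) -> None:
    """Identify the speaker from the microphone id and forward audio + identity."""
    speaker = MIC_TO_USER[mic_id]
    send_to_local_device({"speaker": speaker, "audio": audio_frame})

# Example: a frame from microphone 1 is forwarded with speaker "A".
route_audio(1, b"\x00" * 160, print)
```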
S64: the local device activates the loudspeaker corresponding to A to play the audio stream sent by microphone 1.
Optionally, the steps performed by the local device above may instead be completed by the media server controlling the local device. In the method provided by this embodiment, the local device determines the loudspeakers corresponding to the head position information of A, B, C and D, and when the media server identifies the speaker, the local device activates the loudspeaker corresponding to that speaker for playback. The direction from which the local user hears the remote user's voice thus essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
The present invention further provides an apparatus for implementing video communication, shown in Figure 7, where the dashed-line modules are optional. The apparatus includes:
an obtaining unit 71, configured to obtain the head position information of a remote user after the local user establishes a connection with the remote user; and
a playback control unit 72, configured to determine the loudspeaker playback mode corresponding to the remote user from that user's head position information and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaker.
Optionally, when the loudspeakers form a flat-panel loudspeaker array, the playback control unit 72 includes:
an array module 721, configured to determine, from the head position information of the remote user, the corresponding loudspeaker in the flat-panel loudspeaker array; and
a playback module 722, configured to activate, when the remote user speaks, the loudspeaker corresponding to the speaker for playback.
Optionally, when the loudspeakers are arranged one above the other, the playback control unit 72 includes:
a height calculation module 723, configured to display the remote users' images one above the other, calculate the vertical distance from the centre of a remote user's head position to the centre of the displayed image, and compute the ratio of that vertical distance to the total height of the displayed image; and
a vertical playback module 724, configured to adjust the volumes of the upper and lower loudspeakers according to their volume difference and then perform playback; the difference between the upper and lower loudspeaker volumes is calculated as described for Formula 1.
Optionally, when the loudspeakers are arranged to the left and right, the playback control unit 72 includes:
a width calculation module 725, configured to display the remote users' images side by side, calculate the horizontal distance from the centre of a remote user's head position to the centre of the displayed image, and compute the ratio of that horizontal distance to the total width of the displayed image; and
a horizontal playback module 726, configured to adjust the volumes of the left and right loudspeakers according to their volume difference and then perform playback; the difference between the left and right loudspeaker volumes is calculated as described for Formula 2.
Optionally, the apparatus may be a stand-alone device; it may of course also be installed in the local device, and in practice it may also be installed in the media server.
The apparatus provided by the present invention determines the loudspeaker playback mode corresponding to a remote user from that user's head position information and, when the remote user speaks, activates the playback mode corresponding to the speaker. The direction from which the local user hears the remote user's voice thus essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
The present invention further provides a system for implementing video communication, shown in Figure 8, which includes a remote device 81, a local device 82 and a media server 83.
The remote device 81 is configured to capture the remote users' video and audio data and send it to the media server 83.
The media server 83 is configured to exchange the video and audio data between the remote device 81 and the local device 82.
The local device 82 is configured to determine, after the local user establishes a connection with the remote user, the loudspeaker playback mode corresponding to the remote user from the obtained head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaker.
The local device 82 in the system provided by the present invention can determine the loudspeaker playback mode corresponding to a remote user from that user's head position information and, when the remote user speaks, activate the playback mode corresponding to the speaker, so that the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
The present invention further provides another video communication system, which includes a remote device, a local device and a media server.
The remote device is configured to capture the remote users' video and audio data and send it to the media server. The media server is configured to exchange the video and audio data between the remote device and the local device, and is further configured to determine, after the local user establishes a connection with the remote user, the loudspeaker playback mode corresponding to the remote user from the obtained head position information of the remote user and, when the remote user speaks, to send a playback command to the local device according to the loudspeaker playback mode corresponding to the speaker. The local device is configured to control the local playback apparatus to play back according to the playback command.
The media server in the system provided by the present invention can determine the loudspeaker playback mode corresponding to a remote user from that user's head position information and, when the remote user speaks, activate the playback mode corresponding to the speaker, so that the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, which increases the user's sense of presence.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of preferred embodiments, and that the modules or processes in the drawings are not necessarily required for implementing the present invention.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments above may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In summary, with the technical solutions provided by the specific embodiments of the present invention, the direction from which the local user hears the remote user's voice essentially matches the position of the remote user's image as seen by the local user, which has the advantage of increasing the user's sense of presence.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments above may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc or the like.
The realization method and apparatus for video communication provided by the present invention have been described in detail above. Those of ordinary skill in the art may, in accordance with the ideas of the embodiments of the present invention, make changes to the specific implementation and scope of application; accordingly, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for implementing video communication, characterised in that the method comprises:
after a local device establishes a connection with a remote device, obtaining head position information of a remote user;
determining, from the head position information of the remote user, a loudspeaker playback mode corresponding to the remote user; and
when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaker.
2. The method according to claim 1, characterised in that determining, from the head position information of the remote user, the loudspeaker playback mode corresponding to the remote user and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaker specifically comprises:
when the loudspeakers form a flat-panel loudspeaker array, determining, from the head position information of the remote user, the corresponding loudspeaker in the flat-panel loudspeaker array, and, when the remote user speaks, activating the loudspeaker corresponding to the speaker for playback.
3. The method according to claim 1, characterised in that determining, from the head position information of the remote user, the loudspeaker playback mode corresponding to the remote user and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaker specifically comprises:
when the loudspeakers are arranged one above the other, displaying the remote users' images one above the other, calculating the vertical distance from the centre of the remote user's head position to the centre of the displayed image, and computing the ratio of the vertical distance to the total height of the displayed image;
difference between the upper-loudspeaker volume and the lower-loudspeaker volume = 8X × (0.5 − the ratio of the vertical distance to the total height of the displayed image) dB; and adjusting the volumes of the upper and lower loudspeakers according to the difference and then performing playback, where X is a sound coefficient set by the user.
4. The method according to claim 1, characterised in that determining, from the head position information of the remote user, the loudspeaker playback mode corresponding to the remote user and, when the remote user speaks, performing playback according to the loudspeaker playback mode corresponding to the speaker specifically comprises:
when the loudspeakers are arranged to the left and right, displaying the remote users' images side by side, calculating the horizontal distance from the centre of the remote user's head position to the centre of the displayed image, and computing the ratio of the horizontal distance to the total width of the displayed image;
difference between the left-loudspeaker volume and the right-loudspeaker volume = 8X × (0.5 − the ratio of the horizontal distance to the total width of the displayed image) dB; and adjusting the volumes of the left and right loudspeakers according to the difference and then performing playback, where X is a sound coefficient set by the user.
5. An apparatus for implementing video communication, characterised in that the apparatus comprises:
an obtaining unit, configured to obtain head position information of a remote user after a local user establishes a connection with the remote user; and
a playback control unit, configured to determine, from the head position information of the remote user, a loudspeaker playback mode corresponding to the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaker.
6. The apparatus according to claim 5, characterised in that, when the loudspeakers form a flat-panel loudspeaker array, the playback control unit comprises:
a position confirmation module, configured to determine, from the head position information of the remote user, the corresponding loudspeaker in the flat-panel loudspeaker array; and
a playback module, configured to activate, when the remote user speaks, the loudspeaker corresponding to the speaker for playback.
7. The apparatus according to claim 5, characterised in that, when the loudspeakers are arranged one above the other, the playback control unit comprises:
a height calculation module, configured to display the remote users' images one above the other, calculate the vertical distance from the centre of the remote user's head position to the centre of the displayed image, and compute the ratio of the vertical distance to the total height of the displayed image; and
a vertical playback module, configured to adjust the volumes of the upper and lower loudspeakers according to their volume difference and then perform playback, where the difference between the upper-loudspeaker volume and the lower-loudspeaker volume = 8X × (0.5 − the ratio of the vertical distance to the total height of the displayed image) dB, and X is a sound coefficient set by the user.
8. The apparatus according to claim 5, characterised in that, when the loudspeakers are arranged to the left and right, the playback control unit comprises:
a width calculation module, configured to display the remote users' images side by side, calculate the horizontal distance from the centre of the remote user's head position to the centre of the displayed image, and compute the ratio of the horizontal distance to the total width of the displayed image; and
a horizontal playback module, configured to adjust the volumes of the left and right loudspeakers according to their volume difference and then perform playback, where the difference between the left-loudspeaker volume and the right-loudspeaker volume = 8X × (0.5 − the ratio of the horizontal distance to the total width of the displayed image) dB, and X is a sound coefficient set by the user.
9. A system for implementing video communication, characterised in that the system comprises a remote device, a local device and a multipoint control unit media server;
the remote device is configured to capture video and audio data of a remote user and send it to the media server;
the media server is configured to exchange the video and audio data between the remote device and the local device; and
the local device is configured to determine, after a local user establishes a connection with the remote user, a loudspeaker playback mode corresponding to the remote user from the obtained head position information of the remote user and, when the remote user speaks, to perform playback according to the loudspeaker playback mode corresponding to the speaker.
10. A video communication system, characterised in that the system comprises a remote device, a local device and a multipoint control unit media server;
the remote device is configured to capture video and audio data of a remote user and send it to the media server;
the media server is configured to exchange the video and audio data between the remote device and the local device, and to determine, after a local user establishes a connection with the remote user, a loudspeaker playback mode corresponding to the remote user from the obtained head position information of the remote user and, when the remote user speaks, to send a playback command to the local device according to the loudspeaker playback mode corresponding to the speaker; and
the local device is configured to control a local playback apparatus to play back according to the playback command.
PCT/CN2011/072198 2010-03-30 2011-03-28 Realization method and apparatus for video communication WO2011120407A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010137021.X 2010-03-30
CN 201010137021 CN102209225B (en) 2010-03-30 2010-03-30 Method and device for realizing video communication

Publications (1)

Publication Number Publication Date
WO2011120407A1 true WO2011120407A1 (en) 2011-10-06

Family

ID=44697862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/072198 WO2011120407A1 (en) 2010-03-30 2011-03-28 Realization method and apparatus for video communication

Country Status (2)

Country Link
CN (1) CN102209225B (en)
WO (1) WO2011120407A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103888274A (en) * 2014-03-18 2014-06-25 华为技术有限公司 Communication method and device of sub meetings in virtual meeting
CN104270552A (en) * 2014-08-29 2015-01-07 华为技术有限公司 Sound image playing method and device
CN106774830B (en) * 2016-11-16 2020-04-14 网易(杭州)网络有限公司 Virtual reality system, voice interaction method and device
CN110049409B (en) * 2019-04-30 2021-02-19 中国联合网络通信集团有限公司 Dynamic stereo adjusting method and device for holographic image
CN112584299A (en) * 2020-12-09 2021-03-30 重庆邮电大学 Immersive conference system based on multi-excitation flat panel speaker


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829018B2 (en) * 2001-09-17 2004-12-07 Koninklijke Philips Electronics N.V. Three-dimensional sound creation assisted by visual information
CN101330585A (en) * 2007-06-20 2008-12-24 深圳Tcl新技术有限公司 Method and system for positioning sound
CN101459797B (en) * 2007-12-14 2012-02-01 深圳Tcl新技术有限公司 Sound positioning method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1929593A (en) * 2005-09-07 2007-03-14 宝利通公司 Spatially correlated audio in multipoint videoconferencing
US20070097222A1 (en) * 2005-10-27 2007-05-03 Takeshi Makita Information processing apparatus and control method thereof
CN1984310A (en) * 2005-11-08 2007-06-20 Tcl通讯科技控股有限公司 Method and communication apparatus for reproducing a moving picture, and use in a videoconference system
CN101427574A (en) * 2006-04-20 2009-05-06 思科技术公司 System and method for providing location specific sound in a telepresence system
CN101132516A (en) * 2007-09-28 2008-02-27 深圳华为通信技术有限公司 Method, system for video communication and device used for the same

Also Published As

Publication number Publication date
CN102209225B (en) 2013-04-17
CN102209225A (en) 2011-10-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11761981

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11761981

Country of ref document: EP

Kind code of ref document: A1