WO2010094219A1 - Method and device for processing and playing voice signals - Google Patents

Method and device for processing and playing voice signals

Info

Publication number
WO2010094219A1
WO2010094219A1 PCT/CN2010/070491 CN2010070491W
Authority
WO
WIPO (PCT)
Prior art keywords
site
largest
frequency band
information
orientation
Prior art date
Application number
PCT/CN2010/070491
Other languages
English (en)
Chinese (zh)
Inventor
梁丽燕
刘智辉
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Publication of WO2010094219A1

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • The present invention relates to the field of video communication technologies, and in particular to a method and apparatus for processing and playing a voice signal.
  • In a video conference, each site participating in the conference encodes its local voice signal and image signal and sends them to the MCU (Multipoint Control Unit). The MCU processes the received voice and image signals and sends the processed signals to each site terminal, and each site plays the decoded voice and image signals, thereby realizing video communication.
  • During mixing, the MCU calculates the envelope of the voice signal of each site, compares the envelopes, and selects the N sites with the largest envelopes as the largest-N sites; it then mixes the voice signals of the largest-N sites and sends the mix to the sites in the conference outside the largest N.
  • The voice signal received by each largest-N site is the mix of the voice signals of the other N-1 largest sites, excluding its own. Therefore, after a site decodes the received mix signal, the sites outside the largest N can hear the voices of all of the largest-N sites, while each largest-N site hears the other N-1.
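The envelope-based selection and mixing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the envelope is approximated here by mean absolute amplitude, and the function names are assumptions.

```python
def mix_largest_n(site_signals, n):
    # site_signals: dict of site id -> list of samples for one frame.
    # The speech envelope is approximated here by mean absolute amplitude.
    envelopes = {sid: sum(abs(x) for x in sig) / len(sig)
                 for sid, sig in site_signals.items()}
    # The n sites with the largest envelopes become the largest-N sites.
    largest = sorted(envelopes, key=envelopes.get, reverse=True)[:n]
    frame_len = len(next(iter(site_signals.values())))
    # Mix sent to sites outside the largest N: all N largest signals summed.
    mix = [sum(site_signals[sid][i] for sid in largest)
           for i in range(frame_len)]
    return largest, mix

def mix_for_site(site_signals, largest, dest_id):
    # A largest-N site receives the mix of the other N-1 largest sites.
    senders = [sid for sid in largest if sid != dest_id]
    frame_len = len(next(iter(site_signals.values())))
    return [sum(site_signals[sid][i] for sid in senders)
            for i in range(frame_len)]
```

For example, with three sites and N = 2, the two loudest sites are mixed for the remaining site, while each of the two loudest receives only its counterpart's signal.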
  • Embodiments of the present invention provide a method and apparatus for processing and playing a voice signal, to improve the spatial auditory effect of a video conference.
  • An embodiment of the invention discloses a method for processing a voice signal, comprising: determining, according to orientation information set for the sites participating in the conference, the orientation information of the site with the largest energy in each frequency band at each time in the mix signal of the largest-N sites; and sending the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time to the site terminals participating in the conference.
  • An embodiment of the invention further discloses a method for playing a voice signal, comprising: acquiring the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time; obtaining, according to the correspondence between the auditory spatial parameters of the playback device and the orientation information, the auditory spatial parameters of the playback device corresponding to the orientation information of the site with the largest energy in each frequency band at each time; and adjusting the mix signal using the auditory spatial parameters of the playback device and playing the adjusted mix signal.
  • An embodiment of the invention further discloses a processing apparatus for a voice signal, comprising: an orientation determining unit, configured to determine, according to orientation information set for the sites participating in the conference, the orientation information of the site with the largest energy in each frequency band at each time in the mix signal of the largest-N sites; and a sending unit, configured to send the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time to the site terminals participating in the conference.
  • An embodiment of the invention further discloses a playback apparatus for a voice signal, comprising: an acquisition unit, configured to acquire the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time; a spatial parameter obtaining unit, configured to obtain, according to the correspondence between the auditory spatial parameters of the playback device and the orientation information, the auditory spatial parameters of the playback device corresponding to the orientation information of the site with the largest energy in each frequency band at each time; and an adjustment unit, configured to adjust the mix signal using the auditory spatial parameters of the playback device, so as to play the adjusted mix signal.
  • When processing the voice signal, orientation information is set in advance for all sites participating in the conference, and among the largest-N sites, the orientation information of the site with the largest energy in each frequency band at each time is determined and sent together with the mix signal of the largest-N sites.
  • At the playback end, the auditory spatial parameters of each playback device are obtained and used to adjust the mix signal.
  • The auditory space of each mixed site can thus be reconstructed at the receiving site, so that the sound of the largest-N sites has a spatial stereo effect during playback and the user can clearly distinguish the sound of each of the largest-N sites, which enhances the user's sense of presence.
  • FIG. 2-a is a schematic diagram of the orientations of 10 conference sites;
  • FIG. 2-b is a schematic diagram of the orientations of four sites in a multi-picture;
  • FIG. 3-a is a schematic diagram of the orientations of the four largest-4 sites;
  • FIG. 3-b is a schematic diagram of the orientations of four sites in a multi-picture;
  • FIG. 4 shows the orientation setting method when the number of multi-picture sub-pictures is 16 and the number of orientations is 4;
  • FIG. 5 is a schematic diagram of the processing of a voice signal in the present invention;
  • FIG. 6 is a structural diagram of a processing apparatus for a voice signal according to Embodiment 2 of the present invention;
  • FIG. 7 is a flowchart of a method for playing a voice signal according to Embodiment 3 of the present invention;
  • FIG. 9 is a structural diagram of a playback apparatus for a voice signal according to Embodiment 4 of the present invention.
  • Referring to FIG. 1, FIG. 1 is a flowchart of a method for processing a voice signal according to the present invention. The method includes the following steps:
  • Step 101: Determine, according to the orientation information set for the sites participating in the conference, the orientation information of the site with the largest energy in each frequency band at each time in the mix signal of the largest-N sites.
  • Specifically, the voice signals of the largest-N sites are first time-frequency transformed, converting the voice signals from the time domain to the frequency domain; the energy value in each frequency band at each time is then calculated to obtain, for each time and each frequency band, the site with the largest energy; finally, the orientation information of the site with the largest energy in each frequency band is determined according to the orientation information set for the sites participating in the conference.
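The per-band energy comparison described above can be sketched as follows, assuming equal-width frequency bands over a naive DFT of one time frame; the function names are illustrative, not from the patent.

```python
import cmath

def band_energies(frame, num_bands):
    # Naive DFT of one time-domain frame, keeping the positive-frequency
    # half, then energy summed over equal-width frequency bands.
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2)]
    band_size = max(1, len(spectrum) // num_bands)
    return [sum(abs(spectrum[k]) ** 2
                for k in range(b * band_size,
                               min((b + 1) * band_size, len(spectrum))))
            for b in range(num_bands)]

def loudest_site_per_band(frames_by_site, num_bands):
    # frames_by_site: dict of site id -> time-domain frame for one instant.
    # Returns, for each band, the id of the site with the largest energy.
    energies = {sid: band_energies(f, num_bands)
                for sid, f in frames_by_site.items()}
    return [max(energies, key=lambda sid: energies[sid][b])
            for b in range(num_bands)]
```

A site whose frame is dominated by low frequencies wins the low band, while a site dominated by high frequencies wins the high band.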
  • The orientation information of the site with the largest energy among the largest-N sites in each frequency band at each time can be determined by either of two methods.
  • The first method is as follows: the orientations of the sites are set in advance according to the order in which the sites join the conference. After the site with the largest energy among the largest-N sites in each frequency band at each time is obtained by comparing the energies of the voice signals of the largest-N sites in each frequency band, it is determined whether that site is in the multi-picture. If so, the orientation information of the site with the largest energy is set to its orientation in the multi-picture; if not, it is set to the preset orientation information. For example, in a video conferencing system, ten sites participate in the conference, numbered 1 to 10 in the order in which they join.
  • The orientations of sites 1-3 are set to the upper left, the orientations of sites 4-6 to the upper right, the orientations of sites 7-8 to the lower left, and the orientations of sites 9-10 to the lower right; see FIG. 2-a, which shows the orientations of the ten joining sites.
  • Assume sites 1-4 are the largest-4 sites, and that in a certain frequency band at a certain time, site 1 is the site with the largest energy among them. It is determined whether site 1 is in the multi-picture; when it is, the orientation of site 1 in the multi-picture is set as its orientation information. For example, if site 1 is at the lower right of the multi-picture (see FIG. 2-b, which shows the orientations of four sites in the multi-picture), the orientation information of site 1 is the lower right.
  • When site 1 is not in the multi-picture, its orientation information is obtained from the preset orientation information: the orientation information of site 1 is the upper left.
  • The second method is as follows: after the largest-N sites are determined, the orientations of the largest-N sites are set in advance according to the order in which they join, obtaining the preset orientation information of the largest-N sites. It is then determined whether the site with the largest energy is in the multi-picture: if so, its orientation information is set to its orientation in the multi-picture; if not, its orientation information is set to the preset orientation information of the largest-N sites. Take the video communication between the above ten sites as an example.
  • Sites 1-4 are the largest-4 sites. According to the order in which sites 1-4 join, the orientation of site 1 is set to the upper left, the orientation of site 2 to the upper right, the orientation of site 3 to the lower left, and the orientation of site 4 to the lower right; see FIG. 3-a, which shows the orientations of the four largest-4 sites.
  • When site 1 is in the multi-picture, the orientation of site 1 in the multi-picture is set as its orientation information; for example, if site 1 is at the lower right of the multi-picture (see FIG. 3-b, which shows the orientations of the four sites in the multi-picture), the orientation information of site 1 is the lower right.
  • When site 1 is not in the multi-picture, its orientation information is obtained from the preset orientations of the largest-4 sites: the orientation information of site 1 is the upper left.
  • When the orientation of a site in the multi-picture changes, the orientation information of the site with the largest energy changes correspondingly. For example, sites 1-4 are the largest-4 sites; the orientation of site 1 is preset to the upper left, site 2 to the upper right, site 3 to the lower left, and site 4 to the lower right.
  • When site 1 is the site with the largest energy among the largest-4 sites and is in the multi-picture, the orientation information of site 1 is its orientation in the multi-picture; assuming the orientation of site 1 in the multi-picture is the upper left, the orientation information of site 1 is the upper left.
  • If the orientation of site 1 in the multi-picture later changes to the upper right, the orientation information of site 1 changes accordingly to the upper right.
  • In the embodiments of the present invention, the method for setting the orientation information of the site with the largest energy among the largest-N sites is not limited, and the orientation information is not limited to the four directions upper left, upper right, lower left, and lower right.
  • When a site in the multi-picture cannot exactly correspond to any one of the preset orientations, the most similar orientation is taken for that site.
  • FIG. 4 shows the orientation setting when the number of multi-picture sub-pictures is 16 and the number of orientations is 4; the orientation of site 7 in the figure is set to the upper right according to this approximation principle.
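The approximation principle for FIG. 4 (16 sub-pictures mapped to 4 orientations) can be sketched as a quadrant test on a sub-picture's position; the grid coordinates and function name below are assumptions for illustration only.

```python
def nearest_orientation(row, col, grid=4):
    # Maps the sub-picture at (row, col) in a grid x grid multi-picture
    # layout (0-indexed, row-major) to the closest of four orientations,
    # by which quadrant of the layout it falls in.
    top = row < grid / 2
    left = col < grid / 2
    return {(True, True): 'upper left', (True, False): 'upper right',
            (False, True): 'lower left', (False, False): 'lower right'}[(top, left)]
```

With sites 1-16 laid out row-major in a 4x4 grid, site 7 sits at row 1, column 2, which this test maps to the upper right, matching the figure's description.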
  • Step 102: Send the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time.
  • Specifically, the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time may first be encoded separately, obtaining a mix code stream and an orientation information code stream, which are then sent to the site terminals participating in the conference. Alternatively, only the mix signal of the largest-N sites may be encoded to obtain a mix code stream, and the mix code stream together with the orientation information of the site with the largest energy in each frequency band at each time is sent to the site terminals participating in the conference. Note that if a destination site belongs to the largest-N sites, the mix signal sent to it is the mix of the other N-1 largest sites, excluding itself.
  • Referring to FIG. 5, FIG. 5 is a schematic diagram of the processing of a voice signal according to the present invention.
  • Referring to FIG. 6, FIG. 6 is a structural diagram of a processing apparatus for a voice signal according to the present invention.
  • The apparatus includes an orientation determining unit 601 and a sending unit 602. The internal structure and connections are further described below in conjunction with the working principle of the apparatus.
  • The orientation determining unit 601 is configured to determine, according to the orientation information set for the sites participating in the conference, the orientation information of the site with the largest energy in each frequency band at each time among the largest-N sites.
  • The sending unit 602 is configured to send the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time.
  • The orientation determining unit 601 may include: a first orientation presetting unit 603, configured to preset orientations for the sites participating in the conference according to the order of joining, obtaining preset orientation information; a comparison unit 604, configured to compare the energy values of the voice signals of the largest-N sites in each frequency band at each time, obtaining the site with the largest energy in each frequency band at each time; a first setting unit 605, configured to set the orientation information of the site with the largest energy according to the preset orientation information when that site is not in the multi-picture; and a second setting unit 606, configured to set the orientation information of the site with the largest energy according to its orientation in the multi-picture when that site is in the multi-picture.
  • Alternatively, the orientation determining unit 601 may include: a second orientation presetting unit, configured to preset orientations for the largest-N sites according to the order of joining, obtaining the preset orientation information of the largest-N sites; a comparison unit, configured to compare the energy values of the voice signals of the largest-N sites in each frequency band at each time, obtaining the site with the largest energy in each frequency band at each time; a third setting unit, configured to set the orientation information of the site with the largest energy according to the preset orientation information when that site is not in the multi-picture; and a fourth setting unit, configured to set the orientation information of the site with the largest energy according to its orientation in the multi-picture when that site is in the multi-picture.
  • The sending unit 602 may include a first sending unit 607 and/or a second sending unit 608. The first sending unit 607 is configured to encode the mix signal and the orientation information of the site with the largest energy in each frequency band at each time, obtaining a mix code stream and an orientation information code stream, and to send both code streams to the site terminals participating in the conference. The second sending unit 608 is configured to encode only the mix signal, obtaining a mix code stream, and to send the mix code stream together with the orientation information of the site with the largest energy in each frequency band at each time to the site terminals participating in the conference.
  • Embodiment 3: Referring to FIG. 7, FIG. 7 is a flowchart of a method for playing a voice signal according to the present invention. The method includes the following steps:
  • Step 701: Acquire the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time.
  • The orientation information of the site with the largest energy among the largest-N sites can be determined from the orientation information of the largest-N sites based on the site number.
  • Step 702: Obtain, according to the correspondence between the auditory spatial parameters of the playback device and the orientation information, the auditory spatial parameters of the playback device corresponding to the orientation information of the site with the largest energy in each frequency band at each time.
  • The auditory spatial parameters of the playback device include a level parameter and a delay parameter.
  • The specific implementation of step 702 may be: first, a level parameter and a delay parameter corresponding to each orientation are preset for the playback device; after the orientation information of the site with the largest energy in each frequency band at each time is acquired in step 701, the preset correspondence between the orientation information and the level and delay parameters is queried to obtain the level parameter and delay parameter of the playback device corresponding to the orientation information of the site with the largest energy in each frequency band at each time.
  • For example, if the acquired orientation information of the site with the largest energy in a certain frequency band is the upper left, and the playback end has two speakers, the following parameters can be obtained: 1) the level parameter of speaker 1 for the upper left; 2) the level parameter of speaker 2 for the upper left; 3) the delay parameter of speaker 1 for the upper left; 4) the delay parameter of speaker 2 for the upper left.
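The preset correspondence queried in step 702 can be sketched as a lookup table. All level and delay values below are purely illustrative placeholders, since the patent leaves the concrete values to the device; the table and function names are assumptions.

```python
# Hypothetical two-speaker table: for each orientation, a (level_gain,
# delay_samples) pair per speaker. Real values would be calibrated for
# the playback device; these numbers are illustrative only.
SPATIAL_PARAMS = {
    'upper left':  {'spk1': (1.0, 0),  'spk2': (0.5, 8)},
    'upper right': {'spk1': (0.5, 8),  'spk2': (1.0, 0)},
    'lower left':  {'spk1': (0.9, 2),  'spk2': (0.4, 10)},
    'lower right': {'spk1': (0.4, 10), 'spk2': (0.9, 2)},
}

def params_for_orientation(orientation):
    # Query the preset correspondence: orientation -> per-speaker
    # level and delay parameters.
    entry = SPATIAL_PARAMS[orientation]
    return entry['spk1'], entry['spk2']
```

A sound attributed to the upper left is then played louder and earlier on the left speaker than on the right, which is what creates the perceived direction.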
  • Step 703: Adjust the mix signal using the auditory spatial parameters of the playback device, and play the adjusted mix signal.
  • Specifically, the mix signal is first time-frequency transformed, converting the mix signal from the time domain to the frequency domain, and the orientation information corresponding to the site with the largest energy in each frequency band is obtained.
  • The level and delay of the mix signal in the frequency domain are then adjusted in each frequency band using the auditory spatial parameters of the playback device; see FIG. 8, which is a schematic diagram of this adjustment.
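The frequency-domain adjustment in step 703 can be sketched as follows: the level parameter scales each bin of a band, and the delay parameter is applied as a linear phase shift, since delaying by d samples multiplies bin k of an N-point spectrum by exp(-j·2π·k·d/N). This is an illustrative sketch under those standard DSP identities, not the patented implementation.

```python
import cmath

def adjust_band(spectrum, band_bins, gain, delay, n_fft):
    # Scale the bins of one frequency band by the level parameter and
    # apply the delay parameter as a linear phase shift: a delay of
    # `delay` samples multiplies bin k by exp(-j*2*pi*k*delay/n_fft).
    # Returns a new spectrum; bins outside the band are untouched.
    out = list(spectrum)
    for k in band_bins:
        out[k] = spectrum[k] * gain * cmath.exp(
            -2j * cmath.pi * k * delay / n_fft)
    return out
```

Running this per speaker, with that speaker's gain and delay for the band's orientation, and then inverse-transforming yields the adjusted signal each speaker plays.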
  • Referring to FIG. 9, FIG. 9 is a structural diagram of a playback apparatus for a voice signal according to the present invention.
  • The apparatus includes an acquisition unit 901, a spatial parameter obtaining unit 902, and an adjustment unit 903.
  • The internal structure and connections are further described below in conjunction with the working principle of the apparatus.
  • The acquisition unit 901 is configured to acquire the mix signal of the largest-N sites and the orientation information of the site with the largest energy in each frequency band at each time.
  • The spatial parameter obtaining unit 902 is configured to obtain, according to the correspondence between the auditory spatial parameters of the playback device and the orientation information, the auditory spatial parameters of the playback device corresponding to the orientation information of the site with the largest energy in each frequency band at each time.
  • The adjustment unit 903 is configured to adjust the mix signal using the auditory spatial parameters of the playback device, so as to play the adjusted mix signal.
  • The acquisition unit 901 may include:
  • a first receiving unit 904, configured to receive the mix code stream and the orientation information code stream; and
  • a first decoding unit 905, configured to decode the mix code stream and the orientation information code stream, obtaining the mix signal and the orientation information of the site with the largest energy in each frequency band at each time.
  • Alternatively, the first receiving unit 904 may be replaced by a second receiving unit, configured to receive the mix code stream and the orientation information of the site with the largest energy in each frequency band at each time, and the first decoding unit 905 may be replaced by a second decoding unit, configured to decode the mix code stream to obtain the mix signal.
  • The acquisition unit 901 may also include both the first receiving unit and first decoding unit and the second receiving unit and second decoding unit.
  • The spatial parameter obtaining unit 902 may include:
  • an auditory spatial parameter presetting unit 906, configured to preset a level parameter and a delay parameter corresponding to each orientation for the playback device; and
  • a query unit 907, configured to query the correspondence between the orientation information and the level and delay parameters, obtaining the level parameter and delay parameter corresponding to the orientation information of the site with the largest energy in each frequency band at each time.
  • FIG. 9 does not show a complete structural diagram of the playback apparatus; it merely highlights the parts involved in the inventive aspect. It will be clear to those skilled in the art that the playback apparatus should also include a player, with the adjusted mix signal output by the adjustment unit 903 serving as the player's input signal; the player plays the adjusted mix signal.
  • In summary, orientation information is set in advance for all sites participating in the conference, and among the largest-N sites, the orientation information of the site with the largest energy in each frequency band is determined and sent together with the mix signal.
  • At the playback end, the auditory spatial parameters of each playback device are obtained and used to adjust the mix signal, and the adjusted mix signal is played.
  • The auditory space of the sound sources can thus be reconstructed at the receiving site, so that the sound of the largest-N sites has a spatial stereo effect during playback and the user can clearly distinguish the sound of each of the largest-N sites, which enhances the user's sense of presence.
  • Moreover, the orientation information of the site with the largest energy changes with the orientation of that site in the multi-picture, so that when the voice signal is played, the orientation of the sound source is consistent with the orientation of the image, further enhancing the user's sense of presence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method and device for processing and playing voice signals are provided. The processing method comprises: determining, according to orientation information set for the conference sites participating in the conference, the orientation information of the site having the largest energy at each time and in each frequency band among the largest-N sites; and transmitting the mixed voice signals of the largest-N sites and the orientation information of the site having the largest energy at each time and in each frequency band to terminals in the sites participating in the conference. The playing method comprises: obtaining the mixed voice signals and the orientation information of the site having the largest energy in each frequency band; obtaining, according to the correspondence between the auditory spatial parameters of a playback device and the orientation information, the auditory spatial parameters of the playback device corresponding to the orientation information of the site having the largest energy in each frequency band; adjusting the mixed voice signals using the auditory spatial parameters of the playback device; and playing the adjusted mixed voice signals. According to the embodiments of the present invention, the spatial auditory effect of the video conference is improved.
PCT/CN2010/070491 2009-02-19 2010-02-03 Method and device for processing and playing voice signals WO2010094219A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910005681XA CN101510988B (zh) 2009-02-19 2009-02-19 Method and device for processing and playing a voice signal
CN200910005681.X 2009-02-19

Publications (1)

Publication Number Publication Date
WO2010094219A1 (fr)

Family

ID=41003219

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/070491 WO2010094219A1 (fr) Method and device for processing and playing voice signals

Country Status (2)

Country Link
CN (1) CN101510988B (fr)
WO (1) WO2010094219A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101951492A (zh) * 2010-09-15 2011-01-19 中兴通讯股份有限公司 Method and device for video recording in a video call
CN116403589A (zh) * 2023-03-01 2023-07-07 天地阳光通信科技(北京)有限公司 Audio processing method, unit and system

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510988B (zh) * 2009-02-19 2012-03-21 华为终端有限公司 Method and device for processing and playing a voice signal
CN102222503B (zh) * 2010-04-14 2013-08-28 华为终端有限公司 Audio signal mixing method, device and system
CN102270456B (zh) * 2010-06-07 2012-11-21 华为终端有限公司 Audio signal mixing method and device
CN101877643B (zh) * 2010-06-29 2014-12-10 中兴通讯股份有限公司 Multipoint mixing telepresence method, device and system
CN102436818A (zh) * 2011-10-25 2012-05-02 浙江万朋网络技术有限公司 Energy-priority-based server-side routing and mixing method
CN103794216B (zh) * 2014-02-12 2016-08-24 能力天空科技(北京)有限公司 Voice mixing method and device
CN103870234B (zh) * 2014-02-27 2017-03-15 北京六间房科技有限公司 Sound mixing method and device
CN104167210A (zh) * 2014-08-21 2014-11-26 华侨大学 Lightweight multi-party conference mixing method and device
CN115065571B (zh) * 2022-06-14 2023-10-27 南昌职业大学 Voice device for a large conference venue

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030026441A1 (en) * 2001-05-04 2003-02-06 Christof Faller Perceptual synthesis of auditory scenes
JP2005110103A (ja) * 2003-10-01 2005-04-21 Kyushu Electronics Systems Inc Sound localization method in video conferencing
US20050135280A1 (en) * 2003-12-18 2005-06-23 Lam Siu H. Distributed processing in conference call systems
CN101510988A (zh) * 2009-02-19 2009-08-19 深圳华为通信技术有限公司 Method and device for processing and playing voice signals

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007052726A1 (fr) * 2005-11-02 2007-05-10 Yamaha Corporation Teleconference device
CN1937664B (zh) * 2006-09-30 2010-11-10 华为技术有限公司 System and method for implementing a multilingual conference
CN101179693B (zh) * 2007-09-26 2011-02-02 深圳市迪威视讯股份有限公司 Sound mixing method for a videoconference system

Also Published As

Publication number Publication date
CN101510988A (zh) 2009-08-19
CN101510988B (zh) 2012-03-21

Similar Documents

Publication Publication Date Title
WO2010094219A1 (fr) Method and device for processing and playing voice signals
US9843455B2 (en) Conferencing system with spatial rendering of audio data
US8477950B2 (en) Home theater component for a virtualized home theater system
US20190073993A1 (en) Artificially generated speech for a communication session
US8243120B2 (en) Method and device for realizing private session in multipoint conference
WO2011153905A1 (fr) Method and device for mixing audio signals
US9113034B2 (en) Method and apparatus for processing audio in video communication
US9172912B2 (en) Telepresence method, terminal and system
US20110261151A1 (en) Video and audio processing method, multipoint control unit and videoconference system
US20130064387A1 (en) Audio processing method, system, and control server
US8749611B2 (en) Video conference system
WO2009043275A1 (fr) Method, system and device for video communication
WO2013053336A1 (fr) Method, device and system for sound mixing
WO2011057511A1 (fr) Method, apparatus and system for implementing audio mixing
WO2012142975A1 (fr) Audio signal processing method for a conference terminal, conference terminal and videoconference system
WO2008014697A1 (fr) Method and device for obtaining initial acoustic position information, and multimedia communication system
CN112135285B (zh) Real-time audio interaction method for multiple Bluetooth audio devices
WO2011127816A1 (fr) Method, device and system for processing the mixing of audio signals
WO2011015136A1 (fr) Conference control method, equipment and system
US9088690B2 (en) Video conference system
WO2014094461A1 (fr) Method, device and system for processing video and audio information in videoconferencing
WO2012055291A1 (fr) Method and system for audio data transmission
JP3818054B2 (ja) Multipoint video conference control device, audio switching method, and recording medium storing the program
WO2011120407A1 (fr) Implementation method and apparatus for video communication
WO2014026478A1 (fr) Videoconference signal processing method, videoconference server and videoconference system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10743400

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10743400

Country of ref document: EP

Kind code of ref document: A1