CN1845573A - Simultaneous interpretation video conference system and method for supporting high capacity mixed sound - Google Patents
- Publication number
- CN1845573A
- Authority
- CN
- China
- Prior art keywords
- high capacity
- simultaneous interpretation
- sound
- conference system
- video conference
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The disclosed simultaneous interpretation video conference system comprises: a silence detection method based on Mel-scale cepstral features and a support vector machine (SVM), which achieves a higher silence detection rate when distinguishing silence from normal speech; a large-scale audio mixing method that uses the short-time energy of each voice channel as its weight; and a new audio packet header format that supports simultaneous interpretation.
Description
Technical field
The present invention is an Internet simultaneous interpretation video conference system. Specifically, it addresses the communication problems of large-capacity audio mixing and simultaneous interpretation within a single conference room.
Background technology
With the rapid development of sectors such as foreign affairs and foreign trade, a network voice communication platform that supports large-capacity audio mixing and multilingual exchange has good application prospects.
The common audio mixing architectures today are centralized and distributed. In the centralized architecture, each conference terminal sends its audio data to a central mixer, which performs the mixing and returns the result to all terminals. In the distributed architecture, each conference terminal receives the audio data of all other members and mixes it independently at its own site. Clearly, the distributed approach duplicates the mixing computation and generates heavy network traffic, which easily causes congestion and raises costs. Centralized processing reduces client computation and network traffic and is simple and easy to implement, so most small-scale multimedia conference systems currently adopt it. As conferences grow, however, the drawbacks of centralized processing become more and more apparent.
First, the mixing computation grows with the number of participating terminals, and so does the mixing delay. Second, voice quality degrades. The mixing algorithms published so far, such as linear superposition, average-adjusted weighting, alignment weighting, and weak alignment weighting, all suffer from overflow of the summed signal, reduced volume, and added random noise once the number of mixed channels reaches a certain level. To limit the number of mixed channels, systems therefore generally resort to switching the right to speak, which is very inconvenient for users. One part of the present invention addresses this series of problems: an efficient silence detection method suppresses the transmission of silence at the speaker's end, and a more effective mixing method is used in the mixer, so that in practice at least 20 channels can be mixed in real time.
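The overflow and volume-loss problems of simple mixing can be seen in a few lines. The sketch below assumes 16-bit PCM samples and shows only the two simplest schemes (plain summation with saturation, and division by the channel count); it is illustrative and not code from the patent.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def linear_superposition(channels):
    """Sum the i-th sample of every channel and saturate the result to 16 bits."""
    return [max(INT16_MIN, min(INT16_MAX, sum(samples))) for samples in zip(*channels)]


def average_mix(channels):
    """Divide the sum by the channel count: no overflow, but each voice is attenuated."""
    n = len(channels)
    return [int(sum(samples) / n) for samples in zip(*channels)]


# Four talkers at the same moderate level: the plain sum clips.
loud = [[12000, -12000, 6000]] * 4
print(linear_superposition(loud))  # [32767, -32768, 24000]

# One talker and three silent channels: averaging cuts the voice to a quarter.
one_active = [[12000, -12000, 6000], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(average_mix(one_active))  # [3000, -3000, 1500]
```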
A typical multimedia conference system processes speech per conference room, with only one mixer per room. This model cannot meet the requirements of international exchange activities such as conferences, business exchanges, and product promotion meetings. Such environments require multilingual information to be delivered simultaneously and must allow the host to communicate with participants from different countries. Some video conference systems currently on the market must open a separate conference room for each language to ensure that audio in several languages can be mixed simultaneously and delivered to different audiences; this approach is clearly uneconomical and inconvenient to operate.
Summary of the invention
To improve mixing efficiency and solve the simultaneous interpretation problem, the invention provides a more efficient silence detection method, an audio mixing method, and a simultaneous interpretation method. Together they achieve a higher silence detection rate, mix more channels than other mixing methods, and perform synchronized multilingual mixing within the same conference room.
The objectives of the invention are achieved through the following technical solution:
The system adopts a centralized processing architecture and defines two main subsystems: the client terminal (Terminal) and the multipoint control unit (MCU). The client terminal comprises functional modules for video coding/decoding, audio coding/decoding, control, the network transport layer, and auxiliary office functions. Before compressing each audio frame, the audio codec uses the silence detection method proposed below to decide whether that frame needs to be compressed at all. The MCU is usually installed on a server and contains a multipoint controller and a multipoint processor; the multipoint processor uses the short-time adaptive weighting mixing method proposed below.
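The client-side decision of whether to compress a frame at all can be sketched as a gate in front of the encoder. The classifier is passed in as a function so that the Mel-cepstrum/SVM detector described below can be plugged in. The energy threshold in the usage example is only a hypothetical stand-in for that detector, and `fake_encode` is a hypothetical placeholder for a real speech codec.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[int]


def gate_frames(frames: Sequence[Frame],
                is_speech: Callable[[Frame], bool],
                encode: Callable[[Frame], bytes]) -> List[Tuple[int, bytes]]:
    """Encode only frames classified as speech; silent frames are never compressed or sent.

    Returns (frame_index, payload) pairs so the receiver can still place each payload in time.
    """
    sent = []
    for index, frame in enumerate(frames):
        if is_speech(frame):
            sent.append((index, encode(frame)))
    return sent


def energy_stand_in(frame: Frame, threshold: float = 1.0e6) -> bool:
    """Hypothetical placeholder classifier: mean energy above a fixed threshold."""
    return sum(s * s for s in frame) / len(frame) > threshold


def fake_encode(frame: Frame) -> bytes:
    """Hypothetical placeholder encoder: raw little-endian 16-bit PCM."""
    return b"".join(int(s).to_bytes(2, "little", signed=True) for s in frame)


frames = [[0] * 160, [3000] * 160, [5] * 160, [-2500] * 160]
packets = gate_frames(frames, energy_stand_in, fake_encode)
print([index for index, _ in packets])  # [1, 3]
```

Only two of the four frames reach the network here; the saving grows with the share of silence in a real conversation.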
Support the method for high capacity mixed sound to realize by following steps:
1, client terminal sound intermediate frequency coding/decoding module uses provided by the invention based on Mel yardstick cepstrum feature and the transmission of SVMs mute detection method with the minimizing voice data.Here adopt Mel yardstick cepstrum coefficient as phonetic feature, Mel yardstick cepstrum coefficient utilizes the auditory masking effect of people's ear, voice is divided into a series of critical band forms leg-of-mutton bank of filters on frequency domain, be i.e. Mel filter sequence.The process of silence detection is:
1) the Mel yardstick cepstrum coefficient of extraction one frame voice data, Mel yardstick cepstrum coefficient (CMFCC) computing formula is as follows:
Wherein:
In the formula, o(l), c(l), and h(l) are, respectively, the lower, center, and upper cutoff frequencies of the l-th triangular filter.
2) Classify the audio's Mel-scale cepstral coefficients with a two-class support vector machine, which yields one of two results: normal speech or silence. Other classifiers can of course also be used; the invention is not limited in this respect.
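A minimal sketch of step 1), assuming a common textbook form of the MFCC computation that uses the same o(l), c(l), h(l) filter edges (the exact equation in the patent may differ): triangular weights W_l(k) rise from o(l) to c(l) and fall to h(l), m(l) = Σ_k W_l(k)|X(k)|, and C_MFCC(n) = Σ_{l=1}^{L} log m(l) cos((l − 1/2)nπ/L) for n = 1..L. The 8 kHz sampling rate and 256-sample frame are assumptions; L = 12 follows the embodiment. The resulting vector is what the two-class SVM in step 2) would classify.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """|X(k)| for k = 0..N/2 via a direct DFT (an FFT would be used in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def filter_edges(num_filters, frame_len, sample_rate):
    """(o(l), c(l), h(l)) as DFT bin indices, equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((frame_len + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    return [(bins[l], bins[l + 1], bins[l + 2]) for l in range(num_filters)]


def triangle_weight(k, o, c, h):
    """W_l(k): rises linearly from o(l) to c(l), then falls linearly to h(l)."""
    if o <= k <= c and c > o:
        return (k - o) / (c - o)
    if c <= k <= h and h > c:
        return (h - k) / (h - c)
    return 0.0


def mfcc(frame, sample_rate=8000, num_coeffs=12):
    """C_MFCC(n) = sum_{l=1..L} log m(l) * cos((l - 1/2) * n * pi / L), n = 1..L."""
    spectrum = magnitude_spectrum(frame)
    big_l = num_coeffs
    edges = filter_edges(big_l, len(frame), sample_rate)
    # m(l): filter-weighted sum of the magnitude spectrum over bins o(l)..h(l).
    energies = [sum(triangle_weight(k, o, c, h) * spectrum[k] for k in range(o, h + 1))
                for o, c, h in edges]
    logs = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0) on silence
    return [sum(logs[l - 1] * math.cos((l - 0.5) * n * math.pi / big_l)
                for l in range(1, big_l + 1))
            for n in range(1, big_l + 1)]


frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
print(len(mfcc(frame)))  # 12
```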
2. Short-time adaptive weighting mixing method in the multipoint control unit
Define the mixing weight w[j]. First, compute the average amplitude Avg[j] of each channel j within the k-th data frame:
In the formula above, Data[j, i] denotes the i-th sample of the j-th voice channel, and the letter l denotes the number of samples in one data frame. The weight w[j] that the j-th channel should receive is then computed from Avg[j]:
The channels are then mixed according to w[j]:
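A minimal sketch of the three steps, assuming 16-bit PCM frames. Avg[j] follows the text directly: the mean absolute value of the l samples of channel j in the current frame. The formulas for w[j] and for the mix are not reproduced in the text, so the sketch assumes that each weight is the channel's share of the total average amplitude, w[j] = Avg[j] / Σ Avg, and that each output sample is Σ_j w[j]·Data[j, i].

```python
def average_amplitude(samples):
    """Avg[j]: mean absolute sample value of one channel over the current frame."""
    return sum(abs(s) for s in samples) / len(samples)


def adaptive_weights(frame_channels):
    """w[j] = Avg[j] / sum(Avg)  (assumed normalization; the weights sum to 1)."""
    avgs = [average_amplitude(ch) for ch in frame_channels]
    total = sum(avgs)
    if total == 0:
        # Every channel is silent in this frame: fall back to equal weights.
        return [1.0 / len(avgs)] * len(avgs)
    return [a / total for a in avgs]


def mix_frame(frame_channels):
    """Weighted sum of the i-th sample of every channel."""
    weights = adaptive_weights(frame_channels)
    return [round(sum(w * s for w, s in zip(weights, samples)))
            for samples in zip(*frame_channels)]


one_active = [[12000, -12000, 6000], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(mix_frame(one_active))  # [12000, -12000, 6000]
print(mix_frame([[30000, 30000], [30000, 30000]]))  # [30000, 30000]
```

Because the assumed weights are non-negative and sum to 1, each output sample is a convex combination of the input samples. It therefore stays within the 16-bit range without clipping, and a lone active talker keeps full volume instead of being divided by the channel count.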
The simultaneous interpretation method is implemented as follows. A new audio packet header format is defined so that each packet can indicate its language. When the MCU sets up a conference room, it creates n language mixers for that room. When a speaker begins, the speaker indicates the language being spoken and each receiver indicates the language it wants to receive; alternatively, the speaking and receiving languages can be preset. When the MCU receives audio, it determines which conference room and language the stream belongs to and feeds it into the corresponding mixer. The MCU then forwards the mixed data to each receiver according to that receiver's request.
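A sketch of the MCU-side routing just described. All class and method names are hypothetical. The language code is assumed to have already been parsed from the packet header, and the mixer is a placeholder clamped sum standing in for the adaptive weighting mixer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (room_id, language)


@dataclass(frozen=True)
class AudioPacket:
    room_id: int
    language: int              # language code read from the packet header
    sender_id: int
    samples: Tuple[int, ...]   # decoded PCM samples for one frame


class LanguageRouter:
    """Hypothetical MCU router: one mixer per (conference room, language)."""

    def __init__(self) -> None:
        self.mixers: Dict[Key, Dict[int, Tuple[int, ...]]] = {}
        self.listen: Dict[int, Key] = {}

    def create_room(self, room_id: int, languages: List[int]) -> None:
        # n languages -> n mixers for this room.
        for language in languages:
            self.mixers[(room_id, language)] = {}

    def set_listen_language(self, client_id: int, room_id: int, language: int) -> None:
        self.listen[client_id] = (room_id, language)

    def receive(self, packet: AudioPacket) -> None:
        # Feed the frame into the mixer for its room and language
        # (a KeyError means the room was not created with that language).
        self.mixers[(packet.room_id, packet.language)][packet.sender_id] = packet.samples

    def output_for(self, client_id: int) -> Tuple[int, ...]:
        inputs = self.mixers[self.listen[client_id]]
        if not inputs:
            return ()
        # Placeholder mixer: clamped sum of the current frames.
        return tuple(max(-32768, min(32767, sum(col))) for col in zip(*inputs.values()))


router = LanguageRouter()
router.create_room(1, languages=[0, 1])
router.receive(AudioPacket(1, 0, sender_id=10, samples=(100, 200)))
router.receive(AudioPacket(1, 1, sender_id=11, samples=(5, -5)))
router.set_listen_language(20, 1, 0)
router.set_listen_language(21, 1, 1)
print(router.output_for(20))  # (100, 200)
print(router.output_for(21))  # (5, -5)
```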
Description of drawings
Fig. 1 is a schematic diagram of the module structure of the present invention;
Fig. 2 is a system flow chart of the present invention.
Embodiment
1. Figure 1 shows the block diagram of the system modules. At the sending client terminal, the video and audio signals obtained from the input devices are compressed by the encoders, packed in a defined format, and sent over the network. At the multipoint control unit, the multipoint controller provides control functions for all conferences, and the multipoint processor provides the data forwarding service. At the receiving client terminal, the output packets are first unpacked; the resulting compressed video and audio data are decoded and sent to the output devices, and user data and control data are processed accordingly. The system's functions are:
Video coding/decoding: performs redundancy-reducing compression of the video stream, which can be implemented with MPEG-4, H.264, and similar codecs.
Audio coding/decoding: performs silence detection and coding/decoding of the voice signal, and optionally adds a buffering delay at the receiving end to keep the speech continuous. Codecs such as G.723 and G.729 can be used.
Control unit: provides end-to-end signaling to ensure proper communication between terminals. Four kinds of messages are defined: request, response, command, and indication. Through them the terminals perform capability negotiation, open and close logical channels, send commands or indications, and so on, thereby controlling the communication.
Network transport layer: formats video, audio, control, and other data for transmission and receives data from the network. It is also responsible for functions such as logical framing, sequence numbering, and error detection.
Auxiliary office: implements concrete operations such as an electronic whiteboard, text chat, and file transfer.
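The audio codec entry mentions an optional buffering delay at the receiving end. One simple way to realize it, sketched here as an assumption rather than the patent's design, is a small playout buffer that waits until a few frames have arrived and then releases them in sequence-number order, so that reordered packets still play correctly. `PlayoutBuffer` is a hypothetical name, and sequence-number wraparound and late-packet handling are omitted.

```python
import heapq
from typing import List, Optional, Tuple


class PlayoutBuffer:
    """Hypothetical receive-side buffer: delays playback by `depth` frames."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth
        self.heap: List[Tuple[int, bytes]] = []
        self.started = False

    def push(self, seq: int, payload: bytes) -> None:
        heapq.heappush(self.heap, (seq, payload))

    def pop(self) -> Optional[Tuple[int, bytes]]:
        """Next frame in sequence order, or None while still filling or when empty."""
        if not self.started:
            if len(self.heap) < self.depth:
                return None
            self.started = True
        return heapq.heappop(self.heap) if self.heap else None


buf = PlayoutBuffer(depth=3)
buf.push(2, b"B")
buf.push(1, b"A")
print(buf.pop())  # None (only two frames buffered)
buf.push(3, b"C")
print([buf.pop() for _ in range(3)])  # [(1, b'A'), (2, b'B'), (3, b'C')]
```

A deeper buffer absorbs more network jitter at the cost of more end-to-end delay.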
Fig. 2 describes the flow of audio and video data in the system of the invention. Audio and video characteristics, sequence numbers, and similar information can be carried by the Real-time Transport Protocol (RTP), with TCP or UDP used for transmission.
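For reference, the 12-byte fixed RTP header defined in RFC 3550 can be built with Python's standard `struct` module. Payload types 4 (G.723) and 18 (G.729) are the static assignments from RFC 3551; the SSRC value and the 160-sample timestamp increment (20 ms at 8 kHz) are illustrative, and the function names are hypothetical.

```python
import struct

RTP_VERSION = 2
PT_G723 = 4    # static payload types from RFC 3551
PT_G729 = 18


def build_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
                     marker: bool = False) -> bytes:
    """Pack the fixed RTP header (no padding, no extension, no CSRCs)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


header = build_rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD, payload_type=PT_G729)
print(len(header), header.hex())  # 12 80120001000000a01234abcd
```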
2, support the method for high capacity mixed sound to implement to describe: in the silence detection, Mel yardstick cepstrum coefficient is L=12, the inner product function of SVMs is selected RBF for use, and the training method of SVMs can adopt the SMO method, and the present invention is also unrestricted to this.
The short-time adaptive weighting mixing method lends itself to a highly parallel computation structure. Note that the average amplitude Avg[j] of each channel in formula (4) is computed independently, so Avg[j] can be computed for all channels in parallel. In the mixing step, the computation for each channel likewise remains independent, so it is also suitable for parallel computation. The program can additionally be optimized with the MMX, SSE, and SSE2 instruction sets. Actual tests show that this method mixes well, introduces no new mixing noise, and preserves the details of each original channel well while treating all channels' volumes fairly.
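The independence noted above can be made explicit by mapping the per-channel work over a pool of workers. This sketch uses Python's `concurrent.futures.ThreadPoolExecutor` only to show the data-parallel structure; in CPython, pure-Python arithmetic will not actually speed up across threads because of the global interpreter lock, so a real mixer would rely on native code, SIMD, or processes. Avg[j] is taken to be the mean absolute amplitude of channel j over the frame, and the normalization w[j] = Avg[j] / Σ Avg is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def average_amplitude(samples):
    """Avg[j] for one channel: independent of every other channel."""
    return sum(abs(s) for s in samples) / len(samples)


def weighted_channel(args):
    """One channel's weighted contribution w[j] * Data[j, i]; also independent."""
    weight, samples = args
    return [weight * s for s in samples]


def parallel_mix(frame_channels, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: per-channel averages, computed in parallel.
        avgs = list(pool.map(average_amplitude, frame_channels))
        total = sum(avgs)
        n = len(avgs)
        weights = [a / total if total else 1.0 / n for a in avgs]
        # Stage 2: per-channel weighted contributions, computed in parallel.
        contributions = list(pool.map(weighted_channel, zip(weights, frame_channels)))
    # Stage 3: a cheap reduction across channels for each output sample.
    return [round(sum(column)) for column in zip(*contributions)]


print(parallel_mix([[12000, -12000, 6000], [0, 0, 0], [0, 0, 0]]))  # [12000, -12000, 6000]
```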
3. In practical use of the simultaneous interpretation technique, each client can freely choose the language to listen to from several different languages. Speaking rights require permission settings: a client with ordinary status can speak only in one default language, and only interpreters or senior clients can choose another language for their speech. Each client compresses its audio locally and uploads it to the MCU. The MCU decompresses the audio and mixes it in different mixers according to the language each client has selected for speaking, then recompresses and sends each client the mix for the language it has chosen to listen to. For a client who speaks and listens in the same language, the MCU must also first subtract that client's own voice from the mix, so that the client does not hear itself.
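Removing a client's own voice from the mix it hears is cheap if the MCU keeps each sender's contribution to a language mixer: compute the full mix once, then subtract each sender's own contribution. The sketch assumes the contributions are already weighted and decoded to integer PCM; `mix_minus` is a hypothetical name, and any clamping to 16 bits should be applied after the subtraction.

```python
from typing import Dict, List


def mix_minus(contributions: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Return, for each sender, the language mix with that sender's own signal removed.

    contributions maps sender_id -> that sender's contribution to one language mixer.
    """
    if not contributions:
        return {}
    full_mix = [sum(column) for column in zip(*contributions.values())]
    return {sender: [m - own_sample for m, own_sample in zip(full_mix, own)]
            for sender, own in contributions.items()}


outputs = mix_minus({1: [10, 20], 2: [1, 2], 3: [100, 200]})
print(outputs[1], outputs[2], outputs[3])  # [101, 202] [110, 220] [11, 22]
```

Clients who only listen to this language, without speaking in it, simply receive the full mix.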
To let the MCU and the clients represent and distinguish the language of each datagram sent or received, a new audio packet header format is defined in which several bits of the packet header encode the language; in general, 3 bits are enough for 8 languages.
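The number of header bits needed for n languages is ⌈log₂ n⌉, which gives the 3 bits for 8 languages stated above. Where those bits sit in the header is not specified, so the layout below (language in the top 3 bits of one header byte, the remaining 5 bits left for other flags) and the function names are hypothetical.

```python
LANG_BITS = 3                   # enough for 2**3 = 8 languages
FLAG_BITS = 8 - LANG_BITS       # hypothetical: remaining bits of the same byte
FLAG_MASK = (1 << FLAG_BITS) - 1


def bits_needed(num_languages: int) -> int:
    """ceil(log2(n)) bits, with a minimum of 1."""
    return max(1, (num_languages - 1).bit_length())


def pack_language_byte(language: int, flags: int = 0) -> int:
    if not 0 <= language < (1 << LANG_BITS):
        raise ValueError("language code does not fit in the language field")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in the remaining bits")
    return (language << FLAG_BITS) | flags


def unpack_language_byte(value: int) -> tuple:
    return value >> FLAG_BITS, value & FLAG_MASK


print(bits_needed(8), bits_needed(9))  # 3 4
print(pack_language_byte(5, flags=3))  # 163
print(unpack_language_byte(163))       # (5, 3)
```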
Claims (3)
1. A simultaneous interpretation video conference system and method supporting large-capacity audio mixing, characterized in that it comprises:
(1) a method supporting large-capacity audio mixing, which suppresses the transmission of silence at the speaker's end by means of a silence detection method based on Mel-scale cepstral features and a support vector machine, and uses a short-time adaptive weighting mixing method in the mixer of the multipoint control unit;
(2) synchronized multilingual mixing within the same conference room, for which a new audio packet header format is defined and multiple mixing processes are used in one conference room.
2. The simultaneous interpretation video conference system and method supporting large-capacity audio mixing according to claim 1, characterized in that item (1) proposes a silence detection method based on Mel-scale cepstral features and a support vector machine, and a short-time adaptive weighting mixing method.
3. The simultaneous interpretation video conference system and method supporting large-capacity audio mixing according to claim 1, characterized in that item (2) defines a new audio packet header format and uses multiple mixing processes in one conference room.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200610040060.1A CN1845573A (en) | 2006-04-30 | 2006-04-30 | Simultaneous interpretation video conference system and method for supporting high capacity mixed sound |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200610040060.1A CN1845573A (en) | 2006-04-30 | 2006-04-30 | Simultaneous interpretation video conference system and method for supporting high capacity mixed sound |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1845573A | 2006-10-11 |
Family ID: 37064483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200610040060.1A Pending CN1845573A (en) | 2006-04-30 | 2006-04-30 | Simultaneous interpretation video conference system and method for supporting high capacity mixed sound |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1845573A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9031849B2 (en) | 2006-09-30 | 2015-05-12 | Huawei Technologies Co., Ltd. | System, method and multipoint control unit for providing multi-language conference |
WO2008040258A1 (en) * | 2006-09-30 | 2008-04-10 | Huawei Technologies Co., Ltd. | System and method for realizing multi-language conference |
US9311920B2 (en) * | 2013-06-06 | 2016-04-12 | Tencent Technology (Shenzhen) Company Limited | Voice processing method, apparatus, and system |
CN103327014A (en) * | 2013-06-06 | 2013-09-25 | 腾讯科技(深圳)有限公司 | Voice processing method, device and system |
WO2014194728A1 (en) * | 2013-06-06 | 2014-12-11 | Tencent Technology (Shenzhen) Company Limited | Voice processing method, apparatus, and system |
US20150112668A1 (en) * | 2013-06-06 | 2015-04-23 | Tencent Technology (Shenzhen) Company Limited | Voice processing method, apparatus, and system |
CN103327014B (en) * | 2013-06-06 | 2015-08-19 | 腾讯科技(深圳)有限公司 | A kind of method of speech processing, Apparatus and system |
CN105304079A (en) * | 2015-09-14 | 2016-02-03 | 上海可言信息技术有限公司 | Multi-party call multi-mode speech synthesis method and system |
CN105304079B (en) * | 2015-09-14 | 2019-05-07 | 上海可言信息技术有限公司 | A kind of multi-mode phoneme synthesizing method of multi-party call and system and server |
CN106060707A (en) * | 2016-05-27 | 2016-10-26 | 北京小米移动软件有限公司 | Reverberation processing method and device |
CN106060707B (en) * | 2016-05-27 | 2021-05-04 | 北京小米移动软件有限公司 | Reverberation processing method and device |
CN107046523A (en) * | 2016-11-22 | 2017-08-15 | 深圳大学 | A kind of simultaneous interpretation method and client based on individual mobile terminal |
CN113257256A (en) * | 2021-07-14 | 2021-08-13 | 广州朗国电子科技股份有限公司 | Voice processing method, conference all-in-one machine, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102226944B (en) | Audio mixing method and equipment thereof | |
CN112104836A (en) | Audio mixing method, system, storage medium and equipment for audio server | |
CN101502089B (en) | Method for carrying out an audio conference, audio conference device, and method for switching between encoders | |
US9456273B2 (en) | Audio mixing method, apparatus and system | |
US9462224B2 (en) | Guiding a desired outcome for an electronically hosted conference | |
Hardman et al. | Reliable audio for use over the Internet | |
CN105304079A (en) | Multi-party call multi-mode speech synthesis method and system | |
CN1845573A (en) | Simultaneous interpretation video conference system and method for supporting high capacity mixed sound | |
CN103988486B (en) | The method of active channel is selected in the audio mixing of multiparty teleconferencing | |
CN102741831B (en) | Scalable audio frequency in multidrop environment | |
CN101218813A (en) | Spatialization arrangement for conference call | |
CN101179693A (en) | Mixed audio processing method of session television system | |
CN101513030A (en) | Voice mixing method, multipoint conference server using the method, and program | |
CN113140225A (en) | Voice signal processing method and device, electronic equipment and storage medium | |
CN104167210A (en) | Lightweight class multi-side conference sound mixing method and device | |
CN107580155B (en) | Network telephone quality determination method, network telephone quality determination device, computer equipment and storage medium | |
CN102915736B (en) | Mixed audio processing method and stereo process system | |
WO2023202250A1 (en) | Audio transmission method and apparatus, terminal, storage medium and program product | |
CN115662437A (en) | Voice transcription method under scene of simultaneous use of multiple microphones | |
CN101502043A (en) | Method for carrying out a voice conference, and voice conference system | |
Baskaran et al. | Audio mixer with automatic gain controller for software based multipoint control unit | |
CN115831132A (en) | Audio encoding and decoding method, device, medium and electronic equipment | |
Sethi et al. | A new weighted audio mixing algorithm for a multipoint processor in a VoIP conferencing system | |
CN101123572A (en) | Packet loss hiding method | |
CN113936669A (en) | Data transmission method, system, device, computer readable storage medium and equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C57 | Notification of unclear or unknown address | |
DD01 | Delivery of document by public notice | Addressee: Xue Wei; Document name: Notification before expiration of term |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | Open date: 20061011 |