CN101166377A

CN101166377A - A low-bit-rate coding and decoding scheme for multi-language surround sound

Info

Publication number: CN101166377A
Application number: CNA2006101370321A
Authority: CN
Inventors: 施伟强
Original assignee: Individual
Current assignee: Individual
Priority date: 2006-10-17
Filing date: 2006-10-17
Publication date: 2008-04-23

Abstract

The invention belongs to the field of audio coding and decoding for digital television and video discs. It addresses how to reduce the amount of data to be transmitted or stored when digital television or a video disc carries audio tracks in multiple languages, each of which is 5.1-channel surround sound. The main points of the scheme are as follows: the voice dialogue signals of the multiple languages and a single 5.1-channel background sound signal are each compressed separately; the compressed data is then packaged, multiplexed, and transmitted or stored. On decoding and playback, the decompressed voice dialogue signal of one selected language is recombined and mixed with the background sound signal to restore 5.1-channel surround sound for that language. The invention is applicable mainly to digital television and video discs (DVD, high-definition DVD).

Description

A low-bit-rate coding and decoding scheme for multi-language surround sound

Preface: We can look forward to the near future. At the 2008 Olympic Games we will provide high-quality digital live broadcasts to the whole world: the picture will be high definition, and commentary will be offered in multiple languages for the user to choose from, with the audio for each language delivered as a 5.1-channel surround sound signal. Viewers can then choose commentary they understand and still have the audiovisual impression of being at the event in person.

High-definition video discs such as EVD and Blu-ray can likewise store not only high-definition images but also selectable audio tracks in multiple languages, each as a 5.1-channel surround sound signal. The viewer can choose dialogue in their own language and still enjoy realistic 5.1-channel surround sound.

However, under the current coding method and coded format, the bit rate of multi-language (multi-stream) 5.1-channel surround sound is very high, which hinders its storage on video discs and its transmission over digital television. For this reason I propose a new coding and decoding scheme: a low-bit-rate coding and decoding scheme for multi-language surround sound. For convenience, it is called sweetone ("sweet sound") in this specification.

(1) Technical field:

This audio scheme (sweetone) belongs to the field of audio coding and decoding for digital television (including DTV and HDTV) and video discs (including DVD and high-definition DVD). Specifically, it concerns how to reduce the amount of data (bit rate) transmitted or stored when digital television or a video disc carries audio tracks in multiple languages, each of which is 5.1-channel surround sound.

(2) Background art:

Existing DVD discs: some discs record the 5.1-channel surround sound signal of one language (one stream); others record the 5.1-channel surround sound signals of several languages (multiple streams).

Composition of each language's 5.1-channel surround sound signal: however much the signals of the six channels vary, the acoustic information they carry consists mainly of the following.

(1) Background sound (music, natural ambience, background voices, computer-synthesized effects, and so on), which is distributed across all six channels: front L and R, center C, surround SL and SR, and the bass channel (BASS).

(2) The voice dialogue (speech) of the lead characters or host, in one of two situations. (i) In general (most video discs and television programs), the dialogue is positioned front and center, that is, in the single center channel C. (ii) On video discs with more demanding sound design, in order to convey the lead character's movement and presence in the environment, the dialogue is distributed over one or several of the front L, R, center C, and surround SL, SR channels; the recording engineer processes the voice on these channels so that it differs in amplitude, phase, frequency response, delay, reverberation, and so on.

Comparing, over the same time span, the 5.1-channel surround sound signals of the various languages on the same DVD or digital television program shows that the background sound (5.1-channel) signal is identical across all languages; the only difference is the language of the lead character's or host's dialogue. That is, at any given time, the 5.1-channel surround sound signal of each language is a mix of the same background sound signal with the voice dialogue of that particular language.

In the 5.1-channel surround sound signal of a single language (one stream), the background sound and the voice dialogue (speech) of the lead characters or host are mixed and recorded together. The current order of capture and encoding for a single-language 5.1-channel surround sound signal is: the captured voice dialogue is first mixed with the background sound to form one 5.1-channel surround sound signal; this signal is then compressed, packaged, and multiplexed for transmission or storage.

The same holds for DVDs that record 5.1-channel surround sound signals in multiple languages (multiple streams): under the current recording and coding method, the background sound and the voice dialogue (speech) are also mixed and recorded together in the 5.1-channel surround sound signal of every language. The current order of capture and encoding for multi-language (say, 10 languages) 5.1-channel surround sound is: the captured voice dialogue of each language (10 streams) is first mixed with the same background sound, forming 5.1-channel surround sound signals for the multiple languages (10 streams); these 10 streams of 5.1-channel surround sound are then each compressed, packaged, and multiplexed for transmission or storage.

In the digital bitstream stored on a DVD or transmitted by digital television, each second consists of many audio data packets, each covering one unit time segment. The amount of audio data per second (the bit rate) is therefore determined by the size of each audio data packet.

Under the current capture and encoding order for a single-language (one-stream) 5.1-channel surround sound signal, each audio data packet has the format shown in Fig. 1: when a DVD stores, or digital television transmits, the 5.1-channel surround sound signal of a single language (one stream), each audio data packet consists of just one 5.1-channel surround sound compressed data packet for the unit sampling period. Taking AC-3 compression as an example, the bit rate for 5.1-channel surround sound data of one language (one stream) is 384 kbps.

Under the current capture and encoding order and coded format for multi-language (multi-stream) 5.1-channel surround sound signals, each audio data packet has the format shown in Fig. 2: when a DVD stores, or digital television transmits, 5.1-channel surround sound signals in multiple languages (say, 10), each audio data packet is formed by packaging and multiplexing multiple (say, 10) 5.1-channel surround sound compressed data packets for the unit sampling period. Taking AC-3 compression as an example, the bit rate for 5.1-channel surround sound data in 10 languages (10 streams) is about 384 kbps × 10 = 3840 kbps, and the bit rate for 50 languages (50 streams) would be about 384 kbps × 50 = 19200 kbps. Under the current encoding order and coded format, the bit rate of multi-language (multi-stream) 5.1-channel surround sound is therefore very high, which hinders its storage on DVD and its transmission over digital television.
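The arithmetic in this paragraph can be checked with a short Python sketch. The function name `conventional_rate_kbps` is illustrative and does not come from the specification; the 384 kbps figure is the AC-3 example rate quoted in the text.

```python
# Example rate from the text: one AC-3 5.1-channel stream at 384 kbps.
SURROUND_KBPS = 384


def conventional_rate_kbps(num_languages: int, surround_kbps: int = SURROUND_KBPS) -> int:
    """Total bit rate when every language carries its own fully mixed 5.1 stream."""
    return num_languages * surround_kbps


rate_10 = conventional_rate_kbps(10)  # 10 languages
rate_50 = conventional_rate_kbps(50)  # 50 languages
print(rate_10, rate_50)
```

The total grows linearly with the number of languages because the shared background sound is carried once per language.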

The newest high-density video discs, such as Blu-ray, have a very large capacity, but recording high-definition, high-quality video and audio greatly increases the amount of data stored and takes up more space. Recording 5.1-channel surround sound signals in multiple languages on such discs with the current encoding order and coded format therefore still presents some difficulty.

Likewise, although each HDTV program channel has a relatively wide bandwidth, transmitting high-definition, high-quality video and audio greatly increases the amount of data and takes up more bandwidth. Transmitting 5.1-channel surround sound signals in multiple languages with the current encoding order and coded format therefore also presents some difficulty.

Is there a simple method that reduces the data volume (bit rate) of multi-language (multi-stream) 5.1-channel surround sound signals, so that more languages can be stored on a video disc, or so that multi-language 5.1-channel surround sound can be transmitted in an HDTV program while occupying less bandwidth?

From the current capture and encoding order and coded format for multi-language (multi-stream) 5.1-channel surround sound (Fig. 2), it can be seen that the identical background sound signal contained in the 5.1-channel surround sound compressed data packets of the same sampling period across the multiple streams (10 streams) is transmitted or stored repeatedly (10 times). The background sound signal that is transmitted or stored repeatedly can be regarded as redundant data.

(3) Summary of the invention:

Under the current capture and encoding order and coded format for multi-language (multi-stream) 5.1-channel surround sound signals, the redundant background sound signal cannot be removed (because the 5.1-channel surround sound signal of every language must contain the background sound), so the data volume (bit rate) of multi-language 5.1-channel surround sound is hard to reduce. I have therefore designed a new coding and decoding scheme, provisionally named sweetone ("sweet sound"), which reduces the data volume (bit rate) needed to store or transmit multi-language 5.1-channel surround sound signals.

The principle of the sweetone scheme is very simple; it is a change of approach. The current order for multi-language (multi-stream) 5.1-channel surround sound is: at the front end, the voice dialogue of each language is first mixed with the background sound, and the result is then encoded and transmitted. The sweetone order is: at the front end, the voice dialogue and the background sound are not mixed; the voice dialogue of each language and the background sound are encoded and transmitted separately, and the voice dialogue is mixed with the background sound at the terminal (client). In detail, the current method first mixes the captured voice dialogue of each language with the background sound, then compresses the mixed 5.1-channel surround sound signal of each language, and then packages and multiplexes the 5.1-channel surround sound compressed data packets of the multiple languages (multiple streams) for transmission or storage. The sweetone method does not pre-mix the voice dialogue of each language with the background sound; it compresses the independent voice dialogue signals of the multiple languages and a single stream of 5.1-channel background sound separately, and then packages and multiplexes the voice dialogue compressed data packets of the multiple languages (multiple streams) together with the single 5.1-channel background sound compressed data packet for transmission or storage. On decoding and playback, the decompressed, independent voice dialogue signal (one language selected from the several) is recombined and mixed with the background sound signal to restore the 5.1-channel surround sound signal of any one language.

Under the sweetone coding scheme, the background sound signal that was previously transmitted or stored repeatedly is transmitted or stored only once, and instead of the full surround sound signal of each language, only the voice dialogue signal of each language needs to be transmitted or stored. This reduces the data volume (bit rate) needed to store or transmit multi-language 5.1-channel surround sound signals.

Corresponding to the two situations of lead-character or host dialogue described above, I have conceived two versions of the sweetone scheme: sweetone version 1 and sweetone version 2.

(1) The sweetone version 1 scheme applies to the general case (most video discs and television programs), in which the voice dialogue is positioned only front and center, that is, in the single center channel C. The voice dialogue of each language therefore needs only one channel (mono) of audio data to be transmitted or stored. On playback at the client or terminal (digital television, video disc player), the mono voice dialogue signal (one language selected from the several) is mixed with the center channel signal of the background sound, restoring the 5.1-channel surround sound signal of that language (with the dialogue positioned front and center).

Under the sweetone version 1 scheme, the signal coding format is as shown in Fig. 3: each audio data packet is formed by packaging and multiplexing the mono voice dialogue compressed data packets of the multiple languages (for example, 10 streams) for the same sampling period together with one 5.1-channel background sound compressed data packet.
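The packet layout described for Fig. 3 might be sketched as follows. The byte layout here (a language count, then length-prefixed entries, then the background payload) is purely an assumption for illustration; the specification does not define field sizes or ordering, and the names `mux_v1` and `demux_v1` are hypothetical.

```python
import struct
from typing import Dict, Tuple


def mux_v1(dialogue: Dict[str, bytes], background: bytes) -> bytes:
    """Pack N mono dialogue payloads plus one 5.1 background payload into one packet.

    Assumed layout: [count:1] then per language [tag_len:1][size:2][tag][payload],
    followed by [size:2][background payload]. All sizes are big-endian.
    """
    parts = [struct.pack(">B", len(dialogue))]
    for lang, payload in dialogue.items():
        tag = lang.encode("ascii")
        parts.append(struct.pack(">BH", len(tag), len(payload)) + tag + payload)
    parts.append(struct.pack(">H", len(background)) + background)
    return b"".join(parts)


def demux_v1(data: bytes) -> Tuple[Dict[str, bytes], bytes]:
    """Inverse of mux_v1: recover the per-language payloads and the background payload."""
    pos = 0
    (count,) = struct.unpack_from(">B", data, pos)
    pos += 1
    dialogue: Dict[str, bytes] = {}
    for _ in range(count):
        tag_len, size = struct.unpack_from(">BH", data, pos)
        pos += 3
        lang = data[pos:pos + tag_len].decode("ascii")
        pos += tag_len
        dialogue[lang] = data[pos:pos + size]
        pos += size
    (size,) = struct.unpack_from(">H", data, pos)
    pos += 2
    return dialogue, data[pos:pos + size]


packet = mux_v1({"zh": b"\x01\x02", "en": b"\x03\x04\x05"}, b"background-5.1")
print(demux_v1(packet))
```

Whatever the real layout, the point is that the background payload appears once per packet regardless of how many languages are carried.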

With AC-3 compression, the bit rate of mono audio data for one language (one stream) is 64 kbps, so the bit rate of mono audio data for 10 languages (10 streams) is about 64 kbps × 10 = 640 kbps.

With AC-3 compression, the bit rate of one stream of 5.1-channel surround background sound audio data is 384 kbps.

640 + 384 = 1024 kbps. So when the sweetone version 1 coding and decoding scheme uses AC-3 compression, the bit rate for 5.1-channel surround sound audio data in 10 languages (10 streams) is 1024 kbps, which is less than 1/3 of the 3840 kbps that the current general coding and decoding scheme (Fig. 2) needs for the same 10 languages (10 streams).
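The version 1 bit rate can be expressed as a small formula in Python. The function name is illustrative only; the 64 kbps and 384 kbps defaults are the AC-3 example rates from the text.

```python
def sweetone_v1_rate_kbps(num_languages: int,
                          dialogue_kbps: int = 64,
                          background_kbps: int = 384) -> int:
    """One mono dialogue stream per language plus a single shared 5.1 background stream."""
    return num_languages * dialogue_kbps + background_kbps


sweetone_10 = sweetone_v1_rate_kbps(10)
conventional_10 = 10 * 384  # each language carries its own 5.1 stream
print(sweetone_10, conventional_10, sweetone_10 / conventional_10)
```

The background term is constant, so each additional language costs only the dialogue rate.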

How many languages (streams) of 5.1-channel surround sound audio data can the sweetone version 1 scheme carry at the same 3840 kbps that the general coding and decoding scheme needs for 10 streams of 5.1-channel surround sound audio data?

3840 − 384 (bit rate of the one 5.1-channel surround background sound stream) = 3456 kbps

3456 ÷ 64 (bit rate of one language's mono voice dialogue stream) = 54 languages (streams)

The sweetone version 1 coding and decoding scheme can thus carry 5.1-channel surround sound audio data for 54 languages (streams), more than 5 times what the general coding and decoding scheme carries at the same bit rate.

The frequency range of a voice dialogue signal is narrower than 0 to 20 kHz, and its dynamic range is not very large. Suitably reducing the sampling frequency and the number of quantization bits when sampling the voice dialogue signal can therefore further reduce the bit rate of the voice dialogue audio data without reducing its sound quality. For example, the bit rate of mono voice dialogue audio data for one language (one stream) can be taken as 32 kbps.

3456 ÷ 32 = 108 languages (streams)

In this way the sweetone version 1 coding and decoding scheme can carry 5.1-channel surround sound audio data for 108 languages (streams), more than 10 times what the general coding and decoding scheme carries at the same bit rate.
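The two capacity calculations above follow the same pattern, sketched here in Python. The function name `max_languages` is hypothetical; the rates are the example figures from the text.

```python
def max_languages(budget_kbps: int, dialogue_kbps: int, background_kbps: int = 384) -> int:
    """Number of mono dialogue streams that fit after reserving the one background stream."""
    return (budget_kbps - background_kbps) // dialogue_kbps


budget = 10 * 384  # the rate the general scheme needs for 10 languages
at_64 = max_languages(budget, 64)
at_32 = max_languages(budget, 32)
print(at_64, at_32)
```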

For other audio compression formats (such as DTS, MPEG-2, EAC, and AAC), just as with AC-3, the sweetone coding and decoding scheme likewise reduces the data volume (bit rate) needed to store or transmit multi-language 5.1-channel surround sound signals compared with the general coding and decoding scheme.

(2) On video discs or television programs with more demanding sound design, in order to convey the lead character's (dialogue's) movement and presence in the environment, the voice dialogue may be distributed over one or several of the front L, R, center C, and surround SL, SR channels. The general coding and decoding scheme applies sound-effect processing to the voice dialogue signal (amplitude, phase, frequency response, delay, reverberation, and so on), mixes it into one or several of the five channels of the background sound signal, and then compresses and packages the result for transmission or storage; at a high bit rate this is of course feasible. The sweetone version 1 coding and decoding scheme, however, aims at a low bit rate: the voice dialogue of each language is not pre-mixed with the background sound, and each audio data packet is formed by packaging and multiplexing the mono voice dialogue compressed data packets of the multiple languages (for example, 10 streams) for the same sampling period together with one 5.1-channel background sound compressed data packet. If, in order to convey the lead character's movement and presence in the environment, the transmitted mono voice dialogue signal were replaced with effect-processed voice dialogue signals on several or all five channels, the bit rate would increase; this clearly cannot be used in the sweetone version 1 scheme.

However, the current production process for 5.1-channel surround sound signals shows that the voice dialogue signal is recorded separately, on a single channel, by the voice actor while listening to the background sound and watching the picture. The recording engineer then applies sound-effect processing to the voice dialogue as the story requires (amplitude, phase, frequency response, delay, reverberation, and so on) and mixes it into several or all five of the channels of the background sound signal.

On this basis I have conceived the sweetone version 2 coding and decoding scheme: the voice dialogue is neither effect-processed in advance nor mixed with the background sound signal. As in the sweetone version 1 scheme, only one channel (mono) of audio data needs to be transmitted for each language, which keeps the bit rate low, but a low-rate code called a "descriptor" is added to the transmitted voice dialogue audio data. On decoding and playback at the client or terminal (digital television, video disc player), the parameters carried by the descriptor are used to apply sound-effect processing (amplitude, phase, frequency response, delay, reverberation, and so on) to the mono voice dialogue signal (one language selected from the several), which is then mixed into one or several of the five channels of the background sound signal. This restores the 5.1-channel surround sound signal of one language, with the voice dialogue also conveying movement and presence in the environment.
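The terminal-side step described here could look roughly like the following Python sketch. It models only two of the listed parameters, a per-channel gain and a per-channel delay in samples; phase, frequency response, and reverberation are omitted. The descriptor representation (a dict of channel name to `(gain, delay)`) and the function name are assumptions made for illustration, not the format of the specification.

```python
from typing import Dict, List, Tuple


def apply_descriptor(dialogue: List[float],
                     background: Dict[str, List[float]],
                     descriptor: Dict[str, Tuple[float, int]]) -> Dict[str, List[float]]:
    """Mix a mono dialogue segment into the background channels named by the descriptor.

    descriptor maps a channel name to (gain, delay_in_samples); channels that are
    not listed receive no dialogue. Samples delayed past the segment end are dropped.
    """
    mixed = {ch: list(samples) for ch, samples in background.items()}
    for ch, (gain, delay) in descriptor.items():
        for i, sample in enumerate(dialogue):
            j = i + delay
            if j < len(mixed[ch]):
                mixed[ch][j] += gain * sample
    return mixed


silence = {ch: [0.0, 0.0, 0.0] for ch in ("L", "C", "R", "SL", "SR", "BASS")}
result = apply_descriptor([1.0, 2.0], silence, {"C": (0.5, 0), "L": (1.0, 1)})
print(result["C"], result["L"])
```

A descriptor that names only channel C with gain 1 and no delay reproduces the version 1 behaviour.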

Under the sweetone version 2 scheme, the signal coding format is as shown in Fig. 4: each audio data packet is formed by packaging and multiplexing the mono voice dialogue compressed data packets of the multiple languages (for example, 10 streams) for the same sampling period, one descriptor stream that describes the parameters of the voice dialogue signal in each of the L, C, R, SL, and SR channels during that sampling period, and one 5.1-channel background sound compressed data packet.

The signal coding format of the sweetone version 2 scheme differs only slightly from that of version 1 (each audio data packet carries one additional descriptor stream). Taking AC-3 compression as an example, the bit rate for 5.1-channel surround sound audio data in 10 languages (10 streams) under the sweetone version 2 scheme is somewhat higher than the 1024 kbps of version 1, but is still less than 1/3 of the 3840 kbps that the current general coding and decoding scheme (Fig. 2) needs for the same 10 languages (10 streams).

At the same 3840 kbps that the general coding and decoding scheme needs for 10 streams of 5.1-channel surround sound audio data, the sweetone version 2 coding and decoding scheme can likewise carry 5.1-channel surround sound audio data for more than 50 languages (50 streams), or even 100 languages (100 streams).

Explanation of the "descriptor":

The "descriptor" describes the voice dialogue data (as shown in Fig. 5): the volume parameters for each channel, the frequency response parameters for each channel, the phase parameters for each channel, the delay parameters for each channel, and the reverberation parameters for each channel.

Because the dialogue of every language is recorded by the voice actor while listening to the background sound and watching the picture, the speech is synchronized with the lip movements on screen. The parameters of the voice dialogue in each channel of the 5.1-channel surround sound signal can therefore be described for all languages by a single shared descriptor stream, which reduces the bit rate (if each language used its own descriptor stream, the bit rate would increase). In addition, sharing one descriptor stream across the voice dialogue signals of all languages gives the dialogue of every language equally good sound effects on playback, so that effect quality does not vary from language to language because of differences in their descriptors.
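One possible in-memory shape for such a shared descriptor is sketched below in Python. The five parameter families come from the text; the field names, units, and default values are assumptions for illustration, since the specification does not define an encoding.

```python
from dataclasses import dataclass, field
from typing import Dict

CHANNELS = ("L", "C", "R", "SL", "SR")


@dataclass
class ChannelParams:
    """Per-channel dialogue parameters (field names and units are hypothetical)."""
    volume: float = 0.0              # linear gain; 0.0 means no dialogue in this channel
    frequency_response: str = "flat" # placeholder for an EQ setting
    phase_deg: float = 0.0
    delay_ms: float = 0.0
    reverb: float = 0.0              # reverberation amount


@dataclass
class Descriptor:
    """One descriptor per sampling period, shared by the dialogue of every language."""
    params: Dict[str, ChannelParams] = field(
        default_factory=lambda: {ch: ChannelParams() for ch in CHANNELS}
    )


def center_only() -> Descriptor:
    """Descriptor that places the dialogue in the center channel only."""
    descriptor = Descriptor()
    descriptor.params["C"].volume = 1.0
    return descriptor


shared = center_only()
print(shared.params["C"], shared.params["L"])
```

Because the descriptor is not keyed by language, the decoder applies the same parameters to whichever dialogue stream the user selects.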

The descriptor parameters resemble the positions of the faders and knobs on a mixing console or other mixing equipment. In actual sound-effect production, the faders and knobs are not adjusted many times within one second. So, for an audio data stream in which each second consists of many audio data packets, the descriptor in practice remains essentially unchanged across many consecutive audio data packets within each second. We can therefore transmit only the descriptors that have changed and not retransmit descriptors that have not, which is a further way to reduce the bit rate.
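The "send only on change" idea can be sketched in Python as follows. Representing "no descriptor in this packet" as `None` is an assumption of this sketch, as are the function names; descriptors here are arbitrary comparable values.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def encode_changes(descriptors: List[T]) -> List[Optional[T]]:
    """Keep a descriptor only when it differs from the previous one; otherwise emit None."""
    encoded: List[Optional[T]] = []
    last: Optional[T] = None
    for current in descriptors:
        if current == last:
            encoded.append(None)  # unchanged: nothing is transmitted for this packet
        else:
            encoded.append(current)
            last = current
    return encoded


def decode_changes(encoded: List[Optional[T]]) -> List[Optional[T]]:
    """Rebuild the per-packet descriptors by holding the last transmitted value."""
    decoded: List[Optional[T]] = []
    last: Optional[T] = None
    for item in encoded:
        if item is not None:
            last = item
        decoded.append(last)
    return decoded


per_packet = [("C", 1.0), ("C", 1.0), ("C", 1.0), ("L", 0.5), ("L", 0.5)]
sent = encode_changes(per_packet)
print(sent)
print(decode_changes(sent) == per_packet)
```

A real stream would probably also resend the descriptor periodically so that a receiver joining mid-stream can synchronize; the text does not discuss this.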

In summary, compared with the current general coding and decoding scheme, the sweetone coding and decoding scheme has the following technical features and advantages:

(1) When transmitting or storing surround sound signals for the same number of languages at the same sound quality, the sweetone coding and decoding scheme has a lower bit rate than the general coding and decoding scheme, saving storage space on the video disc and transmission bandwidth in digital television.

(2) At the same bit rate and the same sound quality, the sweetone coding and decoding scheme can transmit or store surround sound signals for more languages than the general coding and decoding scheme.

(3) At the same bit rate and for the same number of languages, the signals transmitted or stored by the sweetone coding and decoding scheme can use a higher sampling frequency or more quantization bits than the general coding and decoding scheme, that is, better sound quality.

(4) Besides reducing the bit rate of multi-language (multi-stream) 5.1-channel surround sound signals, the sweetone coding and decoding scheme can also reduce the bit rate of multi-language (multi-stream) 6.1-channel or 7.1-channel surround sound signals, with even greater efficiency.

(5) Wide applicability. The sweetone coding and decoding scheme supports multiple compression formats such as MPEG-2, AC-3, DTS, EAC, and AAC.

(6) It can provide special playback modes: background sound only, without voice dialogue (suitable for karaoke); voice dialogue only, without background sound (suitable for foreign-language study); and voice dialogue louder relative to the background sound (suitable for a midnight mode, in which the viewer can hear the dialogue clearly without the background sound disturbing others).

(7) Because the voice dialogue and the background sound are encoded and transmitted independently, the sweetone coding and decoding scheme also facilitates post-production and editing of programs.
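The special playback modes in item (6) amount to choosing two gains at the terminal, one for the dialogue and one for the background. The following Python sketch assumes version 1 style mixing into the center channel; the mode names and the 0.5 background gain for the midnight mode are illustrative values, not figures from the specification.

```python
from typing import Dict, List

# (dialogue_gain, background_gain) per mode; all values are assumptions.
MODES = {
    "normal": (1.0, 1.0),
    "karaoke": (0.0, 1.0),         # background only
    "language_study": (1.0, 0.0),  # dialogue only
    "midnight": (1.0, 0.5),        # dialogue louder relative to the background
}


def playback_mix(dialogue: List[float],
                 background: Dict[str, List[float]],
                 mode: str) -> Dict[str, List[float]]:
    """Scale the background, then add the scaled mono dialogue to center channel C."""
    dialogue_gain, background_gain = MODES[mode]
    out = {ch: [background_gain * s for s in samples] for ch, samples in background.items()}
    out["C"] = [c + dialogue_gain * d for c, d in zip(out["C"], dialogue)]
    return out


background = {"L": [1.0], "C": [2.0]}
for mode in MODES:
    print(mode, playback_mix([4.0], background, mode))
```

These modes are possible only because the dialogue and the background arrive as separate streams; a pre-mixed 5.1 signal cannot be separated this way at the terminal.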

(4) Embodiments:

(1) How the sweetone version 1 coding and decoding scheme processes the audio signal:

In the sweetone version 1 coding and decoding scheme, the voice dialogue is positioned only in the center channel C on playback at the terminal (digital television, video disc player), so no sound-effect processing of the voice dialogue is needed at the front end. The captured signals can be encoded, transmitted, and played back in real time, so the scheme can be applied to broadcasting multi-language 5.1-channel surround sound audio for live programs. The scheme is, of course, also suitable for non-real-time transmission.

Front end: the captured mono voice dialogue signals of the multiple languages (multiple streams) are compressed simultaneously over the same sampling time segment into multiple mono compressed data packets. The captured 5.1-channel background surround sound signal is compressed over the same sampling time segment into one 5.1-channel compressed data packet. (The compression format may be any format, such as MPEG-2, AC-3, DTS, EAC, or AAC.) For transmission or storage, the multiple mono voice dialogue compressed data packets and the one 5.1-channel background sound compressed data packet for the same sampling time segment are then packaged and multiplexed into the audio data packet for that segment. In the sweetone version 1 coding and decoding scheme, the coded format of each audio data packet is as shown in Fig. 3 (the figure uses 10 languages, i.e. 10 streams, as an example). Repeating this process produces many consecutive audio data packets, which together make up the audio data stream.

Because the coded format of each audio data packet in the sweetone version 1 coding and decoding scheme differs from that of the general coding scheme, the audio circuits of ordinary MPEG-1/2/4 decoder chips cannot process sweetone version 1 audio data packets. A sweetone version 1 audio decoding circuit can therefore be developed (Fig. 6).

Terminal: the sweetone version 1 audio decoding circuit (in a digital television or video disc player) first demultiplexes each audio data packet in the received sweetone version 1 audio data stream. It then selects the mono voice dialogue compressed data packet of one language (one stream) and decompresses it (the user controls the sweetone version 1 audio decoding circuit through the CPU), and at the same time decompresses the 5.1-channel background sound compressed data packet. The decompressed mono voice dialogue signal of that language is then mixed with center channel C of the background sound signal. Repeating this process continuously restores the 5.1-channel surround sound signal of one language.
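The terminal-side steps for one packet (select a language, decompress, mix into C) can be sketched in Python as below. Here `zlib` plus JSON stands in for the real audio codec purely so the example runs; it is not AC-3 or any codec named in the text, and the packet is assumed to be already demultiplexed into a dict. All names are hypothetical.

```python
import json
import zlib
from typing import Dict, List


def compress(obj) -> bytes:
    """Stand-in codec (lossless zlib over JSON), used only to make the sketch runnable."""
    return zlib.compress(json.dumps(obj).encode("utf-8"))


def decompress(blob: bytes):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def decode_v1_packet(packet: Dict, language: str) -> Dict[str, List[int]]:
    """Decompress the selected dialogue and the background, then mix dialogue into C."""
    dialogue = decompress(packet["dialogue"][language])
    background = decompress(packet["background"])
    mixed = {ch: list(samples) for ch, samples in background.items()}
    mixed["C"] = [c + d for c, d in zip(mixed["C"], dialogue)]
    return mixed


channels = ("L", "R", "C", "SL", "SR", "BASS")
packet = {
    "dialogue": {"zh": compress([1, 2, 3]), "en": compress([10, 20, 30])},
    "background": compress({ch: [1, 1, 1] for ch in channels}),
}
print(decode_v1_packet(packet, "en"))
```

Only the selected language's dialogue payload is decompressed; the payloads of the other languages are skipped.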

Compatibility should be considered when designing the sweetone version 1 audio decoding circuit; it should be compatible with signals in the sweetone version 2 coded format and in the general coded format. (i) Although the circuit cannot apply sound-effect processing to the mono voice dialogue data in a sweetone version 2 packet according to the descriptor parameters and mix it into each channel, it can mix the mono voice dialogue signal with center channel C of the background sound signal, restoring a 5.1-channel surround sound signal with the voice dialogue positioned in the center channel. (ii) For 5.1-channel surround sound signals in the general coded format, the circuit simply decompresses them and passes them straight through to the output.
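The compatibility behaviour described here is essentially a dispatch on the packet format. A minimal Python sketch follows, operating on already-decompressed data; the `format` field and its values are hypothetical, since the text does not say how a decoder would distinguish the three formats.

```python
from typing import Dict, List


def v1_decoder_output(packet: Dict, language: str) -> Dict[str, List[float]]:
    """What a version 1 decoder would output for each of the three packet formats."""
    if packet["format"] == "conventional":
        # General format: each language already carries a fully mixed 5.1 signal.
        return packet["surround"][language]
    # sweetone version 1 or 2: mix the mono dialogue into center channel C.
    # A version 2 packet also carries a "descriptor", which this decoder ignores.
    mixed = {ch: list(samples) for ch, samples in packet["background"].items()}
    mixed["C"] = [c + d for c, d in zip(mixed["C"], packet["dialogue"][language])]
    return mixed


v2_packet = {
    "format": "sweetone_v2",
    "dialogue": {"en": [1.0, 2.0]},
    "descriptor": {"L": (1.0, 0)},  # present but unused by a version 1 decoder
    "background": {"L": [0.5, 0.5], "C": [0.0, 0.0]},
}
print(v1_decoder_output(v2_packet, "en"))
```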

(2) sweetone version 2 coding and decoding schemes are to the processing mode of audio signal:

Because in sweetone version 2 coding and decoding schemes: be performance leading role's (voice dialogue) mobile sense and the actual effect of leading role's (voice dialogue) in environment, during terminal (Digital Television, video disc player) playback, the voice dialogue all might distribute in five sound channels.So need the voice dialogue signal of recorded in mono is made parameter---the descriptor of reflection voice dialogue audio at front end.Therefore, sweetone version 2 coding and decoding schemes are applicable to the occasion that non real-time transmits: the 5.1 multilingual sound channel surround sound sound accompaniments that are used in the movie or television play of sufficient time post-production.

Sound effect processing and post-production now generally use non-linear audio workstations. General sound effect software applies the effect processing directly to the voice dialogue signal and then mixes it into several or all five channels of the background sound signal. The sweetone version 2 scheme does not apply effect processing or mixing directly to the original voice dialogue signal. Instead it adds a parameter set, the descriptor (see Fig. 5), that describes the effect the voice dialogue should have in several or all five channels of the background sound signal. Sweetone version 2.0 sound effect authoring software can therefore be developed for producing descriptors.

Front end: First, a descriptor signal (one channel) is produced for the recorded monaural voice dialogue signal according to the plot and the background sound. During production, each segment of the descriptor should be synchronized with the voice dialogue signal of the corresponding sample period.

Next, the recorded monaural voice dialogue signals of the multiple languages (multiple channels) are compressed simultaneously over identical sample periods into multiple monaural compressed data packets. The recorded 5.1-channel background surround sound signal is compressed over the same sample periods into one 5.1-channel compressed data packet. (Any compression format may be used, such as MPEG-2, AC-3, DTS, EAC or AAC.) For transmission or storage, the monaural voice dialogue compressed data packets of a given sample period, the descriptor corresponding to them in time, and the 5.1-channel background sound compressed data packet are packaged and multiplexed into one audio packet for that sample period. The coded format of each audio packet in the sweetone version 2 scheme is shown in Fig. 4 (the figure uses the transmission of 10 languages, i.e. 10 channels, as an example). Repeating this process produces a continuous sequence of audio packets, and the audio data stream is made up of these consecutive audio packets.
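The packaging and multiplexing of one sample period can be sketched as below. The byte layout is purely hypothetical (a one-byte language count, then length-prefixed fields); the text does not specify the field sizes or ordering of Fig. 4, so `mux_packet`, `demux_packet` and the numeric language identifiers are assumptions for illustration only. The payloads are treated as opaque, already compressed bytes.

```python
import struct
from typing import Dict, Tuple


def mux_packet(dialogue: Dict[int, bytes], descriptor: bytes,
               background: bytes) -> bytes:
    """Pack one sample period: N mono dialogue payloads, one descriptor,
    and one 5.1-channel background payload (hypothetical layout)."""
    parts = [struct.pack(">B", len(dialogue))]          # number of languages
    for lang_id in sorted(dialogue):
        payload = dialogue[lang_id]
        # 1-byte language id + 4-byte big-endian length, then the payload
        parts.append(struct.pack(">BI", lang_id, len(payload)) + payload)
    parts.append(struct.pack(">I", len(descriptor)) + descriptor)
    parts.append(struct.pack(">I", len(background)) + background)
    return b"".join(parts)


def demux_packet(packet: bytes) -> Tuple[Dict[int, bytes], bytes, bytes]:
    """Inverse of mux_packet: split one audio packet into its parts."""
    offset = 0
    (count,) = struct.unpack_from(">B", packet, offset)
    offset += 1
    dialogue = {}
    for _ in range(count):
        lang_id, size = struct.unpack_from(">BI", packet, offset)
        offset += 5
        dialogue[lang_id] = packet[offset:offset + size]
        offset += size
    (size,) = struct.unpack_from(">I", packet, offset)
    offset += 4
    descriptor = packet[offset:offset + size]
    offset += size
    (size,) = struct.unpack_from(">I", packet, offset)
    offset += 4
    background = packet[offset:offset + size]
    return dialogue, descriptor, background
```

A decoder would call `demux_packet` on each audio packet and keep only the dialogue payload of the language the user selected, together with the descriptor and the background payload.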

Because the coded format of each audio packet in the sweetone version 2 coding and decoding scheme differs from that of the general coding scheme and of the sweetone version 1 coding and decoding scheme, a dedicated sweetone version 2 audio decoding circuit can be developed (see Fig. 7).

Terminal: In the digital television or video disc player, the sweetone version 2 audio decoding circuit first demultiplexes each audio packet of the received sweetone version 2 audio data stream. The user, via the CPU that controls the decoding circuit, selects the monaural voice dialogue compressed data packet of one language (one channel), which is decompressed, while the 5.1-channel background sound compressed data packet is decompressed at the same time. The decompressed monaural voice dialogue signal is then given DSP digital sound effect processing according to the parameters in the descriptor and mixed into several or all five channels of the background sound signal. Repeating this process continuously restores the 5.1-channel surround sound signal of one language, in which the voice dialogue has a sense of movement and a realistic effect within the environment.
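A minimal sketch of the descriptor-driven mixing step follows. It is not the DSP of Fig. 7: the `ChannelParams` class and its fields are hypothetical, only gain, delay and a polarity flag (a crude stand-in for the phase parameter) are modelled, and frequency response and reverberation are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List

# The five full-range channels the dialogue may be mixed into.
SURROUND = ("L", "C", "R", "SL", "SR")


@dataclass
class ChannelParams:
    """Hypothetical per-channel descriptor entry (assumed fields)."""
    gain: float = 0.0      # linear volume of the dialogue in this channel
    delay: int = 0         # delay in samples
    invert: bool = False   # polarity flip, standing in for a phase parameter


def render_dialogue(dialogue: List[float], params: ChannelParams) -> List[float]:
    """Apply delay, polarity and gain to the mono dialogue for one channel.

    Samples delayed past the end of the frame are dropped here; a real
    decoder would carry them over into the next sample period.
    """
    sign = -1.0 if params.invert else 1.0
    delayed = [0.0] * params.delay + list(dialogue)
    return [sign * params.gain * s for s in delayed[:len(dialogue)]]


def mix_with_descriptor(background: Dict[str, List[float]],
                        dialogue: List[float],
                        descriptor: Dict[str, ChannelParams]) -> Dict[str, List[float]]:
    """Mix the processed dialogue into every channel named in the descriptor."""
    mixed = {name: list(samples) for name, samples in background.items()}
    for name in SURROUND:
        params = descriptor.get(name)
        if params is None or params.gain == 0.0:
            continue  # the dialogue is absent from this channel
        processed = render_dialogue(dialogue, params)
        mixed[name] = [b + p for b, p in zip(mixed[name], processed)]
    return mixed
```

Changing the descriptor from one sample period to the next (for example, lowering the gain in L while raising it in R) is what would give the dialogue its sense of movement; a descriptor containing only a C entry with gain 1.0 reduces to the version 1 behaviour.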

Compatibility with the sweetone version 1 coded format and with general coded format signals should be considered when designing the sweetone version 2 audio decoding circuit. 1. A packet in the sweetone version 1 coded format has no descriptor, so the circuit only mixes the monaural voice dialogue signal into the center channel C of the background sound signal, restoring a 5.1-channel surround sound signal in which the voice dialogue is in the center channel. 2. For a 5.1-channel surround sound signal in the general coded format, the circuit simply decompresses it and passes it straight through to the output.

(5) description of drawings:

(Fig. 1) General coded format: when a DVD needs to store, or digital television needs to transmit, the 5.1-channel surround sound signal of a single language (one channel), each audio packet consists of just one 5.1-channel surround sound compressed data packet for the unit sample period.

(Fig. 2) General coded format: when a DVD needs to store, or digital television needs to transmit, the 5.1-channel surround sound signals of multiple languages (for example 10), each audio packet is formed by packaging and multiplexing multiple (10) 5.1-channel surround sound compressed data packets for the unit sample period.

(Fig. 3) sweetone version 1 coded format: when a DVD needs to store, or digital television needs to transmit, the 5.1-channel surround sound signals of multiple languages (for example 10), each audio packet is formed by packaging and multiplexing the monaural voice dialogue compressed data packets of the multiple languages (10 channels) for the same sample period together with one 5.1-channel background sound compressed data packet.

(Fig. 4) sweetone version 2 coded format: when a DVD needs to store, or digital television needs to transmit, the 5.1-channel surround sound signals of multiple languages (for example 10), each audio packet is formed by packaging and multiplexing the monaural voice dialogue compressed data packets of the multiple languages (10 channels) for the same sample period, one descriptor giving the parameters of the voice dialogue signal in each of the channels L, C, R, SL and SR during that sample period, and one 5.1-channel background sound compressed data packet.

(Fig. 5) The descriptor describes the voice dialogue data: for each channel, the parameters for volume, frequency response, phase, delay and reverberation.

(Fig. 6) sweetone version 1 audio decoding circuit block diagram

(Fig. 7) sweetone version 2 audio decoding circuit block diagrams
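The packet formats of Figs. 2 to 4 imply the following rough comparison of stream bit rates. This is an illustrative sketch only: the symbols are introduced here and do not appear in the original text. N is the number of languages, R_{5.1} the rate of one compressed 5.1-channel signal, R_m the rate of one compressed monaural dialogue channel and R_d the rate of the descriptor. It is assumed that the background sound is compressed at the same rate as a fully mixed 5.1-channel signal.

```latex
\begin{align*}
R_{\text{general}} &= N \, R_{5.1} \\
R_{\text{v1}}      &= N \, R_{m} + R_{5.1} \\
R_{\text{v2}}      &= N \, R_{m} + R_{d} + R_{5.1}
\end{align*}
```

With N = 10 as in the figures, the general format carries ten 5.1-channel payloads per audio packet, whereas both sweetone formats carry a single 5.1-channel payload plus ten monaural payloads (and, in version 2, one descriptor). Under these assumptions version 2 saves data whenever N R_m + R_d < (N - 1) R_{5.1}.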

Claims

The current coding and decoding method (sequence of operations) for multilingual (multi-channel) 5.1-channel surround sound signals is as follows: the recorded voice dialogue of each language is first mixed with the same background sound, forming a 5.1-channel surround sound signal for each language; the 5.1-channel surround sound signals of the various languages are then compressed; and the resulting 5.1-channel surround sound compressed data of the various languages are packaged and multiplexed for transmission or storage. At the terminal (digital television, video disc player), the user, via the CPU controlling the audio decoding circuit, selects the 5.1-channel surround sound compressed data of one language (one channel), which is decompressed so that the 5.1-channel surround sound signal of that language can be played back.

Independent claim of the present invention: the technical features set out in the preamble and in the characterizing portion, taken together, define the scope of protection claimed for the present invention (the sweetone coding and decoding scheme).

Preamble

Title of the claimed invention: a low code rate coding and decoding scheme for multi-language circle stereo. Short name of the scheme: sweet sound (sweetone).

The essential technical features shared by the sweetone coding and decoding scheme and the current coding and decoding scheme are: the voice dialogue of the multiple languages (multiple channels) is mixed with the same background sound; the 5.1-channel surround sound signals of the various languages (that is, the voice dialogue and background sound signals they comprise) are compressed; the 5.1-channel surround sound compressed data of the various languages (that is, the voice dialogue and background sound compressed data they comprise) are packaged and multiplexed for transmission or storage; and at the terminal (digital television, video disc player), to play back the 5.1-channel surround sound signal of one language (one channel), the user, via the CPU controlling the audio decoding circuit, selects the compressed data of one language, which is then decompressed.

Characteristic

The distinctive technical features of the sweetone coding and decoding scheme (where it differs from the current coding and decoding method) are:

Sweetone version 1 coding and decoding method (sequence of operations): the voice dialogue of each language is not mixed with the background sound in advance. The independent voice dialogue signals of the multiple languages and the single independent 5.1-channel background sound signal are each compressed separately; the voice dialogue compressed data packets of the multiple languages (multiple channels) and the single 5.1-channel background sound compressed data packet are then packaged and multiplexed for transmission or storage. During decoding and playback at the terminal (digital television, video disc player), the decompressed independent monaural voice dialogue signal (the user selects one of the multiple languages via the CPU controlling the audio decoding circuit) is recombined and mixed with the background sound signal, restoring the 5.1-channel surround sound signal of any one language (with the voice dialogue positioned front center).

Sweetone version 2 coding and decoding method (sequence of operations): the voice dialogue is neither given sound effect processing in advance nor mixed with the background sound signal in advance. A low-rate code referred to as the "descriptor" is added to the transmitted voice dialogue audio data. The independent voice dialogue signals of the multiple languages (multiple channels) and the single independent 5.1-channel background sound signal are each compressed separately; the voice dialogue compressed data packets of the multiple languages (multiple channels), the descriptor corresponding to them in time, and the single 5.1-channel background sound compressed data packet are then packaged and multiplexed for transmission or storage. During decoding and playback at the client terminal (digital television, video disc player), the decompressed monaural voice dialogue signal (the user selects one of the multiple languages via the CPU controlling the audio decoding circuit) undergoes multiple kinds of sound effect processing (amplitude, phase, frequency response, delay, reverberation ...) according to the parameters given in the "descriptor", and is then mixed into one or several of the five channels of the background sound signal. This restores the 5.1-channel surround sound signal of one language (in which the voice dialogue has a sense of movement and a realistic effect within the environment).

Note: the "descriptor" here describes the voice dialogue data: for each channel, the parameters for volume, frequency response, phase, delay and reverberation.