CN103474076A

CN103474076A - Method and device for transmitting aligned multichannel audio

Info

Publication number: CN103474076A
Application number: CN2013103564124A
Authority: CN
Inventors: A. R. Jones
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2008-10-06
Filing date: 2008-10-06
Publication date: 2013-12-25
Anticipated expiration: 2028-10-06
Also published as: CN103474076B

Abstract

The invention relates to a method of encoding audio and incorporating the encoded audio into a digital transport stream. The method comprises receiving a plurality of temporally co-located audio signals at an encoder input, assigning, per unit time, an identical time stamp to all of the temporally co-located audio signals, and incorporating the identically time stamped audio signals into the digital transport stream. The invention further provides a method of decoding the encoded data, an encoding apparatus and a decoding apparatus.

Description

Method and apparatus for delivery of aligned multi-channel audio

Technical field

The present invention relates generally to audio coding, and in particular to a method and apparatus for the delivery of aligned multi-channel audio.

Background technology

Modern audiovisual coding standards, such as MPEG-1 and MPEG-2, provide a means of transmitting multiple audio and video components within a single transport stream. Separate audio components can each be aligned with a selected video component. Synchronized multi-channel audio, such as surround sound, is provided only as a single pre-mixed surround sound audio component, for example a single Dolby 5.1 audio component. However, there is currently no means of transmitting individual multi-channel audio components in synchronized form.

In particular, the MPEG-1 and MPEG-2 audio standards (ISO/IEC 11172-3 and ISO/IEC 13818-3 respectively) describe means of coding and packetizing digital audio signals. These include schemes specified to support various forms of multi-channel sound using a single MPEG-2 transport stream component. These provisions are backward compatible with the earlier MPEG-1 audio system. In the prior art, the required synchronization of channels can be guaranteed only by pooling several audio channels into a single transport component of this kind. These schemes all require:

[a] the use of a surround sound compression method (for example, Dolby 5.1), or

[b] the use of a proprietary compression technique, or

[c] the use of uncompressed audio.

Surround sound compression methods reduce the bit rate required for multiple channels by exploiting the redundancy that exists between the channels, and by exploiting features of the human auditory system's spatial processing that make some sounds undetectable and therefore able to be masked. These complex schemes are appropriate where only a single coding stage, with one encode and one decode operation, is expected. They are not a desirable choice where, for practical and operational reasons (for example, a feed from a remote location to a central editing facility), the signal may need to be re-encoded several times for transmission across a network. This is because of the audio quality degradation caused by the concatenation problem that arises when several encoding operations are performed in succession. This is especially so where capacity is limited and the bit rate is reduced significantly, leaving very little headroom to cope with such degradation in concatenated encoding and transmission.

Using a proprietary compression technique generally requires additional external dedicated equipment, resulting in greater expense and operational complexity. This approach may also suffer the same quality degradation from the concatenation of more than one encode/decode stage.

However, if the audio is sent in uncompressed form (for example, as uncompressed linear PCM samples), the required data rates are high (for example, approximately 3 Mbit/s per dual channel pair).
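The order of magnitude of this figure can be checked with simple arithmetic. The sketch below is illustrative only: the 48 kHz sampling rate, the 24-bit sample size and the 32-bit AES3 subframe size are assumptions chosen as typical studio values and are not stated in this document.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw bit rate, in bit/s, of uncompressed linear PCM."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed parameters: 48 kHz sampling, one pair of channels.
payload_rate = pcm_bit_rate(48_000, 24, 2)  # 24-bit audio payload only
framed_rate = pcm_bit_rate(48_000, 32, 2)   # 32-bit AES3 subframes (assumption)

print(payload_rate / 1e6, "Mbit/s payload")
print(framed_rate / 1e6, "Mbit/s with AES3 framing")
```

Under these assumptions a channel pair needs between about 2.3 and 3.1 Mbit/s, which is consistent with the approximate figure quoted above.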

Although the above is generally not a problem when the final audiovisual media is delivered to the consumer, it does pose a problem for the audiovisual media production industry. That industry increasingly uses ubiquitous modern high-speed data networks to send "raw" audiovisual media (that is, the source material from which television, film and other media are made) in compressed form, instantly, between production facilities, or indeed from a production facility to a television or audio network distribution point, for example a terrestrial transmitter, satellite uplink or cable head end.

For example, a film crew on location generally feeds audiovisual material back to a television studio for editing and distribution to affiliate television stations, and ultimately for broadcast to viewers. The audiovisual coding standards described above do not allow synchronized multi-channel audio to be sent without pre-mixing. This either increases the complexity of the crew's field equipment or prevents them from providing multi-channel audio at all.

There is a particular need to transmit multi-channel audio that requires accurate channel-to-channel alignment, so that, where time alignment across the channels is important, the audio signals can subsequently be encoded as surround sound audio. The transmission should use the MPEG standards described above, because most production equipment is already set up for use with them.

Accordingly, the present invention proposes methods and apparatus that provide a cost-effective and convenient mechanism for the delivery of multi-channel audio while maintaining accurate inter-channel time alignment and sound quality.

Summary of the invention

Embodiments of the invention provide a method of encoding audio and including the encoded audio in a digital transport stream, comprising: receiving a plurality of temporally co-located audio signals at an encoder input; assigning, per unit time, an identical time stamp to all of the plurality of temporally co-located audio signals; and incorporating the identically time stamped audio signals into the digital transport stream.

Optionally, the receiving step further comprises sampling the temporally co-located audio signals to form frames of audio data of a predetermined size, and aligning the frames of audio data to keep the audio signals temporally co-located, wherein the step of assigning identical time stamps is carried out on the aligned frames of audio data.

Optionally, the method further comprises compressing the aligned frames of audio data using an identically configured audio encoder before assigning the time stamps, and assigning the compressed, identically time stamped audio data to a plurality of mono channels of the transport stream.

Optionally, the plurality of mono channels comprises one or more conventional dual mono audio components.

Optionally, the predetermined size is the size of an access unit in the MPEG standard, and the digital video transport stream is an MPEG-1 or MPEG-2 transport stream.

Optionally, the time stamp is a presentation time stamp.

Optionally, in the method of any preceding claim, the step of incorporating the audio into the digital video stream comprises multiplexing the compressed and identically time stamped audio data into the transport stream.

Embodiments of the invention also provide a method of decoding a digital transport stream that includes audio encoded according to any of the encoding methods described above, comprising: receiving a plurality of identically time stamped audio signals representing a plurality of temporally co-located individual audio channels; detecting the time stamps to determine which are shared; and, according to the detected time stamps, outputting the temporally co-located individual audio channels as multiple channels.

Optionally, the plurality of identically time stamped audio signals have been sampled and aligned to form aligned frames of audio data, and the identical time stamp has been applied to the aligned frames of audio data.

Optionally, the aligned frames of audio data were compressed before the time stamps were assigned, and the method further comprises decompressing the frames of audio data to produce the individual audio signals for output.

Optionally, the step of outputting the plurality of temporally co-located individual audio channels comprises presenting the audio using the time stamp of only one of the temporally co-located audio signals.

Optionally, the digital transport stream is a digital video transport stream, and the aligned frames of audio data comprise PES packets.

Embodiments of the invention also provide encoding apparatus adapted to carry out any of the encoding methods described above.

Embodiments of the invention also provide decoding apparatus adapted to carry out any of the decoding methods described above.

Embodiments of the invention also provide a digital transmission system comprising at least one such encoding apparatus, at least one such decoding apparatus, and a communication link between them.

Embodiments of the invention also provide a computer-readable medium carrying instructions which, when executed, cause computer logic to carry out any of the described encoding methods, decoding methods, or both.

Embodiments of the invention also provide encoding apparatus for encoding audio and producing a transport stream from a plurality of temporally co-located audio channels, the encoding apparatus comprising: at least one encoder for encoding audio according to a predetermined compression scheme; one packetization function per encoder, for packetizing the encoded audio into predetermined portions of audio; a framing function adapted to provide identical time stamps to the packetization functions for inclusion in the plurality of predetermined portions of audio data, so that the encoded audio represents temporally co-located audio channels; and a multiplexer for multiplexing together the outputs of the at least one encoder and packetization function pair.

Brief description of the drawings

A method and apparatus for the delivery of aligned multi-channel audio will now be described, by way of example only, with reference to the accompanying drawings, in which:

Fig. 1 shows a schematic block diagram of part of an analog or digital mono encoding apparatus according to the prior art;

Fig. 2 shows a schematic block diagram of part of an analog or digital mono decoding apparatus according to the prior art;

Fig. 3 shows a schematic block diagram of part of an analog or digital stereo or dual mono encoding apparatus according to the prior art;

Fig. 4 shows a schematic block diagram of part of an analog or digital stereo or dual mono decoding apparatus according to the prior art;

Fig. 5 shows a flow chart of the encoding portion of a method for the delivery of aligned multi-channel audio according to an embodiment of the invention;

Fig. 6 shows a flow chart of the decoding portion of a method for the delivery of aligned multi-channel audio according to an embodiment of the invention;

Fig. 7 shows a schematic block diagram of part of a multi-channel analog or digital encoding apparatus according to an embodiment of the invention;

Fig. 8 shows a schematic block diagram of part of a multi-channel analog or digital decoding apparatus according to an embodiment of the invention.

Detailed description

An embodiment of the invention will now be described with reference to the accompanying drawings, in which the same or similar parts or steps are given the same or similar reference numerals.

The following description is based on the MPEG-2 standard. However, it will be appreciated that the underlying invention is equally applicable to other compressed audio standards that support dual mono coding, such as Advanced Audio Coding (AAC) or Dolby Digital.

The MPEG-1 and MPEG-2 audio specifications describe means of coding and packetizing digital audio signals. The processed audio signal is passed to the MPEG Systems layer (ISO/IEC 13818-1) for further packetization into a transport stream (TS), which is then transmitted over a communication network such as a telecommunications or broadcast system. These MPEG packetization rules define a syntax that gives structure to the bit stream. In particular, the bit stream includes time stamps, which the decoder uses to control the timing of the decoded and recovered output audio. These time stamps are used to time the audio and video components accurately.

The MPEG standards define two types of time stamp: the decoding time stamp (DTS), which defines when received coded data is to be presented to the decoder, and the presentation time stamp (PTS), which defines when the decoded audio or video is to be output by the system so that it can be heard or seen respectively. The latter type is the most commonly used.

By managing these time stamps as described in more detail below, an audiovisual transmission system according to an embodiment of the invention can appropriately present the several individual audio signals of a multi-channel set for encoding or decoding, thereby achieving the required synchronization across the multi-channel set.

Fig. 1 shows a schematic block diagram of part of an analog or digital mono encoding apparatus according to the prior art, illustrating the system flow of audio data through an encoding process such as MPEG-2. The decoding process is the reverse, and is shown in Fig. 2.

All of the examples in the figures show dual analog 110 and digital 105 inputs. An analog input is digitized by an analog-to-digital (A/D) converter 120 before being input to the encoder 130, whereas digital audio 105 is input directly to the encoder 130. Each channel is denoted by a suffix a-d. However, it will be appreciated that the invention is not limited to any set number of channels and is fully scalable, and that the audio inputs may be analog only, digital only, or dual format as shown.

Where the input is in analog form, the analog sound is digitally sampled, for example in linear pulse code modulation (PCM) form, before being input to the encoder 130, where it is converted into a bit rate reduced format.

The encoder 130 outputs a number of encoded digital bit streams (one bit stream per individual audio channel) to a packetization function 140, which packetizes the audio into groups of audio samples. A defined group of audio samples is associated, in the coded domain, with a block of bits called an access unit. Each access unit is a self-contained packaged portion of the audio, for example a frame of 1152 audio samples.
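To give a sense of scale for such an access unit, the following sketch computes how long a 1152-sample frame lasts and by how much a time stamp advances from one frame to the next. The 48 kHz and 44.1 kHz sampling rates are assumptions made for illustration; the 90 kHz time stamp clock is the resolution used by MPEG system time stamps.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # MPEG system time stamps count ticks of a 90 kHz clock


def access_unit_duration(samples_per_frame: int, sample_rate_hz: int) -> Fraction:
    """Duration of one access unit in seconds, as an exact fraction."""
    return Fraction(samples_per_frame, sample_rate_hz)


def pts_increment(samples_per_frame: int, sample_rate_hz: int) -> Fraction:
    """Time stamp ticks between consecutive access units."""
    return access_unit_duration(samples_per_frame, sample_rate_hz) * PTS_CLOCK_HZ


# Assumed sampling rates; only the 1152-sample frame size comes from the text.
ticks_48k = pts_increment(1152, 48_000)   # whole number of ticks
ticks_44k = pts_increment(1152, 44_100)   # not a whole number of ticks

print(float(access_unit_duration(1152, 48_000)), "s per access unit at 48 kHz")
print(ticks_48k, "ticks at 48 kHz;", float(ticks_44k), "ticks at 44.1 kHz")
```

At 48 kHz the increment is a whole number of ticks, so every access unit boundary can be time stamped exactly. At 44.1 kHz it is not, so an encoder has to round, and channels that are meant to stay aligned should be given the same rounded value.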

The individual packetized channels are then multiplexed together by a multiplexer 150 to form the transport stream 160.

The decoding apparatus is shown in Fig. 2 and is essentially the reverse. The transport stream 160 is demultiplexed by a demultiplexer 250, which supplies each packetized audio channel to a depacketization function 240 for unpacking. The audio is then decoded in a decoder stage 235 and output either as a direct digital stream 105 or, via a digital-to-analog converter 220, in analog form 110.

Figs. 3 and 4 show the encoding and decoding apparatus for the dual mono or synchronized stereo case. Multiple stereo or dual mono pairs can be added to the system, but these pairs are not locked together, because the MPEG standard makes no provision for this (other than the surround sound options, which suffer from the problems described in the background section). They therefore remain separate entities with separate time stamps, and each is reconstructed separately at the output of the decoder.

A number of independent audio channels, such as sound tracks in different languages, may be present in any given transport stream, each encoded separately.

Depending on the number of channels required and on the bit rate allocated to each channel, which the system operator chooses according to quality criteria, there are a number of different associations between the groups of input audio and their coded counterparts. The normal mode of operation is for these audio channels to be coded independently, with no specific requirement to lock them together.

Some of these channels may be associated with an accompanying video signal (that is, where the audio is the sound for video or television), and the system will use common time stamps for the audio and video streams so that these signals are correctly aligned with their corresponding video. Audio alignment need not be very precise in this case: it need only ensure that lip sync requirements are met. This level of alignment is not as precise as that required for multi-channel surround sound.

Therefore, each independent mono, dual mono or stereo audio signal (see Fig. 3) usually has a separate identity (that is, an elementary stream) in the multiplexed output stream. Each accordingly has its own time stamps, generated independently by the encoding apparatus during the packetization stage and used separately at the decoder.

In brief overview, the proposed solution to the shortcomings of the prior art described above is to adapt the common MPEG-2 transport format for standard mono or dual channel stereo by taking the timing control provided for those cases and extending it to the multi-channel case. A decoder according to an embodiment of the invention can therefore present multiple accurately aligned audio channels, which solves the synchronization problem and avoids the concatenation of coding systems and the accompanying degradation.

This solution is fully compatible with the existing MPEG-2 syntax. An ordinary compliant decoder can therefore present the multi-channel audio in the conventional time relationship, albeit without the same degree of alignment accuracy as a decoder according to an embodiment of the invention, and the method can be repeated in cascaded systems without concern about degradation.

In more detail, in the proposed multi-channel synchronization method, the several input audio signals that are required to be processed separately but synchronously are handled by the same timing control, so that identical time stamps are assigned in the transport syntax and the decoder also maintains the alignment.

Fig. 5 shows part of an encoding method 500 according to an embodiment of the invention.

In step 510, a predetermined number (N) of independent audio channels, which are to be sent synchronously over a single transport stream without being converted into a single component, are input to the encoding apparatus. Per unit time, the encoding apparatus forms K aligned sets of audio samples, taking one sample from each input audio channel, the samples in each set corresponding to the same instant.

Per unit time, the encoding apparatus forms N/2 frames of K aligned audio samples (step 520), each frame corresponding to the same start time but relating to different audio channels. The frames are then compressed in step 530, using the selected compression method, to form access units, typically using dual mono audio compression for each pair of audio channels.

Subsequently, in step 540, the compressed frames of audio samples (that is, the access units) are assigned identical time stamps, generally in the form of a header field.
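As an illustration of what such a header field can look like, the sketch below packs a 33-bit presentation time stamp into the 5-byte PTS-only field layout used in MPEG-2 PES packet headers. The function names are hypothetical, and the sketch covers only this one field rather than a complete PES header.

```python
def encode_pts(pts: int) -> bytes:
    """Pack a 33-bit PTS into the 5-byte PES header field (PTS-only form)."""
    if not 0 <= pts < (1 << 33):
        raise ValueError("PTS must fit in 33 bits")
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 0x01,  # '0010', PTS[32..30], marker
        (pts >> 22) & 0xFF,                          # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 0x01,          # PTS[21..15], marker
        (pts >> 7) & 0xFF,                           # PTS[14..7]
        ((pts & 0x7F) << 1) | 0x01,                  # PTS[6..0], marker
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33-bit PTS from the 5-byte field, ignoring marker bits."""
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )


# The same field bytes would be written into the header of every channel pair.
shared_field = encode_pts(2160)
print(shared_field.hex(), decode_pts(shared_field))
```

Because each channel pair receives a byte-identical field, a decoder can recognize frames that belong together by comparing the decoded values.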

In step 550, the time stamped compressed frames of audio samples are packaged (that is, packetized) into PES packets, each containing a corresponding standard dual mono pair for use in, for example, the MPEG-2 standard. The remainder of the encoding process is the same as in the normal case: the packetized audio is transport packetized and multiplexed into the output transport stream 160 together with any associated video (if applicable) and other channels.
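Steps 510 to 550 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the patented implementation: `PesPacket`, `compress_pair` and `encode` are hypothetical names, the stand-in "compressor" merely packs 16-bit PCM without compressing it, and the time stamp increment assumes 1152-sample frames at 48 kHz.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence

FRAME_SAMPLES = 1152   # samples per channel per access unit (example size from the text)
PTS_PER_FRAME = 2160   # assumed: 1152 samples at 48 kHz, in 90 kHz ticks
PTS_MODULO = 1 << 33   # the PTS is a 33-bit counter


@dataclass
class PesPacket:
    stream_index: int  # which dual mono elementary stream carries this packet
    pts: int           # time stamp shared by every pair of the same frame
    payload: bytes


def compress_pair(left: Sequence[int], right: Sequence[int]) -> bytes:
    """Hypothetical stand-in for a dual mono encoder: packs 16-bit PCM, no compression."""
    return b"".join(struct.pack("<hh", l, r) for l, r in zip(left, right))


def encode(channels: Sequence[Sequence[int]], first_pts: int = 0) -> List[PesPacket]:
    """Turn N aligned channels (step 510) into time stamped packets for N/2 streams."""
    if len(channels) % 2:
        raise ValueError("this sketch expects an even number of channels")
    n_frames = min(len(c) for c in channels) // FRAME_SAMPLES
    packets = []
    for frame in range(n_frames):
        start = frame * FRAME_SAMPLES
        stop = start + FRAME_SAMPLES
        # Step 540: one time stamp per unit time, reused for every pair.
        pts = (first_pts + frame * PTS_PER_FRAME) % PTS_MODULO
        for pair in range(len(channels) // 2):
            # Step 520: aligned frames covering the same sample range.
            left = channels[2 * pair][start:stop]
            right = channels[2 * pair + 1][start:stop]
            # Step 530: compress; step 550: packetize with the shared time stamp.
            packets.append(PesPacket(pair, pts, compress_pair(left, right)))
    return packets


# Four silent channels, two frames long, produce two streams of two packets each.
four_channels = [[0] * (2 * FRAME_SAMPLES) for _ in range(4)]
packets = encode(four_channels)
print([(p.stream_index, p.pts) for p in packets])
```

The point of the sketch is that the time stamp is computed once per frame period and then reused for every pair, instead of being generated independently by each packetizer.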

Fig. 6 shows the reverse, decoding, process according to an embodiment of the invention.

In particular, the decoding method comprises receiving N/2 pairs of mono audio channels 610, detecting the time stamps 620, determining which pairs share a time stamp 630, decompressing those pairs into N access units of mono audio samples relating to the same presentation time 640, and then outputting the decompressed audio so that the N samples are presented accurately and simultaneously according to the single common time stamp 650.
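The grouping logic of steps 620 to 650 can be sketched as follows. This is an assumption-laden illustration: packets are modelled as hypothetical `(stream_index, pts, frame)` tuples, decompression is omitted (frames are treated as already decoded), and time stamp wrap-around is ignored.

```python
from collections import defaultdict


def group_by_pts(packets):
    """Steps 620-630: detect the time stamps and collect the streams sharing each one."""
    groups = defaultdict(dict)
    for stream_index, pts, frame in packets:
        groups[pts][stream_index] = frame
    return groups


def present(packets, n_streams):
    """Step 650: yield (pts, frames) once every stream has a frame for that pts."""
    groups = group_by_pts(packets)
    for pts in sorted(groups):  # wrap-around of the 33-bit counter is ignored here
        frames = groups[pts]
        if len(frames) == n_streams:
            # A single common time stamp drives the presentation of all channels.
            yield pts, [frames[i] for i in range(n_streams)]


# Packets may arrive interleaved in any order across the two streams.
received = [(0, 0, "a0"), (1, 0, "b0"), (1, 2160, "b1"), (0, 2160, "a1")]
presented = list(present(received, n_streams=2))
print(presented)
```

Frames are released only as complete sets, so all channels for a given instant reach the output together regardless of the order in which their packets arrived.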

It will be appreciated that the alignment, compression and time stamping can be carried out by a single hardware component of the encoding apparatus, and the reverse process by a single hardware component of the decoding apparatus.

Encoding apparatus for carrying out the encoding method described above according to an embodiment of the invention is shown in Fig. 7. It can be seen that there is an additional processing stage (a multi-channel framing stage 770), provided to align the several audio signals and to arrange for a common time stamp to be used across the separate but synchronized audio channels in the packetization stage 140.

The described method and apparatus preferably operate by carrying the separate but synchronized audio channels in dual mono channels. The encoding apparatus 700 of Fig. 7 (and its corresponding decoding apparatus 800 of Fig. 8) is therefore shown with a separate encoder/decoder and packetizer/depacketizer for each pair of audio channels.

Fig. 7 shows an example with four separate audio channels that are to be synchronized together, with dual (analog/digital) input capability. The A/D converters 120 (a-d) digitize the analog channels before they are supplied to the framing stage 770. Digital inputs are fed directly into the framing stage 770.

The framing stage 770 creates blocks of temporally co-located audio samples from all of the audio channels and marks them so that they are processed with the same time stamp as all of the other temporally co-located audio samples. This generally takes the form of a time stamp synchronization signal 780, which is passed to the packetization stage 140 further down the processing pipeline.

Meanwhile, the co-timed frames of audio samples formed in the framing stage 770 are supplied, as dual mono pairs, to a standard encoding stage 730. This in turn supplies the encoded audio samples to the packetization stage 140, where they are packetized according to the time stamp synchronization signal 780 supplied by the framing stage 770.

A preferred embodiment uses access unit sized blocks of samples and the associated presentation time stamps (PTS). The access units belong to multi-channel pairs compressed using a single digital signal processor, producing a set of PES packets with identical PTS values that contain the compressed audio relating to exactly co-timed original samples of audio data.

Where there is an odd number of input channels and dual mono channels are used as the transport mechanism, one channel of one of the dual mono pairs may simply be filled with silence.
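A minimal sketch of this padding, assuming channels are held as equal-length lists of PCM samples and that silence is represented by zero-valued samples (`pair_channels` is a hypothetical helper name):

```python
def pair_channels(channels):
    """Group channels into dual mono pairs, padding an odd count with silence."""
    padded = [list(c) for c in channels]
    if len(padded) % 2:
        # Fill the spare half of the last pair with zero-valued (silent) samples.
        padded.append([0] * len(padded[-1]))
    return [(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]


# Three input channels become two dual mono pairs; the second is half silent.
pairs = pair_channels([[1, 2], [3, 4], [5, 6]])
print(pairs)
```

The silent channel carries no information and can be discarded at the decoder; it exists only so that every input channel fits the dual mono transport.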

The outputs of the dual mono chains (encoder and packetization function pairs) are then multiplexed together in the normal way by the multiplexer 150 to provide the output transport stream 160.

Decoding apparatus 800 according to an embodiment of the invention is shown in Fig. 8.

The decoding operation decompresses the discrete access units of audio relating to the multiple dual mono audio components, retaining their presentation time stamps 835. The frames of decoded samples are then presented at the same time by a frame presentation stage 870, according to the common time stamp shared between them. The multiple samples relating to an exactly co-timed sampling instant are thus presented together, achieving the goal of maintaining accurate channel-to-channel audio alignment across multiple channel pairs through the whole encode/decode processing chain.

The complete scheme for synchronizing several channels of audio therefore uses the following features at the encoding apparatus:

Temporally co-located input samples across multiple audio channels are formed into aligned frames of audio samples, sized to match the access unit size of the compression;

The aligned audio frames are compressed by identically configured audio encoders, preferably allocating two mono channels (as a pair) to each compressed audio component. However, stereo channels or individual mono channels could also be used, either instead of or in addition to dual mono pairs;

The compressed access units are assigned identical presentation time stamp values, preferably, or alternatively decoding time stamps (DTS) with a predetermined delay;

The compressed audio components are transmitted as multiple conventional dual channel mono compressed audio components in an MPEG-2 transport stream.

At the decoding apparatus (that is, at the receiving location):

The multiple compressed audio components are decoded, with the result that, at any given point in the respective streams, the multiple sets of decompressed frames of audio samples (that is, the decoded channels) have the same time stamp across the channels;

The decompressed audio frames for the multiple channels are presented to the output using the presentation time stamp of only one component, so that the output audio samples are temporally co-located (or follow the DTS by a predetermined period of time).

The method and apparatus described above provide a means by which several channels of audio can be sent through a communication system in such a way that they always remain synchronized with one another to sample accuracy. Previous means of achieving this synchronization were limited to stereo or to surround sound coding, which causes degradation when several coding stages are concatenated. The described method and apparatus avoid the degradation of prior art systems without requiring the more complex, and sometimes proprietary, surround sound solutions.

Embodiments of the invention therefore provide a means of sending "raw" multi-channel audio (that is, audio not yet mixed into a surround sound format) across the same transport stream as its associated video, reducing the concatenation-related degradation of sound quality and other problems associated with previously known audio transmission methods. This also avoids the use of lossy surround sound processing before transmission, or the use of very high bandwidth uncompressed linear PCM.

The invention is particularly suited to sending multi-channel audio, without converting it into a single component (for example, 5.1 surround sound), together with broadcast quality video. However, it will be appreciated that embodiments of the invention are equally applicable to audio-only transport streams, such as those used for the delivery of multi-channel radio sound or the like.

The invention is particularly useful in systems where compressed audio is sent so that the surround sound can be processed at another location. This is because, when such compressed sources are used in a mix, misalignment of the compressed audio samples can give rise to compression artefacts, which can in turn cause undesirable audio impairments in the final surround audio mix.

A typical implementation will include encoding apparatus according to an embodiment of the invention at one end of a communication link and decoding apparatus according to an embodiment of the invention at the other end. If required, such pairs of apparatus can be repeated across multiple communication links.

The method described above may be carried out by any suitably adapted or designed hardware. Portions of the method may also be embodied in a set of instructions stored on a computer-readable medium which, when loaded into a computer, digital signal processor (DSP) or similar device, causes that device to carry out the method described above.

Similarly, the method may be embodied as a specially programmed or hardware designed integrated circuit, which operates to carry out the method on audio data loaded into it. The integrated circuit may form part of a general purpose computing device such as a PC or the like, or part of a more specialized device such as a games console, mobile phone, portable computing device or hardware audio/video encoder/decoder.

An exemplary hardware embodiment is a field programmable gate array (FPGA) programmed to carry out the described method and/or to provide the described apparatus, the FPGA being located on a daughter board of a rack mounted video server held in a data centre, for use in, for example, an IPTV television system, a television studio, or a location video uplink van supporting a news crew in the field.

Another exemplary hardware embodiment of the invention is an audio and video sender comprising a transmitter and receiver pair, in which the transmitter comprises the encoding apparatus and the receiver comprises the decoding apparatus, and in which each encoding apparatus is embodied as an application specific integrated circuit (ASIC).

It will be apparent to the skilled person that the exact order and content of the steps carried out in the method described herein may be altered according to the requirements of a particular set of execution parameters, such as speed of encoding and the like. Furthermore, it will be appreciated that different embodiments of the disclosed apparatus may selectively implement certain features of the invention in different combinations, according to the requirements of a particular implementation of the invention as a whole. Accordingly, the claim numbering is not to be construed as a strict limitation on the ability to move features between claims, and portions of dependent claims may be utilized freely.

Claims

1. A method of encoding audio and including the encoded audio in a digital transport stream, comprising:

receiving a plurality of temporally co-located audio signals at an encoder input;

sampling the plurality of temporally co-located audio signals to form a plurality of aligned frames of audio data of a predetermined size, wherein the aligned frames of audio data correspond to the same time period;

compressing the plurality of aligned frames of audio data to create compressed frames;

assigning, per unit time, an identical time stamp to the compressed frames; and

incorporating the compressed frames into a plurality of elementary streams of the digital transport stream.

2. the method for claim 1, wherein said compression also comprises:

Arranged described a plurality of aligned frame of audio compressed data with identical audio coder configuration before assigning described identical time stamp; And

Described a plurality of aligned frame are assigned to a plurality of single channels of described digital transport stream.

3. The method of claim 2, wherein said plurality of mono channels comprise one or more conventional dual mono audio components.

4. the method for claim 1, wherein said pre-sizing is the size of addressed location in mpeg standard, and described video transmission stream is MPEG-1 or mpeg 2 transport stream.

5. the method for claim 1, wherein said timestamp is the presentative time stamp.

6. the method for claim 1, wherein the described step of combination also comprises:

The voice data that is added with identical time stamp is multiplexed in described digital transport stream.

7. A method of decoding a digital transport stream, comprising:

Receiving a digital transport stream comprising encoded audio;

Obtaining compressed frames from a plurality of elementary streams of said digital transport stream;

Decompressing said compressed frames to create a plurality of aligned frames of audio data of a predetermined size representing each of a plurality of temporally co-located audio channels, wherein the aligned frames of audio data correspond to the same time period;

Detecting the time stamp of each frame of audio data in said plurality of frames of audio data, to determine the identically time-stamped frames of audio data; and

Presenting the identically time-stamped frames of audio data at the same time, by using the time stamps of the frames of audio data, in said plurality of frames of audio data, that represent an individual one of said plurality of temporally co-located audio channels.
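By way of non-limiting illustration only, the detecting and presenting steps of claim 7 might be sketched as follows. Frames are modelled as `(stream_id, time_stamp, audio)` tuples that have already been obtained and decompressed; this tuple layout and the name `group_by_time_stamp` are assumptions of the sketch, and time stamp wrap-around is deliberately ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Frame = Tuple[int, int, bytes]  # (stream_id, time_stamp, decompressed audio)


def group_by_time_stamp(frames: Iterable[Frame],
                        stream_ids: Iterable[int]) -> List[Tuple[int, Dict[int, bytes]]]:
    """Collect frames from several elementary streams by time stamp and return,
    in time order, only those groups in which every expected stream is present.
    Each returned group is meant to be presented at the same instant."""
    expected = set(stream_ids)
    pending: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    for stream_id, time_stamp, audio in frames:
        pending[time_stamp][stream_id] = audio
    # Sorting assumes the time stamps do not wrap within the input.
    return [(ts, chans) for ts, chans in sorted(pending.items())
            if set(chans) == expected]
```

A group that is missing a channel is withheld here rather than presented partially; a practical decoder might instead wait for the late frame or conceal the gap, which is a design choice outside the scope of this sketch.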

8. The method of claim 7, wherein said encoded audio has been sampled and aligned to form said plurality of aligned frames of audio data, and wherein said identical time stamps have been applied to said plurality of aligned frames of audio data.

9. The method of claim 7, wherein said digital transport stream is a digital video transport stream, and said plurality of aligned frames of audio data comprise PES packets.

10. An encoder for encoding audio and including said audio in a digital transport stream, said encoder comprising:

A processor;

A non-transitory computer-readable storage medium further comprising computer-readable instructions, said instructions being configured, when executed by said processor, to:

Receive, at an input, a plurality of temporally co-located audio signals;

11. The encoder of claim 10, wherein said computer-readable instructions configured to compress are further configured, when executed by said processor, to:

Assign said plurality of aligned frames of audio data to a plurality of mono channels of said digital transport stream.

12. The encoder of claim 11, wherein said plurality of mono channels comprise one or more conventional dual mono audio components.

13. The encoder of claim 10, wherein said predetermined size is the size of an access unit in an MPEG standard, and said digital transport stream is an MPEG-1 or MPEG-2 transport stream.

14. The encoder of claim 10, wherein said time stamp is a presentation time stamp.

15. The encoder of claim 10, wherein said computer-readable instructions are further configured, when executed by said processor, to incorporate said audio into a digital video stream by:

Multiplexing said plurality of aligned frames of audio data into said digital transport stream.

16. A decoder for decoding a digital transport stream, comprising:

A processor;

Detect the time stamp of each frame in said plurality of aligned frames of audio data, to determine the identically time-stamped frames of audio data; and

17. The decoder of claim 16, wherein said digital transport stream is a digital video transport stream, and said plurality of aligned frames of audio data comprise PES packets.

18. A digital transmission system, comprising:

An encoder for encoding audio and including said audio in a digital transport stream, said encoder comprising:

A first processor;

A first non-transitory computer-readable storage medium further comprising computer-readable instructions, said instructions being configured, when executed by said first processor, to:

Receive, at an input, a plurality of temporally co-located audio signals;

Incorporate said compressed frames into a plurality of elementary streams of said digital transport stream; and

A decoder for decoding said digital transport stream, said decoder comprising:

A second processor;

A second non-transitory computer-readable storage medium further comprising computer-readable instructions, said instructions being configured, when executed by said second processor, to:

Receive said digital transport stream comprising encoded audio;

Detect the time stamp of each frame of audio data in said plurality of aligned frames of audio data, to determine the identically time-stamped frames of audio data; and

19. the method for claim 1, wherein said a plurality of time, the upper sound signal in same position also comprised original multi-channel audio.

20. the method for claim 1, wherein said a plurality of time, the upper sound signal in same position was suitable for being processed into surround sound.

21. method as claimed in claim 20, the wherein said surround sound that is processed into is performed in another position.

22. the method for claim 1, wherein said a plurality of time, the upper sound signal in same position was the component of hyperchannel surround sound.

23. the method for claim 1, wherein said a plurality of time upper in the sound signal of same position carry separately but synchronous voice-grade channel.