CN1172536C - Method of embedding compressed digital audio signals in video signal using guard bands - Google Patents


Info

Publication number
CN1172536C
Authority
CN
China
Prior art keywords
audio
information
video
block
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB998037516A
Other languages
Chinese (zh)
Other versions
CN1292979A (en)
Inventor
Craig Campbell Todd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp filed Critical Dolby Laboratories Licensing Corp
Publication of CN1292979A publication Critical patent/CN1292979A/en
Application granted granted Critical
Publication of CN1172536C publication Critical patent/CN1172536C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23602 Multiplexing isochronously with the video sync, e.g. according to bit-parallel or bit-serial interface formats, as SDI
    • H04N21/23614 Multiplexing of additional data and video streams
    • H04N21/2368 Multiplexing of audio and video streams
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341 Demultiplexing of audio and video streams
    • H04N21/4342 Demultiplexing isochronously with video sync, e.g. according to bit-parallel or bit-serial interface formats, as SDI
    • H04N21/4348 Demultiplexing of additional data and video streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Television Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio signal processor forms gaps or guard bands in sequences of blocks conveying encoded audio information and time aligns the guard bands with video information. The guard bands are formed to allow for variations in processing or circuit delays so that the routing or switching of different streams of video information with embedded audio information does not result in a loss of any encoded audio blocks.

Description

Method and apparatus for using time-aligned blocks of encoded audio in video/audio applications to facilitate audio switching
Technical field:
The present invention relates generally to audio signal processing in video/audio applications. More particularly, the present invention relates to block coding methods in which the sequences of coded blocks are separated by gaps or guard bands, so that normal variations in signal-processing delay do not destroy the alignment between video information and audio information.
Background art:
Several international standards have been established that govern various aspects of embedding digital audio information into frames of video information. For example, standard SMPTE 259M, published by the Society of Motion Picture and Television Engineers (SMPTE), defines a serial digital interface (SDI) in which up to four channels of digital audio information can be embedded into component and composite serial digital video signals. Standard SMPTE 272M provides a full definition of how digital audio information is to be embedded in the ancillary data spaces of frames of video information.
The serial transmission of digital audio information itself has been the subject of international standards. For example, the AES3 standard (ANSI S4.40), published by the Audio Engineering Society (AES), defines the serial transmission of two-channel digital audio information represented in linear pulse code modulation (PCM) form. According to this standard, the PCM samples of the two channels are interleaved and transmitted in pairs.
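As a rough illustration of the pair-wise interleaving described above, the following Python sketch interleaves two channels of PCM samples into sample pairs; it is a simplification, and the preambles, auxiliary, validity, user, channel-status and parity bits of the real AES3 subframe structure are deliberately omitted.

```python
# Simplified sketch of AES3-style two-channel interleaving.  The real AES3
# subframe structure (preambles, aux, validity, user, channel-status and
# parity bits) is omitted; only the pair-wise sample interleaving is shown.
def interleave_stereo(left, right):
    """Interleave two channels of PCM samples into (left, right) sample pairs."""
    assert len(left) == len(right)
    return [(l, r) for l, r in zip(left, right)]

# Example: four sample frames of a stereo signal
print(interleave_stereo([0, 1, 2, 3], [10, 11, 12, 13]))
```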
A common activity in recording and broadcasting applications is editing or cutting embedded video/audio information streams and splicing the cut segments together to form a new stream. A similar activity creates a new stream by merging multiple streams or by switching between them. The video information usually serves as the primary synchronization reference, so an edit or cut point is usually aligned with a video frame.
Standards such as AES11 define recommended practices for synchronizing digital audio equipment in studio operations. AES11 is intended to control timing uncertainties caused by jitter or processing delay, and provides for aligning frames of video information with AES3 digital audio streams to within two audio samples. Equipment and methods that conform to this standard can ensure that, over a given interval, synchronized signals carry the same frame number and contain samples that share a common timing relationship. Unfortunately, no standard or practice yet specifies the alignment between video information and audio information over longer intervals. As a result, equipment from different manufacturers, and even from the same manufacturer, can differ in timing and in processing delay, introducing considerable uncertainty into the relative alignment of audio and video information.
In applications that use a linear representation of audio information, such as that defined in the AES3 standard, this uncertainty in alignment is generally unimportant. Because edit points are constrained to fall only between pairs of audio sample frames, uncertainty in video/audio alignment cannot cause any loss of audio information. It affects only the relative timing of sound and picture as presented to a person, and such a shift is imperceptible.
A growing number of applications, however, use bit-rate-reduction coding techniques to embed a larger number of audio channels into a video/audio data stream. These techniques are commonly applied to blocks of 128 or more audio samples to generate blocks of encoded information. The sample blocks typically represent audio information spanning 3 to 12 ms. Each encoded information block generated by such a coding process represents the smallest unit of information from which a reasonably accurate replica of a segment of the original audio information can be recovered. Split-band coding techniques reduce bit rate by applying psychoacoustic-based coding to a frequency-subband representation of an audio signal. The subband representation can be produced with a bank of bandpass filters or with one or more transforms. For ease of discussion, these split-band coding techniques are described here in terms of a filter bank.
The uncertainty in alignment mentioned above is significant in these block-coding applications, because an edit point falling within the boundaries of an encoded block cuts part of that block away from the remainder of the signal. The partial loss of an encoded block typically produces a dropout of 3 ms or more in the recovered signal, and a loss of that duration is generally noticeable to the human auditory system.
This problem can be avoided by a post-processing method in which the encoded audio signal is decoded, the recovered PCM representation is edited as needed, and a new encoded representation is generated by re-encoding the edited PCM audio information. This solution is unattractive because of the additional cost incurred by the decode/re-encode process and the attendant loss in sound quality. In addition, as will be better understood from the discussion below, the decode/re-encode process introduces additional delay into the audio information stream, which makes post-processing even less attractive.
Summary of the invention:
One object of the present invention is to provide methods and apparatus for processing embedded video/audio information streams that permit activities such as editing and switching while avoiding the problems described above.
According to one aspect of the present invention, a method for processing audio information comprises the steps of: receiving an input audio signal conveying the audio information; receiving video frame references representing time references for a sequence of video frames; generating blocks of encoded audio information in a bit-rate-reduced form from the audio information by applying a block coding process to the input audio signal, and time-compressing the blocks of encoded audio information; and assembling the time-compressed blocks into an encoded audio stream comprising a plurality of sequences of the time-compressed blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
According to another aspect of the present invention, an apparatus for processing audio information comprises: means for receiving an input audio signal conveying the audio information; means for receiving video frame references, the references representing time references for a sequence of video frames; means for generating blocks of encoded audio information in a bit-rate-reduced form from the audio information by applying a block coding process to the input audio signal and for time-compressing the blocks of encoded audio information; and means for assembling the time-compressed blocks into an encoded audio stream comprising a plurality of sequences of time-compressed encoded blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
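The following hypothetical Python sketch shows one way the assembling step described above could be arranged; the frame period, guard-band duration and block counts are illustrative values and are not taken from the disclosure. The encoded blocks produced over each video frame interval are time-compressed so that they occupy less than the full interval, and the unused time, straddling each video frame reference, forms the guard band.

```python
# Hypothetical sketch of the assembling step: time-compress the encoded blocks
# for each video frame interval and leave a gap (guard band) straddling each
# video frame reference.  All names and numbers are illustrative only.

FRAME_PERIOD_MS = 40.0      # e.g. one PAL frame interval
GUARD_BAND_MS = 5.0         # gap straddling each video frame reference

def assemble_stream(frames_of_blocks):
    """frames_of_blocks: list of lists; each inner list holds the encoded
    blocks produced from the audio spanning one video frame interval."""
    stream = []  # (start_time_ms, block) tuples
    for frame_index, blocks in enumerate(frames_of_blocks):
        ref = frame_index * FRAME_PERIOD_MS          # video frame reference
        seq_start = ref + GUARD_BAND_MS / 2          # sequence begins after the gap
        usable = FRAME_PERIOD_MS - GUARD_BAND_MS     # time left for the blocks
        slot = usable / len(blocks)                  # time-compressed slot per block
        for i, block in enumerate(blocks):
            stream.append((seq_start + i * slot, block))
        # no blocks are emitted within GUARD_BAND_MS / 2 of a frame reference,
        # so a switch made near that reference falls inside a gap
    return stream

# Example: two frame intervals, six encoded blocks per interval (as in Fig. 2A)
for start, block in assemble_stream([[f"A{i}" for i in range(6)],
                                     [f"B{i}" for i in range(6)]]):
    print(f"{start:6.2f} ms  {block}")
```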
The features of the present invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings, in which like reference numerals refer to like elements. The contents of the following discussion and the drawings are set forth as examples only and should not be understood to represent limitations on the scope of the present invention.
Description of the drawings:
Fig. 1 is a functional block diagram of one embodiment of a system for recording and routing multiple video/audio data streams.
Figs. 2A to 2C are graphical representations of hypothetical audio signals having different alignments with video frame references.
Fig. 3 is a functional block diagram of one embodiment of an apparatus for processing video signals with embedded audio information.
Fig. 4 is a functional block diagram of one embodiment of an encoding audio signal processor according to various aspects of the present invention.
Figs. 5A to 5C are graphical representations of hypothetical audio signals, processed according to the present invention, having different alignments with video frame references.
Fig. 6 is a graphical representation of overlapping audio information blocks weighted by a window function.
Embodiments:
System overview
Fig. 1 illustrates one embodiment of a system for recording and routing multiple video/audio data streams, and represents one example of a system that can advantageously use various aspects of the present invention. For the sake of clarity, this figure and all other figures omit the signal paths used to convey master clock signals that synchronize the equipment. In the discussion here it is assumed that the signals generated along paths 21, 22, 23 and 24, for example, conform to standards SMPTE 259M and SMPTE 272M; however, no particular standard or signal format is required to practice the present invention. For example, in an alternative embodiment of the system, separate signals, each conveying only video information or only audio information, are generated along paths 21 to 24 and passed to router 31, which includes circuitry for routing video and audio information individually. In such an embodiment, an SDI disembedder is interposed between SDI embedder 12 and router 31. This alternative embodiment is mentioned here to show that no particular signal format is required to practice the present invention.
Video tape recorder (VTR) 16 receives video information from path 1 and audio information from path 2, and records this video/audio information on tape. Subsequently, VTR 16 plays back the recorded video/audio information from tape and generates along path 21 a playback signal conveying the video information with embedded audio information. In a similar manner, VTR 17 records video and audio information received from paths 3 and 4, respectively, and subsequently generates along path 22 a playback signal conveying the video information with embedded audio information.
VTR 16, VTR 17 and VTR 18 include circuitry, such as a serial digital interface (SDI) embedder, for embedding audio information into the video information during playback.
SDI embedder 11 receives video and audio information from paths 5 and 6, respectively, and generates along path 14 a signal conveying the video information with embedded audio information. VTR 18, which includes circuitry such as an SDI disembedder, extracts audio information from the video/audio data signal and records the separated video and audio information on tape. Subsequently, VTR 18 recovers the video and audio information from tape and uses circuitry such as an SDI embedder to generate along path 23 a playback signal conveying the video information with embedded audio information. If VTR 18 is replaced by a digital data recorder, however, the video/audio data stream itself can be recorded and played back, so no embedder or disembedder is needed in such a recorder.
SDI embedder 12 receives video and audio information from paths 7 and 8, respectively, and generates along path 24 a signal conveying the video information with embedded audio information.
SDI router 31 receives the video/audio signals from paths 21, 22, 23 and 24, and routes or switches these signals along path 34 to playback/recording device 41. The number of signals received by SDI router 31 is not significant.
Playback/recording device 41 represents any device that uses the signal passed along path 34. For example, it may be a recording device such as a VTR or a playback device such as a television set. Furthermore, playback/recording device 41 may be located at a site remote from SDI router 31, in which case path 34 represents a communication or broadcast channel.
Shifts in video/audio alignment
Circuit delays in VTRs 16, 17 and 18 and in SDI embedders 11 and 12 can alter the relative alignment of the video information and the audio information. As a result, for example, the video/audio alignment in playback signal 21 may be shifted with respect to the alignment between the video and audio information received from paths 1 and 2, respectively. The amount of this shift in alignment varies among equipment from different manufacturers, varies among different pieces of equipment from the same manufacturer, and can even vary within a given piece of equipment as a function of, for example, the initialization state of buffer memories.
Referring to Fig. 2A, signal 111 represents audio information having a particular alignment with video frame references 101 and 102. Each of these video frame references indicates a particular reference point in a respective video frame. For example, for NTSC video information a common reference point coincides with the video information of line 10 in each frame, and for PAL video information a common reference point coincides with line 1 of each frame. No particular alignment is required to practice the present invention.
In Fig. 2B, signal 121 conveys the same information as signal 111 but is delayed with respect to it. As a result, the alignment between signal 121 and the video frame references is shifted with respect to the alignment of signal 111. In Fig. 2C, signal 131 conveys the same information as signal 111 but is advanced with respect to it. As a result, the alignment between signal 131 and the video frame references is shifted in the direction opposite to the shift in alignment of signal 121.
Referring to Fig. 1, suppose that audio information with the alignment shown in Fig. 2A is conveyed along paths 1/2, 3/4, 5/6 and 7/8, and that differing shifts in alignment such as those shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 24. Suppose further that the alignments shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 23, respectively. When SDI router 31 switches among the signals received from these three paths, a small discontinuity appears in the embedded audio information of the signal passed along path 34. If the audio information is represented in a linear form such as PCM, the discontinuity lasts only a few samples and probably cannot be perceived by a human listener, particularly because it is difficult to discern a discontinuity between two signals with different audio content.
Effects of coding
As mentioned above, there is growing interest in embedding a greater number of audio channels into video/audio data streams. When the information capacity of these additional audio channels exceeds the capacity of the space available for audio information, some form of bandwidth or bit-rate reduction is needed. One example of such compression is audio coding based on psychoacoustic principles.
These coding techniques are usually applied to blocks of audio samples to generate blocks of encoded information. The sample blocks typically represent audio information spanning an interval of 3 to 12 ms. Each encoded information block generated by such a coding process represents the smallest unit of information from which a reasonably accurate replica of a segment of the original audio information can be recovered.
In Fig. 2A, a sequence of encoded information blocks is illustrated as a pulse train. The information conveyed by these blocks is an encoded representation of the audio information in signal 111. The shape and size of the pulses is not significant; the pulse train is intended merely to suggest a series of blocks conveying information encoded from blocks of audio samples that are adjacent to one another or, preferably, overlap one another. In the example shown in Fig. 2A, the audio information spanning the interval between adjacent video frame references is represented by six encoded information blocks. Considerations for improving audio coding quality in video/audio applications are disclosed in patent document WO-A-99/21187.
When block coding techniques are used in the system of Fig. 1, the signals that SDI router 31 receives from paths 21 to 24 contain audio information encoded in block form. As explained above, varying shifts in alignment may arise between the encoded information blocks and the video frame references; this is illustrated by the differing alignments between video frame reference 101 and blocks 112, 122 and 132 shown in Figs. 2A, 2B and 2C, respectively. Assuming, as above, that the alignments shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 23, respectively, then if SDI router 31 switches, at the instant video frame reference 101 occurs, from the signal received via path 22, shown in Fig. 2B, to the signal received via path 23, shown in Fig. 2C, a considerable amount of audio information cannot be recovered from the signal routed along path 23 at the switch point. The audio information conveyed in block 123 before the switch point cannot be recovered, because the entire block is needed to recover its audio information and the portion of the block after the switch point is lost. Similarly, the audio information conveyed in block 133 after the switch point cannot be recovered, because the portion of block 133 before the switch point is lost.
This problem is not unique to the type of system shown in Fig. 1. It can also arise, for example, when performing video tape edits or audio dubbing on a single VTR.
As will be explained more fully below, the present invention overcomes this problem by forming guard bands or gaps in the encoded audio stream, so that considerable variation in video/audio alignment can be tolerated without any loss of audio information.
Encoding signal processor
Fig. 3 illustrates a video/audio signal processor that can be incorporated in a system such as the one shown in Fig. 1 in a number of ways. In the embodiment shown, signals conveying video information with embedded audio information are received from input signal paths 61-1, 61-2 and 61-3. Three input signal paths are shown in the figure; however, embodiments of the present invention may have essentially any number of input signal paths. Signal distributor 62 represents a wide range of signal distribution processes, including switching, merging, editing, splicing and storage/retrieval. For simplicity, the illustration and discussion here assume that signal distributor 62 receives multiple video/audio signals, processes and/or distributes them in some manner, and generates along path 63 a single signal conveying video information with embedded audio information. Deformatter 64 receives the video/audio information from path 63, extracts the embedded audio information and passes it along path 65. The video information may be passed along path 69. Audio signal processor 66 receives the audio information from path 65 and applies a block coding process to it so as to generate encoded information blocks along path 67. Formatter 68 receives the encoded information blocks from path 67 and generates along path 70 an output signal comprising a plurality of sequences of encoded information blocks, with a gap or guard band between the starting block of one sequence and the ending block of a preceding sequence. A reference signal, for example a master clock signal, is used to align the gaps or guard bands in time with the video information.
As mentioned above, the figures do not show the signal paths that carry the master clock signals used to synchronize the equipment. In a preferred embodiment, audio signal processor 66 forms blocks of audio samples that are aligned with the master clock signal. One such alignment is shown in Fig. 2A, in which the boundaries between adjacent sample blocks coincide with video frame references 101 and 102; however, other alignments may be used.
Referring to Fig. 5A, the sequence of blocks 112-2 conveys encoded information representing signal segment 111-2, which is a hypothetical time-compressed representation of the portion of signal 111 between video frame references 101 and 102. Similarly, the sequence of blocks 112-1 conveys encoded information representing signal segment 111-1, and the sequence of blocks 112-3 conveys encoded information representing signal segment 111-3. Audio signal processor 66 and formatter 68 generate sequences of encoded information blocks conveying the audio information in which a guard band or gap is formed between, for example, the ending block of sequence 112-1 and the starting block of sequence 112-2.
The shifts in alignment shown in Figs. 2A to 2C are also shown in Figs. 5A to 5C. In these figures, the encoded information conveyed in sequences 122-1, 122-2, 122-3, 132-1, 132-2 and 132-3 represents signal segments 121-1, 121-2, 121-3, 131-1, 131-2 and 131-3, respectively. As can be seen from Figs. 5B and 5C, the shifts in alignment do not cause any loss of audio information, because every possible switch point between video frame references 101 and 102 falls within a guard band.
The signal processor shown in Fig. 3 can, for example, be incorporated in an SDI router to process video signals with embedded AES3 or PCM audio information. An embodiment that omits signal distributor 62 can be incorporated in a VTR or an SDI embedder. Another embodiment that also omits deformatter 64 can be incorporated in the input circuits of a VTR or an SDI embedder.
Fig. 4 illustrates one embodiment of an encoding audio signal processor that is suitable for incorporation into the embodiment shown in Fig. 3 and that also has stand-alone utility, as explained below. According to this embodiment, audio signal processor 66 comprises a plurality of filter banks 71, 72 and 73. In response to a signal received from path 65-1, filter bank 71 generates a plurality of frequency subband signals along paths 75-1 to 75-3. In response to a signal received from path 65-2, filter bank 72 generates a plurality of frequency subband signals along paths 76-1 to 76-3. In response to a signal received from path 65-3, filter bank 73 generates a plurality of frequency subband signals along paths 77-1 to 77-3. Filter banks 71, 72 and 73 may be implemented in a number of ways, including a bank of bandpass filters, a cascaded set of band-splitting filters, and one or more time-domain-to-frequency-domain transforms. Only three filter banks are shown, and only three subband signals are shown for each filter bank; however, an embodiment may include many more filter banks, each of which may generate twenty-four or more subband signals, each subband signal representing a frequency subband whose bandwidth is commensurate with or less than the critical bandwidths of the human auditory system. Encoder 79 applies a block coding process to the subband signals and generates along path 67 a sequence of blocks representing, in encoded form, the audio information received via paths 65-1, 65-2 and 65-3.
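A highly simplified sketch of one transform-based filter bank followed by block coding with a crude energy-based bit allocation is given below. It is illustrative only: the filter banks, psychoacoustic model and bit-allocation rules of an actual coder are far more elaborate, and the subband count, block length and bit budget shown here are assumptions.

```python
import numpy as np

def filter_bank(samples, num_subbands=32):
    """Split one block of time-domain samples into frequency subbands using a
    transform (one of the implementation options mentioned above)."""
    spectrum = np.fft.rfft(samples * np.hanning(len(samples)))
    return np.array_split(spectrum, num_subbands)

def encode_block(samples, total_bits=2048):
    """Crude block coder: allocate more bits to subbands with more energy.
    This stands in for the psychoacoustic bit allocation described above."""
    subbands = filter_bank(samples)
    energy = np.array([np.sum(np.abs(sb) ** 2) + 1e-12 for sb in subbands])
    weights = np.log2(energy) - np.log2(energy).min() + 1.0
    bits = np.round(total_bits * weights / weights.sum()).astype(int)
    return list(zip(bits, subbands))    # (bits allocated, subband coefficients)

block = np.random.default_rng(0).standard_normal(256)   # one 256-sample block
encoded = encode_block(block)
print([b for b, _ in encoded][:8])      # bit allocation of the first 8 subbands
```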
Split-band coding is not critical to the practice of the present invention. Other coding schemes may be used, for example block-companded PCM or delta modulation.
In one practical embodiment, an encoding audio signal processor receives eight channels of audio information in linear PCM form or, alternatively, four AES3 data streams, and uses eight filter banks and one encoder to perform a block coding process that generates encoded information blocks with guard bands, such that the space or bandwidth required to convey these blocks is no greater than that required to convey two channels of audio information in linear PCM form or, alternatively, one AES3 data stream.
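An illustrative bit-rate calculation for that practical embodiment (the sample rate and word length are assumed here; they are not specified in the disclosure) shows the order of bit-rate reduction involved.

```python
# Illustrative arithmetic only: assumed 48 kHz sampling and 20-bit words.
SAMPLE_RATE = 48_000
BITS_PER_SAMPLE = 20

pcm_8ch = 8 * SAMPLE_RATE * BITS_PER_SAMPLE   # input: eight channels of linear PCM
pcm_2ch = 2 * SAMPLE_RATE * BITS_PER_SAMPLE   # available space: two channels of PCM
print(f"input : {pcm_8ch / 1e6:.2f} Mbit/s")  # 7.68 Mbit/s
print(f"budget: {pcm_2ch / 1e6:.2f} Mbit/s")  # 1.92 Mbit/s
print(f"required reduction: {pcm_8ch / pcm_2ch:.0f}:1")  # 4:1
```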
Overlapping blocks and window functions
The pulse trains used in the figures to represent the information blocks suggest that adjacent blocks abut one another but do not overlap. Although no particular arrangement of blocks is required to practice the present invention, preferred embodiments process blocks that overlap one another. In general, overlapping blocks of audio information are weighted or modulated by a window function such that the sum of the overlapping samples in adjacent blocks is substantially equal to a constant.
Fig. 6 shows a sequence of blocks. Starting block 141 in the sequence overlaps adjacent block 142. Each block in the sequence is represented by an envelope having the shape of the window function used to weight the corresponding audio information in the time domain. Ending block 146 in the sequence overlaps a preceding block and a subsequent block that is not shown in the figure. The amount of overlap and the choice of window function have a significant effect on coding performance, but no particular window function or amount of overlap is required to practice the present invention. In preferred embodiments, the amount of overlap equals half the block length, and the window function is derived from the Kaiser-Bessel function.
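The following sketch assumes the commonly used Kaiser-Bessel-derived construction (the disclosure states only that the window is derived from the Kaiser-Bessel function, so the exact window of a given embodiment may differ). It builds such a window and checks the property that, with 50 % overlap and the window applied at both analysis and synthesis, the squared overlapped windows sum to a constant.

```python
import numpy as np

def kbd_window(block_length, beta=5.0):
    """Kaiser-Bessel-derived window of the given block length (an assumed
    construction; only 'derived from the Kaiser-Bessel function' is given)."""
    half = block_length // 2
    kaiser = np.kaiser(half + 1, beta * np.pi)
    csum = np.cumsum(kaiser)
    rising = np.sqrt(csum[:-1] / csum[-1])         # first half of the window
    return np.concatenate([rising, rising[::-1]])  # mirror for the second half

N = 256                        # one block of 256 samples, 50 % overlap
w = kbd_window(N)
# With the window applied twice (analysis and synthesis), 50 %-overlapped
# blocks add back to a constant because w[n]**2 + w[n + N/2]**2 == 1 over
# the whole overlap region.
print(np.allclose(w[:N // 2] ** 2 + w[N // 2:] ** 2, 1.0))   # True
```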
As mentioned above, audio signal processor 66 generates audio information that is aligned with the video frame references. In embodiments that generate sequences of blocks of audio information, this alignment may be such that a video frame reference coincides with essentially any point in any block of the sequence. In the example shown in Fig. 6, the start of starting block 141 coincides with video frame reference 100.
In some applications, the exact point of coincidence can vary from one video frame to the next. For example, in applications that combine digital audio information with NTSC video information, successive video frames may contain varying numbers of audio samples because the audio sample rate is not an integer multiple of the video frame rate.
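A quick illustration of that point (48 kHz audio is assumed here): at the NTSC frame rate of 30000/1001 frames per second the number of audio samples per frame is not an integer, so the per-frame count must vary, repeating over a five-frame pattern.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000
NTSC_FPS = Fraction(30_000, 1_001)

samples_per_frame = Fraction(SAMPLE_RATE) / NTSC_FPS
print(samples_per_frame, "=", float(samples_per_frame))   # 8008/5 = 1601.6

# Distribute whole samples frame by frame; the pattern repeats every 5 frames.
total = 0
for frame in range(5):
    target = round((frame + 1) * samples_per_frame)
    print(f"frame {frame}: {target - total} samples")
    total = target
# frame 0: 1602, frame 1: 1601, frame 2: 1602, frame 3: 1601, frame 4: 1602
```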
Considerations regarding block length, window functions and video/audio alignment are discussed in the patent document WO-A-99/21187 referred to above.

Claims (8)

1. A method for processing audio information, characterized in that it comprises the following steps:
receiving an input audio signal conveying said audio information,
receiving video frame references that represent time references for a sequence of video frames,
generating blocks of encoded audio information in a bit-rate-reduced form from said audio information by applying a block coding process to said input audio signal, and time-compressing said blocks of encoded audio information, and
assembling said time-compressed blocks into an encoded audio stream comprising a plurality of sequences of said time-compressed blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
2. The method according to claim 1, characterized in that said block coding process encodes said audio information into a form having less redundancy and/or less perceptual irrelevance.
3. The method according to claim 2, characterized in that said block coding process comprises:
applying a bank of bandpass filters or one or more transforms to said input audio signal to generate representations of said input audio signal in a plurality of frequency subbands, and
generating said blocks of encoded audio information by adaptively allocating bits to the representations of said frequency subbands according to psychoacoustic principles.
4. The method according to claim 1, characterized in that it further comprises the step of applying said block coding process to overlapping blocks of said audio information.
5. An apparatus for processing audio information, characterized in that it comprises:
means for receiving an input audio signal conveying said audio information,
means for receiving video frame references, said references representing time references for a sequence of video frames,
means for generating blocks of encoded audio information in a bit-rate-reduced form from said audio information by applying a block coding process to said input audio signal, and for time-compressing said blocks of encoded audio information, and
means for assembling said time-compressed blocks of encoded audio information into an encoded audio stream, said audio stream comprising a plurality of sequences of time-compressed encoded blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
6. The apparatus according to claim 5, characterized in that said block coding process encodes said audio information into a form having less redundancy and/or less perceptual irrelevance.
7. The apparatus according to claim 6, wherein said block coding process comprises:
applying a bank of bandpass filters or one or more transforms to said input audio signal to generate representations of said input audio signal in a plurality of frequency subbands, and
generating said blocks of encoded audio information by adaptively allocating bits to the representations of said frequency subbands according to psychoacoustic principles.
8. The apparatus according to claim 5, characterized in that the means for generating said blocks of encoded audio information applies said block coding process to overlapping blocks of said audio information.
CNB998037516A 1998-03-13 1999-03-11 Method of embedding compressed digital audio signals in video signal using guard bands Expired - Lifetime CN1172536C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/042,367 1998-03-13
US09/042,367 US6085163A (en) 1998-03-13 1998-03-13 Using time-aligned blocks of encoded audio in video/audio applications to facilitate audio switching

Publications (2)

Publication Number Publication Date
CN1292979A CN1292979A (en) 2001-04-25
CN1172536C true CN1172536C (en) 2004-10-20

Family

ID=21921528

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB998037516A Expired - Lifetime CN1172536C (en) 1998-03-13 1999-03-11 Method of embedding compressed digital audio signals in video signal using guard bands

Country Status (17)

Country Link
US (1) US6085163A (en)
EP (1) EP1062816B1 (en)
JP (1) JP4402834B2 (en)
KR (1) KR100675562B1 (en)
CN (1) CN1172536C (en)
AR (2) AR014716A1 (en)
AT (1) ATE247363T1 (en)
AU (1) AU760400B2 (en)
BR (1) BR9909247B1 (en)
CA (1) CA2323564C (en)
DE (1) DE69910360T2 (en)
DK (1) DK1062816T3 (en)
ES (1) ES2203101T3 (en)
HK (1) HK1036721A1 (en)
MY (1) MY125807A (en)
TW (1) TW473702B (en)
WO (1) WO1999046938A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6188987B1 (en) * 1998-11-17 2001-02-13 Dolby Laboratories Licensing Corporation Providing auxiliary information with frame-based encoded audio information
US6690428B1 (en) * 1999-09-13 2004-02-10 Nvision, Inc. Method and apparatus for embedding digital audio data in a serial digital video data stream
US8503650B2 (en) * 2001-02-27 2013-08-06 Verizon Data Services Llc Methods and systems for configuring and providing conference calls
US7277427B1 (en) * 2003-02-10 2007-10-02 Nvision, Inc. Spatially distributed routing switch
WO2006042207A1 (en) 2004-10-07 2006-04-20 Thomson Licensing Audio/video router
RU2444071C2 (en) 2006-12-12 2012-02-27 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Encoder, decoder and methods for encoding and decoding data segments representing time-domain data stream
JP4886041B2 (en) 2006-12-20 2012-02-29 ジーブイビービー ホールディングス エス.エイ.アール.エル. Embedded audio routing selector
AU2008291065A1 (en) * 2007-12-19 2009-07-09 Interactivetv Pty Limited Device and method for synchronisation of digital video and audio streams to media presentation devices
EP2242048B1 (en) * 2008-01-09 2017-06-14 LG Electronics Inc. Method and apparatus for identifying frame type
TWI643187B (en) 2009-05-27 2018-12-01 瑞典商杜比國際公司 Systems and methods for generating a high frequency component of a signal from a low frequency component of the signal, a set-top box, a computer program product and storage medium thereof
US11657788B2 (en) 2009-05-27 2023-05-23 Dolby International Ab Efficient combined harmonic transposition

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4532556A (en) * 1983-05-20 1985-07-30 Dolby Laboratories Licensing Corporation Time-base correction of audio signals in video tape recorders
US5632005A (en) * 1991-01-08 1997-05-20 Ray Milton Dolby Encoder/decoder for multidimensional sound fields
JP3329076B2 (en) * 1994-06-27 2002-09-30 ソニー株式会社 Digital signal transmission method, digital signal transmission device, digital signal reception method, and digital signal reception device
EP0734021A3 (en) * 1995-03-23 1999-05-26 SICAN, GESELLSCHAFT FÜR SILIZIUM-ANWENDUNGEN UND CAD/CAT NIEDERSACHSEN mbH Method and apparatus for decoding of digital audio data coded in layer 1 or 2 of MPEG format
US5819215A (en) * 1995-10-13 1998-10-06 Dobson; Kurt Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of digital audio or other sensory data
US5946352A (en) * 1997-05-02 1999-08-31 Texas Instruments Incorporated Method and apparatus for downmixing decoded data streams in the frequency domain prior to conversion to the time domain
US5860060A (en) * 1997-05-02 1999-01-12 Texas Instruments Incorporated Method for left/right channel self-alignment

Also Published As

Publication number Publication date
BR9909247A (en) 2000-11-28
EP1062816B1 (en) 2003-08-13
EP1062816A1 (en) 2000-12-27
KR100675562B1 (en) 2007-01-29
AU760400B2 (en) 2003-05-15
WO1999046938A1 (en) 1999-09-16
CN1292979A (en) 2001-04-25
KR20010040826A (en) 2001-05-15
ATE247363T1 (en) 2003-08-15
CA2323564A1 (en) 1999-09-16
AR021444A2 (en) 2002-07-17
BR9909247B1 (en) 2014-08-26
AR014716A1 (en) 2001-03-28
JP2002507101A (en) 2002-03-05
HK1036721A1 (en) 2002-01-11
DK1062816T3 (en) 2003-11-03
US6085163A (en) 2000-07-04
TW473702B (en) 2002-01-21
DE69910360D1 (en) 2003-09-18
CA2323564C (en) 2008-05-13
DE69910360T2 (en) 2004-06-24
AU3183099A (en) 1999-09-27
MY125807A (en) 2006-08-30
JP4402834B2 (en) 2010-01-20
ES2203101T3 (en) 2004-04-01

Similar Documents

Publication Publication Date Title
EP1142346B1 (en) Encoding auxiliary information with frame-based encoded audio information
EP1472889B1 (en) Audio coding
EP0090582B1 (en) Recording/reproducing apparatus
US8275625B2 (en) Adaptive variable bit rate audio encoding
CN1172536C (en) Method of embedding compressed digital audio signals in video signal using guard bands
CN1829333B (en) Method for generating information signal to be recorded
CN1179870A (en) Method and device for encoding, transferring and decoding non-PCM bitstream between digital versatile disc device and multi-channel reproduction apparatus
MX9801215A (en) Method and device for encoding seamless-connection system of bit stream.
US6480234B1 (en) Method and apparatus for synchronously encoding audio signals with corresponding video frames
EP0696114B1 (en) Multiplexing in a data compression and expansion system
RU2258266C2 (en) Data carrier carrying stereophonic signal and data signal, and device and method for recording and reproducing stereophonic signal and data signal on/from carrier
KR100306930B1 (en) Digital data transmitter and method for transmitting the same
MXPA00008964A (en) Method of embedding compressed digital audio signals in a video signal using guard bands
JPH08307822A (en) Recording and reproducing device for digital signal
CN1158177A (en) Various recording/reproduction modes in recording/reproducing a digital information signal and at least one digital auxiliary signal
JPH11341433A (en) Digital data transmitter
RU98102214A (en) METHOD OF SYNCHRONOUS TELECONFERENCE AND SYSTEM FOR ITS IMPLEMENTATION

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1036721

Country of ref document: HK

CX01 Expiry of patent term

Granted publication date: 20041020

CX01 Expiry of patent term