CN1172536C - Method of embedding compressed digital audio signals in video signal using guard bands - Google Patents


Info

Publication number
CN1172536C
Authority
CN
China
Prior art keywords
audio
information
video
block
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB998037516A
Other languages
Chinese (zh)
Other versions
CN1292979A (en)
Inventor
Craig Campbell Todd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp filed Critical Dolby Laboratories Licensing Corp
Publication of CN1292979A publication Critical patent/CN1292979A/en
Application granted granted Critical
Publication of CN1172536C publication Critical patent/CN1172536C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23602 Multiplexing isochronously with the video sync, e.g. according to bit-parallel or bit-serial interface formats, as SDI
    • H04N21/23614 Multiplexing of additional data and video streams
    • H04N21/2368 Multiplexing of audio and video streams
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341 Demultiplexing of audio and video streams
    • H04N21/4342 Demultiplexing isochronously with video sync, e.g. according to bit-parallel or bit-serial interface formats, as SDI
    • H04N21/4348 Demultiplexing of additional data and video streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Television Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio signal processor forms gaps or guard bands in sequences of blocks conveying encoded audio information and time aligns the guard bands with video information. The guard bands are formed to allow for variations in processing or circuit delays so that the routing or switching of different streams of video information with embedded audio information does not result in a loss of any encoded audio blocks.

Description

Method and apparatus for using time-aligned blocks of encoded audio in video/audio applications to facilitate audio switching
Technical field:
The present invention relates generally to audio signal processing in video/audio applications. More particularly, the present invention relates to block coding methods in which the sequences of coded blocks are separated by gaps or guard bands, so that normal variations in signal-processing delay do not destroy the alignment between video information and audio information.
Background art:
Several international standards have been established that govern various aspects of embedding digital audio information into frames of video information. For example, standard SMPTE 259M, published by the Society of Motion Picture and Television Engineers (SMPTE), defines a serial digital interface (SDI) in which up to four channels of digital audio information can be embedded into component and composite serial digital video signals. Standard SMPTE 272M provides a full definition of how digital audio information is to be embedded in the ancillary data spaces of frames of video information.
The serial transmission of digital audio information itself has been the subject of international standards. For example, the AES3 standard (ANSI S4.40), published by the Audio Engineering Society (AES), defines the serial transmission of two-channel digital audio information represented in linear pulse code modulation (PCM) form. According to this standard, the PCM samples of the two channels are interleaved and transmitted in pairs.
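As a rough illustration of the pair-wise interleaving described above, the following Python sketch interleaves two channels of PCM samples into sample pairs; it is a simplification, and the preambles, auxiliary, validity, user, channel-status and parity bits of the real AES3 subframe structure are deliberately omitted.

```python
# Simplified sketch of AES3-style two-channel interleaving.  The real AES3
# subframe structure (preambles, aux, validity, user, channel-status and
# parity bits) is omitted; only the pair-wise sample interleaving is shown.
def interleave_stereo(left, right):
    """Interleave two channels of PCM samples into (left, right) sample pairs."""
    assert len(left) == len(right)
    return [(l, r) for l, r in zip(left, right)]

# Example: four sample frames of a stereo signal
print(interleave_stereo([0, 1, 2, 3], [10, 11, 12, 13]))
```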
A common activity in recording and broadcasting applications is editing or cutting embedded video/audio information streams and splicing the cut segments together to form a new stream. A similar activity creates a new stream by merging multiple streams or by switching between them. The video information usually serves as the primary synchronization reference, so an edit or cut point is usually aligned with a video frame.
Standards such as AES11 define recommended practices for synchronizing digital audio equipment in studio operations. AES11 is intended to control timing uncertainties caused by jitter or processing delay, and provides for aligning frames of video information with AES3 digital audio streams to within two audio samples. Equipment and methods that conform to this standard can ensure that, over a given interval, synchronized signals carry the same frame number and contain samples that share a common timing relationship. Unfortunately, no standard or practice yet specifies the alignment between video information and audio information over longer intervals. As a result, equipment from different manufacturers, and even from the same manufacturer, can differ in timing and in processing delay, introducing considerable uncertainty into the relative alignment of audio and video information.
In applications that use a linear representation of audio information, such as that defined in the AES3 standard, this uncertainty in alignment is generally unimportant. Because edit points are constrained to fall only between pairs of audio sample frames, uncertainty in video/audio alignment cannot cause any loss of audio information. It affects only the relative timing of sound and picture as presented to a person, and such a shift is imperceptible.
A growing number of applications, however, use bit-rate-reduction coding techniques to embed a larger number of audio channels into a video/audio data stream. These techniques are commonly applied to blocks of 128 or more audio samples to generate blocks of encoded information. The sample blocks typically represent audio information spanning 3 to 12 ms. Each encoded information block generated by such a coding process represents the smallest unit of information from which a reasonably accurate replica of a segment of the original audio information can be recovered. Split-band coding techniques reduce bit rate by applying psychoacoustic-based coding to a frequency-subband representation of an audio signal. The subband representation can be produced with a bank of bandpass filters or with one or more transforms. For ease of discussion, these split-band coding techniques are described here in terms of a filter bank.
The uncertainty in alignment mentioned above is significant in these block-coding applications, because an edit point falling within the boundaries of an encoded block cuts part of that block away from the remainder of the signal. The partial loss of an encoded block typically produces a dropout of 3 ms or more in the recovered signal, and a loss of that duration is generally noticeable to the human auditory system.
This problem can be avoided by a post-processing method in which the encoded audio signal is decoded, the recovered PCM representation is edited as needed, and a new encoded representation is generated by re-encoding the edited PCM audio information. This solution is unattractive because of the additional cost incurred by the decode/re-encode process and the attendant loss in sound quality. In addition, as will be better understood from the discussion below, the decode/re-encode process introduces additional delay into the audio information stream, which makes post-processing even less attractive.
Summary of the invention:
One object of the present invention is to provide methods and apparatus for processing embedded video/audio information streams that permit activities such as editing and switching while avoiding the problems described above.
According to one aspect of the present invention, a method for processing audio information comprises the steps of: receiving an input audio signal conveying the audio information; receiving video frame references representing time references for a sequence of video frames; generating blocks of encoded audio information in a bit-rate-reduced form from the audio information by applying a block coding process to the input audio signal, and time-compressing the blocks of encoded audio information; and assembling the time-compressed blocks into an encoded audio stream comprising a plurality of sequences of the time-compressed blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
According to another aspect of the present invention, an apparatus for processing audio information comprises: means for receiving an input audio signal conveying the audio information; means for receiving video frame references, the references representing time references for a sequence of video frames; means for generating blocks of encoded audio information in a bit-rate-reduced form from the audio information by applying a block coding process to the input audio signal and for time-compressing the blocks of encoded audio information; and means for assembling the time-compressed blocks into an encoded audio stream comprising a plurality of sequences of time-compressed encoded blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
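The following hypothetical Python sketch shows one way the assembling step described above could be arranged; the frame period, guard-band duration and block counts are illustrative values and are not taken from the disclosure. The encoded blocks produced over each video frame interval are time-compressed so that they occupy less than the full interval, and the unused time, straddling each video frame reference, forms the guard band.

```python
# Hypothetical sketch of the assembling step: time-compress the encoded blocks
# for each video frame interval and leave a gap (guard band) straddling each
# video frame reference.  All names and numbers are illustrative only.

FRAME_PERIOD_MS = 40.0      # e.g. one PAL frame interval
GUARD_BAND_MS = 5.0         # gap straddling each video frame reference

def assemble_stream(frames_of_blocks):
    """frames_of_blocks: list of lists; each inner list holds the encoded
    blocks produced from the audio spanning one video frame interval."""
    stream = []  # (start_time_ms, block) tuples
    for frame_index, blocks in enumerate(frames_of_blocks):
        ref = frame_index * FRAME_PERIOD_MS          # video frame reference
        seq_start = ref + GUARD_BAND_MS / 2          # sequence begins after the gap
        usable = FRAME_PERIOD_MS - GUARD_BAND_MS     # time left for the blocks
        slot = usable / len(blocks)                  # time-compressed slot per block
        for i, block in enumerate(blocks):
            stream.append((seq_start + i * slot, block))
        # no blocks are emitted within GUARD_BAND_MS / 2 of a frame reference,
        # so a switch made near that reference falls inside a gap
    return stream

# Example: two frame intervals, six encoded blocks per interval (as in Fig. 2A)
for start, block in assemble_stream([[f"A{i}" for i in range(6)],
                                     [f"B{i}" for i in range(6)]]):
    print(f"{start:6.2f} ms  {block}")
```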
The features of the present invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings, in which like reference numerals refer to like elements. The contents of the following discussion and the drawings are set forth as examples only and should not be understood to represent limitations on the scope of the present invention.
Description of the drawings:
Fig. 1 is a functional block diagram of one embodiment of a system for recording and routing multiple video/audio data streams.
Figs. 2A to 2C are graphical representations of hypothetical audio signals having different alignments with video frame references.
Fig. 3 is a functional block diagram of one embodiment of an apparatus for processing video signals with embedded audio information.
Fig. 4 is a functional block diagram of one embodiment of an encoding audio signal processor according to various aspects of the present invention.
Figs. 5A to 5C are graphical representations of hypothetical audio signals, processed according to the present invention, having different alignments with video frame references.
Fig. 6 is a graphical representation of overlapping audio information blocks weighted by a window function.
Embodiments:
System overview
Fig. 1 illustrates one embodiment of a system for recording and routing multiple video/audio data streams, and represents one example of a system that can advantageously use various aspects of the present invention. For the sake of clarity, this figure and all other figures omit the signal paths used to convey master clock signals that synchronize the equipment. In the discussion here it is assumed that the signals generated along paths 21, 22, 23 and 24, for example, conform to standards SMPTE 259M and SMPTE 272M; however, no particular standard or signal format is required to practice the present invention. For example, in an alternative embodiment of the system, separate signals, each conveying only video information or only audio information, are generated along paths 21 to 24 and passed to router 31, which includes circuitry for routing video and audio information individually. In such an embodiment, an SDI disembedder is interposed between SDI embedder 12 and router 31. This alternative embodiment is mentioned here to show that no particular signal format is required to practice the present invention.
Video tape recorder (VTR) 16 receives video information from path 1 and audio information from path 2, and records this video/audio information on tape. Subsequently, VTR 16 plays back the recorded video/audio information from tape and generates along path 21 a playback signal conveying the video information with embedded audio information. In a similar manner, VTR 17 records video and audio information received from paths 3 and 4, respectively, and subsequently generates along path 22 a playback signal conveying the video information with embedded audio information.
VTR 16, VTR 17 and VTR 18 include circuitry, such as a serial digital interface (SDI) embedder, for embedding audio information into the video information during playback.
SDI embedder 11 receives video and audio information from paths 5 and 6, respectively, and generates along path 14 a signal conveying the video information with embedded audio information. VTR 18, which includes circuitry such as an SDI disembedder, extracts audio information from the video/audio data signal and records the separated video and audio information on tape. Subsequently, VTR 18 recovers the video and audio information from tape and uses circuitry such as an SDI embedder to generate along path 23 a playback signal conveying the video information with embedded audio information. If VTR 18 is replaced by a digital data recorder, however, the video/audio data stream itself can be recorded and played back, so no embedder or disembedder is needed in such a recorder.
SDI embedder 12 receives video and audio information from paths 7 and 8, respectively, and generates along path 24 a signal conveying the video information with embedded audio information.
SDI router 31 receives the video/audio signals from paths 21, 22, 23 and 24, and routes or switches these signals along path 34 to playback/recording device 41. The number of signals received by SDI router 31 is not significant.
Playback/recording device 41 represents any device that uses the signal passed along path 34. For example, it may be a recording device such as a VTR or a playback device such as a television set. Furthermore, playback/recording device 41 may be located at a site remote from SDI router 31, in which case path 34 represents a communication or broadcast channel.
Shifts in video/audio alignment
Circuit delays in VTRs 16, 17 and 18 and in SDI embedders 11 and 12 can alter the relative alignment of the video information and the audio information. As a result, for example, the video/audio alignment in playback signal 21 may be shifted with respect to the alignment between the video and audio information received from paths 1 and 2, respectively. The amount of this shift in alignment varies among equipment from different manufacturers, varies among different pieces of equipment from the same manufacturer, and can even vary within a given piece of equipment as a function of, for example, the initialization state of buffer memories.
Referring to Fig. 2A, signal 111 represents audio information having a particular alignment with video frame references 101 and 102. Each of these video frame references indicates a particular reference point in a respective video frame. For example, for NTSC video information a common reference point coincides with the video information of line 10 in each frame, and for PAL video information a common reference point coincides with line 1 of each frame. No particular alignment is required to practice the present invention.
In Fig. 2B, signal 121 conveys the same information as signal 111 but is delayed with respect to it. As a result, the alignment between signal 121 and the video frame references is shifted with respect to the alignment of signal 111. In Fig. 2C, signal 131 conveys the same information as signal 111 but is advanced with respect to it. As a result, the alignment between signal 131 and the video frame references is shifted in the direction opposite to the shift in alignment of signal 121.
Referring to Fig. 1, suppose that audio information with the alignment shown in Fig. 2A is conveyed along paths 1/2, 3/4, 5/6 and 7/8, and that differing shifts in alignment such as those shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 24. Suppose further that the alignments shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 23, respectively. When SDI router 31 switches among the signals received from these three paths, a small discontinuity appears in the embedded audio information of the signal passed along path 34. If the audio information is represented in a linear form such as PCM, the discontinuity lasts only a few samples and probably cannot be perceived by a human listener, particularly because it is difficult to discern a discontinuity between two signals with different audio content.
Effects of coding
As mentioned above, there is growing interest in embedding a greater number of audio channels into video/audio data streams. When the information capacity of these additional audio channels exceeds the capacity of the space available for audio information, some form of bandwidth or bit-rate reduction is needed. One example of such compression is audio coding based on psychoacoustic principles.
These coding techniques are usually applied to blocks of audio samples to generate blocks of encoded information. The sample blocks typically represent audio information spanning an interval of 3 to 12 ms. Each encoded information block generated by such a coding process represents the smallest unit of information from which a reasonably accurate replica of a segment of the original audio information can be recovered.
In Fig. 2A, a sequence of encoded information blocks is illustrated as a pulse train. The information conveyed by these blocks is an encoded representation of the audio information in signal 111. The shape and size of the pulses is not significant; the pulse train is intended merely to suggest a series of blocks conveying information encoded from blocks of audio samples that are adjacent to one another or, preferably, overlap one another. In the example shown in Fig. 2A, the audio information spanning the interval between adjacent video frame references is represented by six encoded information blocks. Considerations for improving audio coding quality in video/audio applications are disclosed in patent document WO-A-99/21187.
When block coding techniques are used in the system of Fig. 1, the signals that SDI router 31 receives from paths 21 to 24 contain audio information encoded in block form. As explained above, varying shifts in alignment may arise between the encoded information blocks and the video frame references; this is illustrated by the differing alignments between video frame reference 101 and blocks 112, 122 and 132 shown in Figs. 2A, 2B and 2C, respectively. Assuming, as above, that the alignments shown in Figs. 2A to 2C appear in the signals generated along paths 21 to 23, respectively, then if SDI router 31 switches, at the instant video frame reference 101 occurs, from the signal received via path 22, shown in Fig. 2B, to the signal received via path 23, shown in Fig. 2C, a considerable amount of audio information cannot be recovered from the signal routed along path 23 at the switch point. The audio information conveyed in block 123 before the switch point cannot be recovered, because the entire block is needed to recover its audio information and the portion of the block after the switch point is lost. Similarly, the audio information conveyed in block 133 after the switch point cannot be recovered, because the portion of block 133 before the switch point is lost.
This problem is not unique to the type of system shown in Fig. 1. It can also arise, for example, when performing video tape edits or audio dubbing on a single VTR.
As will be explained more fully below, the present invention overcomes this problem by forming guard bands or gaps in the encoded audio stream, so that considerable variation in video/audio alignment can be tolerated without any loss of audio information.
Encoding signal processor
Fig. 3 illustrates a video/audio signal processor that can be incorporated in a system such as the one shown in Fig. 1 in a number of ways. In the embodiment shown, signals conveying video information with embedded audio information are received from input signal paths 61-1, 61-2 and 61-3. Three input signal paths are shown in the figure; however, embodiments of the present invention may have essentially any number of input signal paths. Signal distributor 62 represents a wide range of signal distribution processes, including switching, merging, editing, splicing and storage/retrieval. For simplicity, the illustration and discussion here assume that signal distributor 62 receives multiple video/audio signals, processes and/or distributes them in some manner, and generates along path 63 a single signal conveying video information with embedded audio information. Deformatter 64 receives the video/audio information from path 63, extracts the embedded audio information and passes it along path 65. The video information may be passed along path 69. Audio signal processor 66 receives the audio information from path 65 and applies a block coding process to it so as to generate encoded information blocks along path 67. Formatter 68 receives the encoded information blocks from path 67 and generates along path 70 an output signal comprising a plurality of sequences of encoded information blocks, with a gap or guard band between the starting block of one sequence and the ending block of a preceding sequence. A reference signal, for example a master clock signal, is used to align the gaps or guard bands in time with the video information.
As mentioned above, the figures do not show the signal paths that carry the master clock signals used to synchronize the equipment. In a preferred embodiment, audio signal processor 66 forms blocks of audio samples that are aligned with the master clock signal. One such alignment is shown in Fig. 2A, in which the boundaries between adjacent sample blocks coincide with video frame references 101 and 102; however, other alignments may be used.
Referring to Fig. 5A, the sequence of blocks 112-2 conveys encoded information representing signal segment 111-2, which is a hypothetical time-compressed representation of the portion of signal 111 between video frame references 101 and 102. Similarly, the sequence of blocks 112-1 conveys encoded information representing signal segment 111-1, and the sequence of blocks 112-3 conveys encoded information representing signal segment 111-3. Audio signal processor 66 and formatter 68 generate sequences of encoded information blocks conveying the audio information in which a guard band or gap is formed between, for example, the ending block of sequence 112-1 and the starting block of sequence 112-2.
The shifts in alignment shown in Figs. 2A to 2C are also shown in Figs. 5A to 5C. In these figures, the encoded information conveyed in sequences 122-1, 122-2, 122-3, 132-1, 132-2 and 132-3 represents signal segments 121-1, 121-2, 121-3, 131-1, 131-2 and 131-3, respectively. As can be seen from Figs. 5B and 5C, the shifts in alignment do not cause any loss of audio information, because every possible switch point between video frame references 101 and 102 falls within a guard band.
The signal processor shown in Fig. 3 can, for example, be incorporated in an SDI router to process video signals with embedded AES3 or PCM audio information. An embodiment that omits signal distributor 62 can be incorporated in a VTR or an SDI embedder. Another embodiment that also omits deformatter 64 can be incorporated in the input circuits of a VTR or an SDI embedder.
Fig. 4 illustrates one embodiment of an encoding audio signal processor that is suitable for incorporation into the embodiment shown in Fig. 3 and that also has stand-alone utility, as explained below. According to this embodiment, audio signal processor 66 comprises a plurality of filter banks 71, 72 and 73. In response to a signal received from path 65-1, filter bank 71 generates a plurality of frequency subband signals along paths 75-1 to 75-3. In response to a signal received from path 65-2, filter bank 72 generates a plurality of frequency subband signals along paths 76-1 to 76-3. In response to a signal received from path 65-3, filter bank 73 generates a plurality of frequency subband signals along paths 77-1 to 77-3. Filter banks 71, 72 and 73 may be implemented in a number of ways, including a bank of bandpass filters, a cascaded set of band-splitting filters, and one or more time-domain-to-frequency-domain transforms. Only three filter banks are shown, and only three subband signals are shown for each filter bank; however, an embodiment may include many more filter banks, each of which may generate twenty-four or more subband signals, each subband signal representing a frequency subband whose bandwidth is commensurate with or less than the critical bandwidths of the human auditory system. Encoder 79 applies a block coding process to the subband signals and generates along path 67 a sequence of blocks representing, in encoded form, the audio information received via paths 65-1, 65-2 and 65-3.
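A highly simplified sketch of one transform-based filter bank followed by block coding with a crude energy-based bit allocation is given below. It is illustrative only: the filter banks, psychoacoustic model and bit-allocation rules of an actual coder are far more elaborate, and the subband count, block length and bit budget shown here are assumptions.

```python
import numpy as np

def filter_bank(samples, num_subbands=32):
    """Split one block of time-domain samples into frequency subbands using a
    transform (one of the implementation options mentioned above)."""
    spectrum = np.fft.rfft(samples * np.hanning(len(samples)))
    return np.array_split(spectrum, num_subbands)

def encode_block(samples, total_bits=2048):
    """Crude block coder: allocate more bits to subbands with more energy.
    This stands in for the psychoacoustic bit allocation described above."""
    subbands = filter_bank(samples)
    energy = np.array([np.sum(np.abs(sb) ** 2) + 1e-12 for sb in subbands])
    weights = np.log2(energy) - np.log2(energy).min() + 1.0
    bits = np.round(total_bits * weights / weights.sum()).astype(int)
    return list(zip(bits, subbands))    # (bits allocated, subband coefficients)

block = np.random.default_rng(0).standard_normal(256)   # one 256-sample block
encoded = encode_block(block)
print([b for b, _ in encoded][:8])      # bit allocation of the first 8 subbands
```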
Split-band coding is not critical to the practice of the present invention. Other coding schemes may be used, for example block-companded PCM or delta modulation.
In one practical embodiment, an encoding audio signal processor receives eight channels of audio information in linear PCM form or, alternatively, four AES3 data streams, and uses eight filter banks and one encoder to perform a block coding process that generates encoded information blocks with guard bands, such that the space or bandwidth required to convey these blocks is no greater than that required to convey two channels of audio information in linear PCM form or, alternatively, one AES3 data stream.
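An illustrative bit-rate calculation for that practical embodiment (the sample rate and word length are assumed here; they are not specified in the disclosure) shows the order of bit-rate reduction involved.

```python
# Illustrative arithmetic only: assumed 48 kHz sampling and 20-bit words.
SAMPLE_RATE = 48_000
BITS_PER_SAMPLE = 20

pcm_8ch = 8 * SAMPLE_RATE * BITS_PER_SAMPLE   # input: eight channels of linear PCM
pcm_2ch = 2 * SAMPLE_RATE * BITS_PER_SAMPLE   # available space: two channels of PCM
print(f"input : {pcm_8ch / 1e6:.2f} Mbit/s")  # 7.68 Mbit/s
print(f"budget: {pcm_2ch / 1e6:.2f} Mbit/s")  # 1.92 Mbit/s
print(f"required reduction: {pcm_8ch / pcm_2ch:.0f}:1")  # 4:1
```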
Overlapping blocks and window functions
The pulse trains used in the figures to represent the information blocks suggest that adjacent blocks abut one another but do not overlap. Although no particular arrangement of blocks is required to practice the present invention, preferred embodiments process blocks that overlap one another. In general, overlapping blocks of audio information are weighted or modulated by a window function such that the sum of the overlapping samples in adjacent blocks is substantially equal to a constant.
Fig. 6 shows a sequence of blocks. Starting block 141 in the sequence overlaps adjacent block 142. Each block in the sequence is represented by an envelope having the shape of the window function used to weight the corresponding audio information in the time domain. Ending block 146 in the sequence overlaps a preceding block and a subsequent block that is not shown in the figure. The amount of overlap and the choice of window function have a significant effect on coding performance, but no particular window function or amount of overlap is required to practice the present invention. In preferred embodiments, the amount of overlap equals half the block length, and the window function is derived from the Kaiser-Bessel function.
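The following sketch assumes the commonly used Kaiser-Bessel-derived construction (the disclosure states only that the window is derived from the Kaiser-Bessel function, so the exact window of a given embodiment may differ). It builds such a window and checks the property that, with 50 % overlap and the window applied at both analysis and synthesis, the squared overlapped windows sum to a constant.

```python
import numpy as np

def kbd_window(block_length, beta=5.0):
    """Kaiser-Bessel-derived window of the given block length (an assumed
    construction; only 'derived from the Kaiser-Bessel function' is given)."""
    half = block_length // 2
    kaiser = np.kaiser(half + 1, beta * np.pi)
    csum = np.cumsum(kaiser)
    rising = np.sqrt(csum[:-1] / csum[-1])         # first half of the window
    return np.concatenate([rising, rising[::-1]])  # mirror for the second half

N = 256                        # one block of 256 samples, 50 % overlap
w = kbd_window(N)
# With the window applied twice (analysis and synthesis), 50 %-overlapped
# blocks add back to a constant because w[n]**2 + w[n + N/2]**2 == 1 over
# the whole overlap region.
print(np.allclose(w[:N // 2] ** 2 + w[N // 2:] ** 2, 1.0))   # True
```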
As mentioned above, audio signal processor 66 generates audio information that is aligned with the video frame references. In embodiments that generate sequences of blocks of audio information, this alignment may be such that a video frame reference coincides with essentially any point in any block of the sequence. In the example shown in Fig. 6, the start of starting block 141 coincides with video frame reference 100.
In some applications, the exact point of coincidence can vary from one video frame to the next. For example, in applications that combine digital audio information with NTSC video information, successive video frames may contain varying numbers of audio samples because the audio sample rate is not an integer multiple of the video frame rate.
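A quick illustration of that point (48 kHz audio is assumed here): at the NTSC frame rate of 30000/1001 frames per second the number of audio samples per frame is not an integer, so the per-frame count must vary, repeating over a five-frame pattern.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000
NTSC_FPS = Fraction(30_000, 1_001)

samples_per_frame = Fraction(SAMPLE_RATE) / NTSC_FPS
print(samples_per_frame, "=", float(samples_per_frame))   # 8008/5 = 1601.6

# Distribute whole samples frame by frame; the pattern repeats every 5 frames.
total = 0
for frame in range(5):
    target = round((frame + 1) * samples_per_frame)
    print(f"frame {frame}: {target - total} samples")
    total = target
# frame 0: 1602, frame 1: 1601, frame 2: 1602, frame 3: 1601, frame 4: 1602
```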
Considerations regarding block length, window functions and video/audio alignment are discussed in the patent document WO-A-99/21187 referred to above.

Claims (8)

1. A method for processing audio information, characterized in that it comprises the following steps:
receiving an input audio signal conveying said audio information,
receiving video frame references that represent time references for a sequence of video frames,
generating blocks of encoded audio information in a bit-rate-reduced form from said audio information by applying a block coding process to said input audio signal, and time-compressing said blocks of encoded audio information, and
assembling said time-compressed blocks into an encoded audio stream comprising a plurality of sequences of said time-compressed blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
2. The method according to claim 1, characterized in that said block coding process encodes said audio information into a form having less redundancy and/or less perceptual irrelevance.
3. The method according to claim 2, characterized in that said block coding process comprises:
applying a bank of bandpass filters or one or more transforms to said input audio signal to generate representations of said input audio signal in a plurality of frequency subbands, and
generating said blocks of encoded audio information by adaptively allocating bits to the representations of said frequency subbands according to psychoacoustic principles.
4. The method according to claim 1, characterized in that it further comprises the step of applying said block coding process to overlapping blocks of said audio information.
5. An apparatus for processing audio information, characterized in that it comprises:
means for receiving an input audio signal conveying said audio information,
means for receiving video frame references, said references representing time references for a sequence of video frames,
means for generating blocks of encoded audio information in a bit-rate-reduced form from said audio information by applying a block coding process to said input audio signal, and for time-compressing said blocks of encoded audio information, and
means for assembling said time-compressed blocks of encoded audio information into an encoded audio stream, said audio stream comprising a plurality of sequences of time-compressed encoded blocks, in which a starting block in a respective sequence is separated from an ending block in a preceding sequence by a gap aligned in time with a respective video frame reference.
6. The apparatus according to claim 5, characterized in that said block coding process encodes said audio information into a form having less redundancy and/or less perceptual irrelevance.
7. The apparatus according to claim 6, wherein said block coding process comprises:
applying a bank of bandpass filters or one or more transforms to said input audio signal to generate representations of said input audio signal in a plurality of frequency subbands, and
generating said blocks of encoded audio information by adaptively allocating bits to the representations of said frequency subbands according to psychoacoustic principles.
8. The apparatus according to claim 5, characterized in that the means for generating said blocks of encoded audio information applies said block coding process to overlapping blocks of said audio information.
CNB998037516A 1998-03-13 1999-03-11 Method of embedding compressed digital audio signals in video signal using guard bands Expired - Lifetime CN1172536C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/042,367 1998-03-13
US09/042,367 US6085163A (en) 1998-03-13 1998-03-13 Using time-aligned blocks of encoded audio in video/audio applications to facilitate audio switching

Publications (2)

Publication Number Publication Date
CN1292979A CN1292979A (en) 2001-04-25
CN1172536C true CN1172536C (en) 2004-10-20

Family

ID=21921528

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB998037516A Expired - Lifetime CN1172536C (en) 1998-03-13 1999-03-11 Method of embedding compressed digital audio signals in video signal using guard bands

Country Status (17)

Country Link
US (1) US6085163A (en)
EP (1) EP1062816B1 (en)
JP (1) JP4402834B2 (en)
KR (1) KR100675562B1 (en)
CN (1) CN1172536C (en)
AR (2) AR014716A1 (en)
AT (1) ATE247363T1 (en)
AU (1) AU760400B2 (en)
BR (1) BR9909247B1 (en)
CA (1) CA2323564C (en)
DE (1) DE69910360T2 (en)
DK (1) DK1062816T3 (en)
ES (1) ES2203101T3 (en)
HK (1) HK1036721A1 (en)
MY (1) MY125807A (en)
TW (1) TW473702B (en)
WO (1) WO1999046938A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6188987B1 (en) * 1998-11-17 2001-02-13 Dolby Laboratories Licensing Corporation Providing auxiliary information with frame-based encoded audio information
US6690428B1 (en) * 1999-09-13 2004-02-10 Nvision, Inc. Method and apparatus for embedding digital audio data in a serial digital video data stream
US8503650B2 (en) * 2001-02-27 2013-08-06 Verizon Data Services Llc Methods and systems for configuring and providing conference calls
US7277427B1 (en) * 2003-02-10 2007-10-02 Nvision, Inc. Spatially distributed routing switch
WO2006042207A1 (en) 2004-10-07 2006-04-20 Thomson Licensing Audio/video router
RU2444071C2 (en) 2006-12-12 2012-02-27 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Encoder, decoder and methods for encoding and decoding data segments representing time-domain data stream
JP4886041B2 (en) 2006-12-20 2012-02-29 ジーブイビービー ホールディングス エス.エイ.アール.エル. Embedded audio routing selector
AU2008291065A1 (en) * 2007-12-19 2009-07-09 Interactivetv Pty Limited Device and method for synchronisation of digital video and audio streams to media presentation devices
EP2242048B1 (en) * 2008-01-09 2017-06-14 LG Electronics Inc. Method and apparatus for identifying frame type
TWI643187B (en) 2009-05-27 2018-12-01 瑞典商杜比國際公司 Systems and methods for generating a high frequency component of a signal from a low frequency component of the signal, a set-top box, a computer program product and storage medium thereof
US11657788B2 (en) 2009-05-27 2023-05-23 Dolby International Ab Efficient combined harmonic transposition

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4532556A (en) * 1983-05-20 1985-07-30 Dolby Laboratories Licensing Corporation Time-base correction of audio signals in video tape recorders
US5632005A (en) * 1991-01-08 1997-05-20 Ray Milton Dolby Encoder/decoder for multidimensional sound fields
JP3329076B2 (en) * 1994-06-27 2002-09-30 ソニー株式会社 Digital signal transmission method, digital signal transmission device, digital signal reception method, and digital signal reception device
EP0734021A3 (en) * 1995-03-23 1999-05-26 SICAN, GESELLSCHAFT FÜR SILIZIUM-ANWENDUNGEN UND CAD/CAT NIEDERSACHSEN mbH Method and apparatus for decoding of digital audio data coded in layer 1 or 2 of MPEG format
US5819215A (en) * 1995-10-13 1998-10-06 Dobson; Kurt Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of digital audio or other sensory data
US5946352A (en) * 1997-05-02 1999-08-31 Texas Instruments Incorporated Method and apparatus for downmixing decoded data streams in the frequency domain prior to conversion to the time domain
US5860060A (en) * 1997-05-02 1999-01-12 Texas Instruments Incorporated Method for left/right channel self-alignment

Also Published As

Publication number Publication date
BR9909247A (en) 2000-11-28
EP1062816B1 (en) 2003-08-13
EP1062816A1 (en) 2000-12-27
KR100675562B1 (en) 2007-01-29
AU760400B2 (en) 2003-05-15
WO1999046938A1 (en) 1999-09-16
CN1292979A (en) 2001-04-25
KR20010040826A (en) 2001-05-15
ATE247363T1 (en) 2003-08-15
CA2323564A1 (en) 1999-09-16
AR021444A2 (en) 2002-07-17
BR9909247B1 (en) 2014-08-26
AR014716A1 (en) 2001-03-28
JP2002507101A (en) 2002-03-05
HK1036721A1 (en) 2002-01-11
DK1062816T3 (en) 2003-11-03
US6085163A (en) 2000-07-04
TW473702B (en) 2002-01-21
DE69910360D1 (en) 2003-09-18
CA2323564C (en) 2008-05-13
DE69910360T2 (en) 2004-06-24
AU3183099A (en) 1999-09-27
MY125807A (en) 2006-08-30
JP4402834B2 (en) 2010-01-20
ES2203101T3 (en) 2004-04-01

Similar Documents

Publication Publication Date Title
EP1142346B1 (en) Encoding auxiliary information with frame-based encoded audio information
EP1472889B1 (en) Audio coding
EP0090582B1 (en) Recording/reproducing apparatus
US8275625B2 (en) Adaptive variable bit rate audio encoding
CN1172536C (en) Method of embedding compressed digital audio signals in video signal using guard bands
CN1829333B (en) Method for generating information signal to be recorded
CN1179870A (en) Method and device for encoding, transferring and decoding non-PCM bitstream between digital versatile disc device and multi-channel reproduction apparatus
MX9801215A (en) Method and device for encoding seamless-connection system of bit stream.
US6480234B1 (en) Method and apparatus for synchronously encoding audio signals with corresponding video frames
EP0696114B1 (en) Multiplexing in a data compression and expansion system
RU2258266C2 (en) Data carrier carrying stereophonic signal and data signal, and device and method for recording and reproducing stereophonic signal and data signal on/from carrier
KR100306930B1 (en) Digital data transmitter and method for transmitting the same
MXPA00008964A (en) Method of embedding compressed digital audio signals in a video signal using guard bands
JPH08307822A (en) Recording and reproducing device for digital signal
CN1158177A (en) Various recording/reproduction modes in recording/reproducing a digital information signal and at least one digital auxiliary signal
JPH11341433A (en) Digital data transmitter
RU98102214A (en) METHOD OF SYNCHRONOUS TELECONFERENCE AND SYSTEM FOR ITS IMPLEMENTATION

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1036721

Country of ref document: HK

CX01 Expiry of patent term

Granted publication date: 20041020

CX01 Expiry of patent term