GB2522260A - Method and apparatus for determining synchronisation of audio signals - Google Patents
Method and apparatus for determining synchronisation of audio signals
- Publication number
- GB2522260A GB2522260A GB1400944.3A GB201400944A GB2522260A GB 2522260 A GB2522260 A GB 2522260A GB 201400944 A GB201400944 A GB 201400944A GB 2522260 A GB2522260 A GB 2522260A
- Authority
- GB
- United Kingdom
- Prior art keywords
- audio
- samples
- audio signal
- timing information
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43076—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2381—Adapting the multiplex stream to a specific network, e.g. an Internet Protocol [IP] network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/6437—Real-time Transport Protocol [RTP]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention relates to providing additional synchronisation signalling within a synchronous digital audio signal. In one version samples of the audio signal are modified at a transmitter side by including timing information in place of audio information. At the receiver side this timing information is extracted and compared to a second source of timing information to determine synchronisation. This version may be referred to as a "test mode" as the quality of the audio signal is at least impaired. In a second version samples of the audio signal are modified at the transmitter side by altering the audio information so that a repeated code is provided that is spread over a plurality of samples. At the receiver side this repeated code is extracted and compared to a local version of the repeated code to determine synchronisation. This mode may be referred to as a "live mode" as the quality of the audio signal is minimally impacted.
Description
Intellectual Property Office Application No. GB1400944.3 RTM Date: 14 July 2014 The following terms are registered trade marks and should be read as such wherever they occur in this document: Stagebox Intellectual Property Office is an operating name of the Patent Office www.ipo.gov.uk Method and Apparatus for Determining Synchronisation of Audio Signals
BACKGROUND OF THE INVENTION
This invention relates to timing of digital audio and/or video signals and to methods and apparatus for determining synchronisation of such signals.
Traditional production systems rely on SDI (Serial Digital Interface) routing, that is, point-to-point synchronous distribution. This can be demonstrated, in the simplest production system, by connecting a camera directly to a monitor. The professional standard between these two devices is SDI.
WO2013/117889 of British Broadcasting Corporation describes a system by which signals may be converted between protocols such as SDI and infrastructure standards of IP (Internet Protocol), more specifically RTP (Real-time Transport Protocol). The device used for converting between such protocols is referred to as a "Stagebox" and is a marked departure from the broadcast standards of SDI. There are a series of different IP encoders and decoders available on the market. These often use proprietary network protocols to ensure correct sending and receiving. The Stagebox builds on the concept of sending and receiving video and audio across broadcast centres, and looks at the tools required by camera operators and in studios. Based lower down the food chain, the Stagebox aims to commoditise IT equipment and standards in the professional broadcast arena.
This is achieved by analysing standard methods of work for all the main genres (News, Sport, Long-form Entertainment, Live Studio Entertainment, and Single Camera Shoots) and looking at the 'tools' required across these genres.
Once the tools have been defined, the Stagebox has been designed to allow easy access to these 'tools' over IT infrastructure. In addition to the technical challenges described, a primary aim of the Stagebox is to produce an open-standard device, where possible using the industry IT standards. This will allow further integration in the future to whatever the industry may develop.
SUMMARY OF THE INVENTION
We have appreciated problems when delivering digital audio-video signals such as SDI across a network, particularly in situations such as described above in which a conversion is made from one protocol to another for transmission across the network. In general, whenever an audio-video signal is delivered across a network, there is a risk that the synchronisation of the audio and/or video component may be compromised.
We have further appreciated the need for methods by which an end to end transmission across a network may be tested, whilst using existing audio-video equipment within the transmission chain. We have also appreciated the need for providing for the testing of synchronisation whilst delivering an audio-video signal without disruption to the presentation of the audio-video content.
The improvements of the present invention are defined in the independent claims below, to which reference may now be made. Advantageous features are set forth in the dependent claims. In broad terms, the invention provides a method, an encoder/decoder and transmitter or receiver provided with functionality to alter an audio and/or video component of a signal so as to allow for testing of synchronisation. The invention also provides a device that may be provided as an addition to a camera or to studio equipment.
The invention resides in two aspects. In a first aspect, samples of an audio component of a signal are altered such that the contents represent time information such as a timecode, sample number and channel instead of representing the original audio data. This first aspect may be referred to as a "test mode" as the quality of the audio signal is at least impaired. In a second aspect, a repeated code is provided within lower significant bits of samples of an audio component such that a receiver may align a local version of that code with the received code to determine relative synchronisation. This mode may be referred to as a "live mode" as the quality of the audio signal is minimally impacted.
The invention may also be delivered by way of a method of operating any of the functionality described above, and as a system incorporating multiple cameras, studio equipment and apparatus as described above.
BRIEF DESCRIPTION OF THE DRAWINGS
An embodiment of the invention will be described in more detail by way of example with reference to the accompanying drawings, in which: Fig. 1 is a schematic diagram of a system embodying the invention; Fig. 2 shows the structure of SDI signalling; Fig. 3 shows the sampling of an audio waveform; Fig. 4 shows the insertion of audio samples into ancillary data space; Fig. 5 shows the layout of audio samples within audio packets; Fig. 6 shows an example audio sample modified according to a "test mode"; Fig. 7 shows an example audio sample modified according to a "live mode"; Fig. 8 is a schematic diagram of an encoder/decoder embodying the invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
An embodiment of the invention comprises a device that is connectable to a camera to provide modification to audio-video signals such as conversion from signalling required by the camera to IP data streams and from IP data streams to signalling for the camera. The same device may also be used at studio equipment for converting IP streams received from cameras for use by the studio equipment. As such, a single type of device may be deployed at existing items of television production equipment such that transmission between devices may use IP. An advantage of an embodiment of the invention is that it allows camera equipment of the type used in a studio environment, or remotely but in conjunction with a production facility, to take advantage of transmission of data to and from the device over packet-based networks. Such a system may include multiple cameras, studio equipment and potentially one or more central servers for control, each optionally having a device embodying the invention.
The embodiment may additionally provide functionality whereby, for example, converting coders will be automatically set depending upon connectivity factors such as how many cameras are detected in the system, what editing system is used and so on. The server within a system can send instructions back to each device to change various settings using return packets. The cameras may be anywhere in the world and the instructions may include corrective information or other control data such as a "tally light".
The device may be implemented as an integral part of future cameras and studio equipment. The main embodiment that will be described, though, is a separate device that may be used as an add-on to existing equipment such as cameras, mixing desks and other studio equipment. We will refer to such a device herein as a "Stagebox" in keeping with our own earlier published patent application WO2013/117889.
Figure 1 shows a schematic end to end example embodying the invention. An AV device 10 such as a camera provides audio-video signals for delivery to a destination device 16 such as a studio monitor. The camera typically provides data in SDI format. In between, a network 2 provides the delivery mechanism but, as discussed above, may degrade the synchronisation of the signals. A converter 12 on the transmitter side and converter 14 on the receiver side (which may be identical converters) are provided to modify the signals and, optionally, provide conversion as will be described. The converters are implemented as "Stageboxes" as noted above. It is noted for the avoidance of doubt that the preferred arrangement involves the conversion to IP for delivery across a network, but the techniques described herein may apply equally to delivery by other network protocols and audio-video standards, including arrangements in which the source standard, such as SDI, is retained in the end to end delivery.
Audio-Video Standards
Methods and devices embodying the invention preferably operate using signals according to existing standards, such as SDI, and this standard will briefly be described by way of background for ease of understanding. For the avoidance of doubt, other standards are possible, as are other formats within such standards. Other examples of line numbers, frame rates, sampling rates and so on may be used and the following is just one example.
Figure 2 shows the features of an SDI signal relevant to this disclosure.
HD-SDI is defined in SMPTE ST 292-2008, and contains three main elements: video, audio and ancillary data. In this example, a video frame comprises 1125 lines of pixels of which 1080 are active video lines 20 having start of active video (SAV) blocks 26 and end of active video (EAV) blocks 28 which separate the active portions of lines from a horizontal blanking interval 22 (the SAV and EAV are also present in the VANC discussed below). The active portions comprise 1920 pixels.
At the end of the active video lines for a frame there is a vertical blanking interval 24. The horizontal blanking interval 22 carries horizontal ancillary data (HANC) and the vertical blanking interval 24 carries vertical ancillary data (VANC). The horizontal ancillary data (HANC) contains packets 30 which are between the EAV and SAV and are structured as discussed below.
The following standards describe further aspects of SDI, including SMPTE ST 12-2008 for timecode, SMPTE ST 272-2004 relating to placing of audio data into the video ancillary space, SMPTE ST 274-2008 relating to how the video waveforms are made up, SMPTE 291 relating to how ancillary data packets are formed (used to carry the audio and timecode data) and SMPTE ST 299-2004 relating to how to carry 24-bit digital audio. The Stagebox fully supports the standards with regard to their different frame rates and resolutions for video. The Stagebox also handles their main elements.
As described in SMPTE ST 272-2004, audio data packets carry all the information in the AES bit stream. The audio data packet 30 is located in the ancillary data space 22 of the digital video on most of the television lines in a field. An audio control packet is transmitted once per field in an interlaced system and once per frame in a progressive system. The audio control packet is optional for the default case of 48-kHz synchronous audio (20 or 24 bits), and is required for all other modes of operation. Auxiliary data are carried in an extended data packet corresponding to and immediately following the associated audio data packet.
As described in SMPTE ST 274-2008, the 1920x1080 image structure defined in that standard is mapped onto an interface that contains 1125 total lines. A frame comprises the indicated total number of lines; each line at the interface is of equal duration determined by the interface sampling frequency and the luminance samples per total line (S/TL). Raster pixel representation at the interface is presented from left to right, and lines in the raster shall be presented from top to bottom. Lines are numbered in time sequence according to the raster structure. Each line is represented by a number of samples, equally spaced. A progressive system shall convey 1080 active picture lines per frame in order from top to bottom.
As described in SMPTE ST 291-2006, ancillary data packets and space formatting described by that standard reside in an ancillary space defined by the interconnecting interface document. In a general sense, ancillary space in a serial interface is a space not used by the main data stream and is used as a transport for data associated with the main data stream. The type of payload data carried in the ancillary space is then defined in separate application documents. SAV (Start of Active Video) 26 and EAV (End of Active Video) 28 markers that mark an active digital video/data space exist on all serial digital interfaces (SDI, HD-SDI and SDTI) regardless of the number of TV lines of the used television system.
During a horizontal interval of every television line, the ancillary space that is located between EAV and SAV markers is called horizontal ancillary space (HANC space). During a vertical blanking interval of each frame, the ancillary space is called vertical ancillary space (VANC space).
Lastly, SMPTE ST 299-2004 defines the mapping of 24-bit AES digital audio data and associated control information into the ancillary data space of a serial digital video conforming to SMPTE ST 292-2008. Audio data derived from two channel pairs are configured in an audio data packet. Two types of ancillary data packets carrying AES audio information are defined and formatted per SMPTE ST 292-2008. Each audio data packet carries all of the information in the AES bit stream as defined by AES3. The audio data packet shall be located in the horizontal ancillary data space of the Cb/Cr data stream. An audio control packet shall be transmitted once per field in an interlaced system and once per frame in a progressive system in the horizontal ancillary data space of the second line after the switching point of the Y data stream. Data IDs are defined for four separate packets of each packet type. This allows for up to eight channel pairs. In this standard, the audio groups are numbered 1 through 4 and the channels are numbered 1 through 16.
An SDI formatter places the audio data packet in the horizontal ancillary space following the video line during which the audio sample occurred. Following a switching point, the audio data packet is delayed one additional line to prevent data corruption. Flag bit mpf defines the audio data packet position in the multiplexed output stream relative to the associated video data. When bit mpf = 0, it indicates the audio data packet is located immediately after the video line during which the audio sample occurred. When bit mpf = 1, it indicates the audio data packet is located in the second line following the video line during which the audio sample occurred.
The audio component of an SDI signal will now be described in more detail. An audio analogue source (e.g. microphone) is typically sampled at a rate of 48kHz (approximately every 20 microseconds) with a bit depth of 24 bits for each sample. Figure 2 shows two such sampling points A and B. There are 25 frames of video per second, each having 1125 lines (approximately 35 microseconds per line). There will therefore be one or two samples of audio per line of video.
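The quoted figures follow directly from the frame and sampling rates. As a quick check, a minimal sketch in Python (illustrative only, not part of the patent):

```python
# Derive the per-frame and per-line audio sample counts quoted above.
AUDIO_RATE_HZ = 48_000      # audio sampling rate
FRAME_RATE_HZ = 25          # video frames per second
LINES_PER_FRAME = 1125      # total lines per frame, including blanking

samples_per_frame = AUDIO_RATE_HZ // FRAME_RATE_HZ          # 1920
samples_per_line = samples_per_frame / LINES_PER_FRAME      # ~1.71, i.e. one or two per line
line_duration_us = 1e6 / (FRAME_RATE_HZ * LINES_PER_FRAME)  # ~35.6 microseconds
sample_period_us = 1e6 / AUDIO_RATE_HZ                      # ~20.8 microseconds
print(samples_per_frame, samples_per_line, line_duration_us, sample_period_us)
```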
Figure 3 shows the sampling of a waveform and Figure 4 shows how those samples are then inserted in packets within the horizontal blanking interval at the end of a respective line. Within such an SDI signal, synchronisation of the audio and video is maintained by virtue of the fact that the data is serial and synchronous. Decoding the audio samples as they are received within the data stream ensures that they are presented to a digital to analogue converter (DAC) within a tolerance of one line of video, namely 35 microseconds. Any variation by this amount is inaudible in the resulting audio signal.
Figure 5 shows the structure of the audio packets within the SDI standards and will be described briefly for ease of reference, though this aspect is known fully to the skilled person. An audio packet is shown in the upper part of Figure 5 and comprises a payload of audio samples within words of audio data UDW2-UDW17, as well as header and footer words indicating packet structure.
The header words include ADF words comprising a start code indicating a data packet. A DID word indicates that the packet comprises audio data. A DBN word provides a sequential count. A DC word provides the packet length. CLK words provide a clock or timing indication as to where in relation to the video line the sample was taken. The footer words include error correction words UDW18-UDW23 and a checksum word CS.
The specific content of the packet is shown in the table at the bottom left of Figure 5. For ease of future reference, it is noted that an audio sample of 24 bits on channel 1 is specified as having a least significant bit (LSB) at bit position b4 of word UDW2. The most significant bit (MSB) is specified at bit position b3 of word UDW5. Other sample bits in the 24-bit field aud1-aud23 are at positions as shown in the table.
Timing
We have appreciated the need to consider timing information when delivering digital audio, and in particular audio-video, across a network.
Whenever the audio component is separate from another component, such as the video component, there is a risk of synchronisation loss, particularly when converting between synchronous devices such as cameras and an asynchronous network such as an IP network. In one example, a camera may be attached to the so-called "Stagebox" for conversion of its output to an IP stream, and a remote control remote from the camera may be attached to a second such Stagebox for converting between IP and control signals. Each of the camera and the remote control need to be unaware of the intermediary IP network and to send and receive appropriate timing signals in the manner of a synchronous network, although the intermediary is an asynchronous open-standard IP network.
More generally, each device attached to an IP network requires functionality to provide timing. For this purpose a timing arrangement is provided.
We have appreciated that there are problems regarding timing information when data is exchanged in an asynchronous network. Studio equipment receiving AV feeds from multiple cameras needs a mechanism to switch between those cameras. However, data transmitted over an IP network from cameras is not guaranteed to arrive in any particular order or in a known time interval. In the absence of proper timing information, the studio equipment accordingly cannot reliably process packet streams or switch between different packet streams. A device embodying the invention incorporates a new arrangement for providing timing.
As previously described, the "Stagebox" device can operate as an SDI to IP and IP to SDI bridge on a local network, and may be used as part of the wider IP Studio environment. This disclosure describes concepts addressing the problems of timing synchronisation in an IP network environment. In this arrangement, AV material is captured, translated into an on-the-wire format, and then transmitted to a receiving device, which then translates it back to the original format. In a traditional synchronous environment, the media data arrive with the same timing relationship as they are sent, so the signals themselves effectively carry their own timing. When using an asynchronous communication medium, especially a shared medium such as Ethernet, this is not possible, and so the original material must be reconstructed at the far end using a local source of timing such as a local oscillator or a genlock signal distributed via a traditional cable set-up. In addition, the original source for each piece of content needs to be timed based on some sort of source, such as a local oscillator or a genlock signal. In a traditional studio this is solved by creating a genlock signal at a single location and sending it to all the sources of content via a traditional cable system.
In the IP world we need a different mechanism for providing a common sense of synchronisation.
Since the Ethernet medium does not provide a guaranteed fixed latency for particular connections, a system making use of it must be able to cope with packets of data arriving at irregular intervals. In extreme cases packets may even arrive in an incorrect order due to having been reordered during transit or passed through different routes. Accordingly, in any point-to-point IP audio-visual (AV) link the receiving end must employ a buffer of data which is written to as data arrive and read from at a fixed frequency for content output. The transmitter will transmit data at a fixed frequency, and except in cases of extreme network congestion the frequency at which the data arrives will, when averaged out over time, be equal to the frequency at which the transmitter sends it. If the frequency at which the receiver processes the data is not the same as the frequency at which it arrives then the receive buffer will either start to fill faster than it is emptied or empty faster than it is filled. If, over time, the rate of reception averages out to be the same as the rate of processing at the receive end then this will be a temporary effect. If the two frequencies are notably different, however, then the buffer will eventually either empty entirely or overflow, causing disruptions in the stream of media.
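The consequence of a rate mismatch can be made concrete with a small model. The sketch below is an illustration under assumed numbers, not an implementation from the patent:

```python
# Seconds until a receive buffer underflows or overflows, given the average
# arrival rate and the fixed playout rate.
def time_to_disruption(capacity_samples, occupancy_samples, f_arrive_hz, f_play_hz):
    drift = f_arrive_hz - f_play_hz            # net samples gained per second
    if drift == 0:
        return float("inf")                    # rates match on average: no disruption
    if drift > 0:                              # filling faster than it is emptied
        return (capacity_samples - occupancy_samples) / drift   # time to overflow
    return occupancy_samples / -drift          # time to underflow

# Example: a half-full 4800-sample buffer with a 10 ppm clock error at 48 kHz
# survives about 5000 seconds before the stream is disrupted.
print(time_to_disruption(4800, 2400, 48_000.48, 48_000))
```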
First Improvement
The first improvement now to be described will be referred to as a "test mode", as the signalling is particularly beneficial for testing synchronisation.
However, the improvement is not limited to testing purposes and can be used with a live audio-video signal. The concept behind the first improvement is that one or more of the samples of audio data within audio packets are modified to include timing information in place of audio information. Such modification may be applied to every sample, selected samples or potentially to random samples.
The preferred approach is to provide the timing information in place of all of the sample bits for all audio packets such that the packets no longer convey audio information, but instead convey the timing information. It is for this reason that this improvement is referred to as a test mode.
Figure 6 shows an example SDI signal modified according to the first improvement of an embodiment of the invention. As described above, an SDI signal comprises an audio component and a video component, with the audio component being arranged as samples with one or more samples provided for each line of video. A typical implementation will have audio samples approximately every 20 microseconds, which happens to provide 1920 samples per frame, each sample having 24 bits and each being provided in the horizontal ancillary space following each video line. Such an arrangement intends the receiver to decode the audio samples such that synchronisation is maintained with each line of video. However, synchronisation can be lost even when transmitted over a synchronous network, but can particularly be lost when the signal is converted to Internet Protocol and transmitted over an asynchronous network. For this reason, the first improvement modifies the audio component by providing a timestamp in place of audio data bits in at least some and preferably all audio samples. As a large amount of audio data bits are destroyed as a result, the signal can no longer provide an audio output. However, as the structure of the signal is retained, it can still be processed by any standard studio equipment using the SDI protocol in the transmission chain.
The structure of the inserted timing information could take a variety of forms. Common to these forms is that the timing information can identify timing relative to another timing source, in particular a timecode of an associated video stream. For example, a timecode may be in the form HH:MM:SS:FF, where the value is described in terms of hours, minutes, seconds and frames, with the frame number being in the range 0-24. In order to correctly identify the sample position at which an audio sample was taken, we chose to provide a sample number, a channel number and frame number and count of seconds.
Referring again to Figure 6, the timing information comprises a most significant digit (MSD) 42 (value 0, 1 or 2) taking 2 bits and a least significant digit (LSD) 40 (value 0 to 9) taking 4 bits of a frame number count having a range 0-23. We thus identify which frame within a second. We now need to know which second of time is identified, and for that purpose the least significant digit (LSD) 44 of a second counter (value 0 to 9) is provided (4 bits). This identifies which second within a range of plus or minus 5 seconds, being a 10 second range. Next we identify a channel number 46 of 16 channels (2 bits). Lastly, we identify which sample of the 1920 audio samples per frame (range 1-1920) by two separate values, a tens value 50 having values 1-10 and a remainder value 48 having values 1-192, which combined give a range 1-1920. With this information, the encoder identifies the second (within a 10 second range), the frame, the channel and the sample number to which a particular audio packet relates. The accuracy is therefore to within the sample period of the audio sampling, since the sample counter starts at the top left of a frame. As can be seen, this information totals 24 bits and therefore entirely replaces an audio sample.
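A minimal sketch of how such a 24-bit timing word might be packed and unpacked follows. The field widths are those described above; the ordering of the fields within the word is an illustrative assumption, as the actual layout is defined by Figure 6:

```python
# "Test mode" timing word: frame MSD (2 bits) and LSD (4 bits), seconds
# LSD (4 bits), channel (2 bits), and the sample number 1-1920 split into
# a group value (4 bits) and a remainder (8 bits), 24 bits in total.
# Bit positions are assumptions for illustration only.
def pack_timing_word(frame, second_lsd, channel, sample):
    assert 0 <= frame <= 23 and 0 <= second_lsd <= 9 and 1 <= sample <= 1920
    frame_msd, frame_lsd = divmod(frame, 10)
    group, remainder = divmod(sample - 1, 192)   # group 0-9, remainder 0-191
    return ((frame_msd << 22) | (frame_lsd << 18) | (second_lsd << 14)
            | (channel << 12) | (group << 8) | remainder)

def unpack_timing_word(word):
    frame = ((word >> 22) & 0x3) * 10 + ((word >> 18) & 0xF)
    second_lsd = (word >> 14) & 0xF
    channel = (word >> 12) & 0x3
    sample = ((word >> 8) & 0xF) * 192 + (word & 0xFF) + 1
    return frame, second_lsd, channel, sample
```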
A receiver can decode the timing information and use this to determine whether the audio signal originally provided by the signal has slipped in relation to another source of timing, in particular in relation to a video timecode. Whilst some original samples of audio data could be retained in the signal, as this technique would negatively impact the quality of the signal, it is likely only to be used in a "test mode", and in this situation you may as well use the entirety of the available audio bits and replace each sample with the corresponding time indication described.
The above timing information may be considered a tag, label or code that indicates time relative to a video signal. Other options include setting a counter to zero for the first sample occurring every few seconds, say every 20 seconds.
Assuming that the audio does not drift by more than plus or minus 10 seconds, then the sample is uniquely identified. Again, this provides accuracy to the nearest audio sample, but not the precise position of that sample, and so this could be considered a "coarse" test mode.
The labelling can be further improved to provide precise sample position relative to video pixels. In the example above, there are 2640 x 1125 positions (including HANC) = 2,970,000 positions, which may be represented by a 22-bit code. This could be supplied within the 24-bit audio sample field, thereby uniquely identifying the pixel position at which the sample was taken. This could be considered a "precise" test mode. However, whilst this identifies the position within a frame, we would no longer know which frame is presented. Accordingly, this type of code could be sent periodically amongst the coarse test type codes above, such as every few codes, or even alternating between coarse-type and precise-type codes so as to continually provide synchronisation to both sample and pixel position accuracy.
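As a sketch of the "precise" variant, the pixel position can be flattened into a single index that fits the available sample field; the row-major ordering below is an illustrative assumption:

```python
# 2640 samples per total line x 1125 total lines = 2,970,000 positions,
# which fits in 22 bits (2^22 = 4,194,304).
SAMPLES_PER_TOTAL_LINE = 2640
TOTAL_LINES = 1125

def pixel_index(line, pixel):      # line 0-1124, pixel 0-2639
    return line * SAMPLES_PER_TOTAL_LINE + pixel

assert pixel_index(TOTAL_LINES - 1, SAMPLES_PER_TOTAL_LINE - 1) < 2**22
```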
In addition to the modification to the audio samples, this improvement also proposes providing an optional additional timestamp by way of a visible label within the video frame itself. This is preferably related to the timecode of the video data. We therefore have the original video timecode, a timestamp provided in the audio data according to this modification, as well as a visible timecode provided in the video data frame. These may be used in conjunction at the receiver end to ensure synchronisation of the audio and video with one another, as well as synchronisation of the audio channel with other audio channels provided within the same data stream.
Second Improvement
Figure 7 shows modification of audio samples according to a second improvement embodying the invention. This improvement provides a permanent way of verifying synchronisation of an audio signal with the accompanying video signal, or indeed the audio signal with other audio channels in the same signal, or more generally synchronisation of the audio signal with some other signal. As this improvement may be used with a live audio signal without notable impairment of quality, we will refer to this improvement as a "live mode".
The second improvement appreciates that the existing structure of audio samples should be retained for compatibility with existing equipment. In addition, the improvement appreciates that any modification having a detrimental effect on the audible signal at a decoder should be avoided.
As previously described, an SDI signal comprises an audio component with 1920 audio samples per video frame. The second improvement provides a modification to the audio samples to alter only a small number of bits of the audio samples. For example, the least significant bit could be altered for the sample at the start of a video frame, or for every N video frames. A least significant bit could be altered for every sample. In general, a bit or bits of lower significance may be periodically altered so as to provide a digital code spread over multiple samples of the audio. As bits of lower significance are used, the effect of such changes would be inaudible at a decoder. The advantage provided is that a repeated code buried in this manner within the audio samples and spread over many samples can be analysed by comparing to a local version of that code at the receiver to thereby align the audio samples with another signal, such as with the accompanying video signal, another channel of the audio signal or the like.
An example will now be described for ease of understanding. If we assume that an audio signal could drift in relation to a video signal by up to plus or minus 5 seconds (a 10 second spread, being a pessimistic maximum), then if the code is provided by one bit within the audio signal per frame of video, then at a rate of 25 frames per second a repeated code of 250 bits spread over the 250 frames at 1 bit per frame could be provided. Each bit could be the least significant bit of, for example, the first sample of audio for each video frame.
Figure 7 shows that within a 24-bit field one bit is the LSB 50 of the audio sample (refer again to Figure 5 to determine the bit position of the LSB as b4 of UDW2; this is not necessarily the bit shown in Figure 7) and that this position is easily determined by the receiver. Over the same period the number of audio bits transmitted comprises 1920 samples per frame multiplied by the 250 frames at a rate of 24 bits per sample, which is 11.52 million bits. At first sight, this could create a large overhead for discovering the repeated code within the audio signal. However, this is not the case.
In the example described, the least significant bit of the samples of audio for each video frame could be extracted. To discover the relative position of the extracted bit sequence against a local version of that bit sequence simply requires a bitwise compare of the bits (in this case 250 bits) at each of the possible 250 relative positions for all 1920 samples. Put another way, a code of 250 bits is to be discovered within 1920 x 250 bit sequences at each of 250 possible positions. As it is unlikely that the bit sequence would have been damaged in any way, this can be a simple logical compare that can be performed very quickly. Once the code is found, the relative timing offset of the local code and the received code is known. As the alignment is to precise bit positions, the relative offset of the audio and video can be determined to sample position accuracy.
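A minimal sketch of this acquisition step is given below, assuming one code bit per video frame taken from the LSB of the first audio sample of each frame and an undamaged 250-bit code; the function names and framing are illustrative:

```python
# Extract the embedded bit sequence and find its cyclic offset against a
# local copy of the repeated code.
def extract_code_bits(frames):
    """frames: iterable of per-frame audio sample lists; one LSB per frame."""
    return [samples[0] & 1 for samples in frames]

def find_offset(received_bits, local_code):
    """Return the cyclic shift (in frames) that aligns the local code with
    the received bits, or None if no exact match exists."""
    n = len(local_code)
    for shift in range(n):
        if all(received_bits[i] == local_code[(i + shift) % n] for i in range(n)):
            return shift
    return None
```

At 25 frames per second, a shift of, say, 30 frames corresponds to a timing offset of 30/25 = 1.2 seconds, within the assumed plus or minus 5 second window.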
In a sense, bits of lower significance within the audio stream are modulated according to a code that is spread across multiple samples of the audio in such a manner that a local version of that code can be aligned with the received code so as to determine relative timing. This can be achieved by using bits of lower significance within the audio so that the audio is not materially altered and no audible difference can be determined at a receiver. The code can be chosen to balance speed of acquisition of the coder and receiver against other constraints. For example, a different code may be selected for each channel, and it may be desired that the codes are sufficiently distinct as between channels so that the receiver could accurately determine if an audio signal somehow became associated with the wrong video signal. The codes may be PRBS codes, Gold codes or other sequences selected to have desired properties for acquisition at the receiver.
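By way of example only, a PRBS sequence of the kind mentioned can be generated with a simple linear feedback shift register; the PRBS-9 polynomial below is an illustrative choice, not one specified in this document:

```python
# PRBS-9 generator (x^9 + x^5 + 1), period 511 bits.
def prbs9(length, seed=0x1FF):
    state, bits = seed, []
    for _ in range(length):
        bits.append(state & 1)
        feedback = ((state >> 8) ^ (state >> 4)) & 1   # taps at x^9 and x^5
        state = ((state << 1) | feedback) & 0x1FF
    return bits
```

Note that different seeds of the same register merely produce shifted versions of one sequence, so genuinely distinct per-channel codes, such as Gold codes formed by combining two such registers, would be preferred where channels must be told apart.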
The first improvement and second improvement described may be used in combination together. For example, the first improvement may be used to establish relative timing of signals and then the second improvement used to continuously track the relative timing. Other variations are possible, such as periodically using the first improvement technique to establish pixel position accuracy of relative timing and tracking using the second improvement.
Figure 8 shows an example hardware implementation. A transmitter side converter 12 (which may be part of other equipment) comprises an AV separator 60 that provides an audio component to a code inserter 62 and a video component. An AV combiner 64 recombines the audio and video components.
These components could then be transmitted over a serial network. However, a de-serialiser is provided so as to produce an IP signal instead.
At the receiver side, a serialiser takes the IP packets and reconverts to a synchronous serial signal. An AV separator 70 then provides separate audio and video components. The audio is provided to a synchroniser that extracts the code of the first or second improvement and determines synchronisation. An AV combiner 74 can then re-insert the audio component into the AV signal with correct synchronisation established.
Claims (47)
CLAIMS
- 1. A method for providing additional synchronisation signalling within a synchronous digital audio signal, comprising: at a transmitter side - modifying samples of the audio signal to include timing information in place of audio information; at a receiver side - extracting the timing information; - comparing the extracted timing information to a second source of timing information to determine synchronisation.
- 2. A method according to claim 1, wherein the timing information identifies a relative position of the samples of the audio signal.
- 3. A method according to claim 2, wherein the timing information identifies a relative position in relation to an accompanying video component.
- 4. A method according to any preceding claim, wherein the timing information comprises a code identifying a sample sequence.
- 5. A method according to any preceding claim, wherein the timing information comprises a code identifying a frame and other timing of an accompanying video signal.
- 6. A method according to any preceding claim, wherein the second source of timing information is a video timestamp provided in an accompanying video component.
- 7. A method according to any preceding claim, wherein the samples of audio signal are from one channel of audio and the second source of timing information is a different channel of audio.
- 8. A method according to any preceding claim, wherein the digital audio signal comprises a signal of the type having audio samples serially located within a video stream.
- 9. A method according to claim 8, wherein the digital audio signal comprises an SDI signal.
- 10. A system for providing additional synchronisation signalling within a synchronous digital audio signal, comprising: at a transmitter side - means for modifying samples of the audio signal to include timing information in place of audio information; at a receiver side - means for extracting the timing information; - means for comparing the extracted timing information to a second source of timing information to determine synchronisation.
- 11. A system according to claim 10, wherein the timing information identifies a relative position of the samples of the audio signal.
- 12. A system according to claim 11, wherein the timing information identifies a relative position in relation to an accompanying video component.
- 13. A system according to any of claims 10 to 12, wherein the timing information comprises a code identifying a sample sequence.
- 14. A system according to any of claims 10 to 13, wherein the timing information comprises a code identifying a frame and other timing of an accompanying video signal.
- 15. A system according to any of claims 10 to 14, wherein the second source of timing information is a video timestamp provided in an accompanying video component.
- 16. A system according to any of claims 10 to 15, wherein the samples of audio signal are from one channel of audio and the second source of timing information is a different channel of audio.
- 17. A system according to any of claims 10 to 16, wherein the digital audio signal comprises a signal of the type having audio samples serially located within a video stream.
- 18. A system according to claim 17, wherein the digital audio signal comprises an SDI signal.
- 19. A method for providing additional synchronisation signalling within a synchronous digital audio signal, comprising at a transmitter side, modifying samples of the audio signal to include timing information in place of audio information.
- 20. A system for providing additional synchronisation signalling within a synchronous digital audio signal, comprising at a transmitter side, means for modifying samples of the audio signal to include timing information in place of audio information.
- 21. A method for determining synchronisation within a synchronous digital audio signal, in which samples of the audio signal have been modified to include timing information in place of audio information, comprising, at a receiver side: - extracting the timing information; - comparing the extracted timing information to a second source of timing information to determine synchronisation.
- 22. A device for determining synchronisation within a synchronous digital audio signal, in which samples of the audio signal have been modified to include timing information in place of audio information, comprising, at a receiver side: - means for extracting the timing information; - means for comparing the extracted timing information to a second source of timing information to determine synchronisation.
- 23. A method for providing additional synchronisation signalling within a synchronous digital audio signal, comprising: at a transmitter side - modifying samples of the audio signal to alter audio information such that a repeated code is provided spread over a plurality of samples; at a receiver side - extracting the repeated code; - comparing the extracted repeated code to a local version of the repeated code to determine synchronisation.
- 24. A method according to claim 23, wherein data bits that are altered are those of lower significance.
- 25. A method according to claim 23, wherein the data bits that are altered are the least significant bits of samples.
- 26. A method according to claim 23, wherein one data bit per sample is altered.
- 27. A method according to any of claims 23 to 26, wherein the signal comprises an audio video signal and the repeated code comprises one or more bits for a sample of audio in a video frame.
- 28. A method according to claim 27, wherein the repeated code comprises one bit per frame.
- 29. A method according to any of claims 23 to 28, wherein the digital audio signal comprises a signal of the type having audio samples serially located within a video stream and wherein the repeated code comprises an audio least significant bit periodically located in the stream.
- 30. A method according to claim 29, wherein the digital audio signal comprises an SDI signal.
- 31. A method according to any of claims 23 to 30, wherein the comparing comprises an acquisition process using a cyclic shift of the local version of the code.
- 32. A system for providing additional synchronisation signalling within a synchronous digital audio signal, comprising: at a transmitter side - means for modifying samples of the audio signal to alter audio information such that a repeated code is provided spread over a plurality of samples; at a receiver side - means for extracting the repeated code; - means for comparing the extracted repeated code to a local version of the repeated code to determine synchronisation.
- 33. A system according to claim 32, wherein data bits that are altered are those of lower significance.
- 34. A system according to claim 32, wherein the data bits that are altered are the least significant bits of samples.
- 35. A system according to claim 32, wherein one data bit per sample is altered.
- 36. A system according to any of claims 32 to 35, wherein the signal comprises an audio video signal and the repeated code comprises one or more bits for a sample of audio in a video frame.
- 37. A system according to claim 36, wherein the repeated code comprises one bit per frame.
- 38. A system according to any of claims 32 to 37, wherein the digital audio signal comprises a signal of the type having audio samples serially located within a video stream and wherein the repeated code comprises an audio least significant bit periodically located in the stream.
- 39. A system according to claim 38, wherein the digital audio signal comprises an SDI signal.
- 40. A system according to any of claims 32 to 39, wherein the comparing comprises an acquisition process using a cyclic shift of the local version of the code.
- 41. A method for providing additional synchronisation signalling within a synchronous digital audio signal, comprising at a transmitter side: - modifying samples of the audio signal to alter audio information such that a repeated code is provided spread over a plurality of samples.
- 42. A system for providing additional synchronisation signalling within a synchronous digital audio signal, comprising at a transmitter side: - means for modifying samples of the audio signal to alter audio information such that a repeated code is provided spread over a plurality of samples.
- 43. A method for determining synchronisation within a synchronous digital audio signal, in which samples of the audio signal are modified to alter audio information such that a repeated code is provided spread over a plurality of samples, comprising at a receiver side: - extracting the repeated code; - comparing the extracted repeated code to a local version of the repeated code to determine synchronisation.
- 44. A device for determining synchronisation within a synchronous digital audio signal, in which samples of the audio signal are modified to alter audio information such that a repeated code is provided spread over a plurality of samples, comprising at a receiver side: - means for extracting the repeated code; - means for comparing the extracted repeated code to a local version of the repeated code to determine synchronisation.
- 45. A method or system according to any preceding claim, further comprising at the transmitter side converting between synchronous audio-video and asynchronous packaged data streams.
- 46. A method or system according to claim 45, wherein the audio and video are converted to or from RTP.
- 47. A method or system according to any preceding claim, wherein the apparatus comprises a device connectable to a video camera having connections to the interfaces, typically in the form of a separate box with attachment to the camera.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1400944.3A GB2522260A (en) | 2014-01-20 | 2014-01-20 | Method and apparatus for determining synchronisation of audio signals |
PCT/GB2015/050119 WO2015107372A1 (en) | 2014-01-20 | 2015-01-20 | Method and apparatus for determining synchronisation of audio signals |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1400944.3A GB2522260A (en) | 2014-01-20 | 2014-01-20 | Method and apparatus for determining synchronisation of audio signals |
Publications (2)
Publication Number | Publication Date |
---|---|
GB201400944D0 GB201400944D0 (en) | 2014-03-05 |
GB2522260A true GB2522260A (en) | 2015-07-22 |
Family
ID=50239201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1400944.3A Withdrawn GB2522260A (en) | 2014-01-20 | 2014-01-20 | Method and apparatus for determining synchronisation of audio signals |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB2522260A (en) |
WO (1) | WO2015107372A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107484010B (en) * | 2017-10-09 | 2020-03-17 | 武汉斗鱼网络科技有限公司 | Video resource decoding method and device |
CN109600665B (en) * | 2018-08-01 | 2020-06-19 | 北京微播视界科技有限公司 | Method and apparatus for processing data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050219366A1 (en) * | 2004-03-31 | 2005-10-06 | Hollowbush Richard R | Digital audio-video differential delay and channel analyzer |
US20060013565A1 (en) * | 2004-06-22 | 2006-01-19 | Baumgartner Hans A | Method and apparatus for measuring and/or correcting audio/visual synchronization |
US20070126929A1 (en) * | 2003-07-01 | 2007-06-07 | Lg Electronics Inc. | Method and apparatus for testing lip-sync of digital television receiver |
US20070245222A1 (en) * | 2006-03-31 | 2007-10-18 | David Wang | Lip synchronization system and method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7461002B2 (en) * | 2001-04-13 | 2008-12-02 | Dolby Laboratories Licensing Corporation | Method for time aligning audio signals using characterizations based on auditory events |
US7142250B1 (en) * | 2003-04-05 | 2006-11-28 | Apple Computer, Inc. | Method and apparatus for synchronizing audio and video streams |
US7359006B1 (en) * | 2003-05-20 | 2008-04-15 | Micronas Usa, Inc. | Audio module supporting audio signature |
JP2006528859A (en) * | 2003-07-25 | 2006-12-21 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Fingerprint generation and detection method and apparatus for synchronizing audio and video |
US7907212B2 (en) * | 2006-03-20 | 2011-03-15 | Vixs Systems, Inc. | Multiple path audio video synchronization |
GB2499261B (en) * | 2012-02-10 | 2016-05-04 | British Broadcasting Corp | Method and apparatus for converting audio, video and control signals |
-
2014
- 2014-01-20 GB GB1400944.3A patent/GB2522260A/en not_active Withdrawn
-
2015
- 2015-01-20 WO PCT/GB2015/050119 patent/WO2015107372A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070126929A1 (en) * | 2003-07-01 | 2007-06-07 | Lg Electronics Inc. | Method and apparatus for testing lip-sync of digital television receiver |
US20050219366A1 (en) * | 2004-03-31 | 2005-10-06 | Hollowbush Richard R | Digital audio-video differential delay and channel analyzer |
US20060013565A1 (en) * | 2004-06-22 | 2006-01-19 | Baumgartner Hans A | Method and apparatus for measuring and/or correcting audio/visual synchronization |
US20070245222A1 (en) * | 2006-03-31 | 2007-10-18 | David Wang | Lip synchronization system and method |
Also Published As
Publication number | Publication date |
---|---|
WO2015107372A1 (en) | 2015-07-23 |
GB201400944D0 (en) | 2014-03-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11095934B2 (en) | Receiving device and receiving method | |
US8422564B2 (en) | Method and apparatus for transmitting/receiving enhanced media data in digital multimedia broadcasting system | |
US8009742B2 (en) | Method and system for retransmitting internet protocol packet for terrestrial digital multimedia broadcasting service | |
EP2739043A1 (en) | Transmitting apparatus and method and receiving apparatus and method for providing a 3d service through a connection with a reference image transmitted in real time and additional image and content transmitted separately | |
US20050169269A1 (en) | Data transmission device and data transmission method | |
KR100706619B1 (en) | Apparatus for Communication and Broadcasting Using Multiplexing at MPEG-2 Transmission Convergence Layer | |
KR20120084252A (en) | Receiver for receiving a plurality of transport stream, transmitter for transmitting each of transport stream, and reproducing method thereof | |
EP1762078B1 (en) | Method for transmitting packets in a transmission system | |
JP2015149680A (en) | Transmitting device and receiving device | |
EP2343845A2 (en) | Precise compensation of video propagation duration | |
JP2012513139A (en) | Method for synchronizing transport streams in a multiplexer with an external coprocessor | |
KR20100061221A (en) | Apparatus and method for inserting or extracting a timestamp information | |
US20100172374A1 (en) | System and method for transport of a constant bit rate stream | |
GB2522260A (en) | Method and apparatus for determining synchronisation of audio signals | |
KR101131836B1 (en) | ASI Switcher for digital advertisement inserter | |
KR20170086028A (en) | Transmission device, transmission method, reception device and reception method | |
EP3280147A1 (en) | Method and apparatus for transmitting and receiving broadcast signal | |
JP3893643B2 (en) | Signal multiplexing method and transmission signal generating apparatus using the same | |
KR101879194B1 (en) | Method and Apparatus for Recovering Packet Loss | |
US20160366417A1 (en) | Method for synchronizing adaptive bitrate streams across multiple encoders with the source originating from the same baseband video | |
Edwards et al. | Elementary flows for live ip production | |
CN112272316B (en) | Multi-transmission code stream synchronous UDP distribution method and system based on video display timestamp | |
KR101011350B1 (en) | Method for creating a system clock in a receiver device and corresponding receiver device | |
EP3567863A1 (en) | Transmission arrangement for wirelessly transmitting an mpeg2-ts-compatible data stream | |
JP5307187B2 (en) | Video transmission device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |