WO2018015752A1 - Sample synchronisation

Publication number
WO2018015752A1
Authority
WO
WIPO (PCT)
Prior art keywords
hash
block
audio stream
value
pcm audio
Application number
PCT/GB2017/052129
Other languages
French (fr)
Inventor
Malcolm Law
Original Assignee
Malcolm Law
Application filed by Malcolm Law
Publication of WO2018015752A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L7/00 Arrangements for synchronising receiver with transmitter
    • H04L7/04 Speed or phase control by synchronisation signals
    • H04L7/048 Speed or phase control by synchronisation signals using the properties of error detecting or error correcting codes, e.g. parity as synchronisation signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017 Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L7/00 Arrangements for synchronising receiver with transmitter
    • H04L7/0054 Detection of the synchronisation error by features other than the received signal transition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436 Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/43615 Interfacing a Home Network, e.g. for connecting the client to a plurality of peripherals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams

Definitions

  • the invention relates to an efficient method of identifying a specific point in an audio stream, particularly for applying commands thereto.
  • Common data links, for example USB and Bluetooth, are sometimes used as a means of transporting audio from one device to another.
  • An example of such a system would be a personal computer acting as a media server connected to a digital to analogue converter (DAC), which in turn may be connected to a number of loudspeakers.
  • DAC: digital to analogue converter
  • Due to the variety of software and hardware components such a system may take, latency along the audio link, whether by USB or otherwise, is often both unknown and variable compared to control messages which may be communicated by different paths, or different protocols on the same wire.
  • Audio processing, such as volume control, may be performed at various points along the system.
  • Figure 1 illustrates this situation, where both audio 4 and control 7 data are sent from a transmitter 1 to a receiver 2, which performs some action 8 on the audio governed by the control data.
  • the relative delay between the audio 4 and control 7 paths is unknown and may be variable.
  • a method of specifying a point in a PCM audio stream losslessly transmitted over a communication channel comprising the steps of:
  • a specific point in the PCM audio stream can be identified and its location transmitted with sufficient accuracy and in a computationally efficient manner.
  • the hash contained in the message allows the point to be subsequently determined, such that a receiver may apply a related command to a precise position in the audio specified by the transmitter.
  • the method comprises the further step of including the command in the message, and the point would be the position where the command is to take effect. In this way both the identifying information about the point and the related command are conveyed in a single message.
  • the method comprises the further step of maintaining a sample counter, and including the value of the sample counter at the point in the message.
  • the transmitter and receiver establish a synchronised sample counter, allowing subsequent messages to specify precise positions in the audio where commands are to take effect by referencing the value of the sample counter at that position.
  • the message may be transmitted over an asynchronous communication channel to the PCM audio stream.
  • the hash has the property that for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$. In this way, a hash collision which might lead to the receiver mistakenly identifying the point is unlikely even for repetitive audio.
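  • the hash $h(x_{t-n+1}, \ldots, x_t)$ has the property that it is a function of the preceding hash value $h(x_{t-n}, \ldots, x_{t-1})$ together with the two newest sample values entering the block and the two oldest sample values leaving it.
  • In this way, the computational cost for the receiver scanning the audio to find a block with the correct hash is kept low and independent of the block size n.
  • the hash has the property that [expression not recoverable from the source] is a bit rotation of [expression not recoverable from the source]. Linear hash functions of this form lead to extremely simple and computationally efficient update formulae for the receiver.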
  • the hash comprises a cyclic redundancy check (CRC) and the order of the polynomial defining the CRC divides the size of the data block in bits.
  • the method may comprise the further step of ensuring that the chosen point does not correspond to a block of constant samples. Constant audio can easily be identified by the transmitter, which can decline to signal positions where the block is mostly zero or clipped and so does not have adequate randomness. In this way, the transmitter can wait until a block occurs which is neither silent nor clipped.
  • the method may comprise the further step of checking if the block of n samples of the PCM audio stream matches another block of n samples. In the event of a match, the method may further comprise choosing a different point. In this way the transmitter can avoid ambiguity which might give rise to the receiver misidentifying the point. In some embodiments, the method may further comprise altering the block of n samples of the PCM audio stream before transmitting the message. In this way the transmitter can minimise the chance of the receiver mistakenly identifying the point even on exactly repeating audio, typically by making a small pseudorandom alteration. Preferably the transmitter also includes in the message information directing how the receiver is to alter the block of n samples of the PCM audio stream. In this way the transmitter can specify the inverse alteration so the receiver can restore the exact original audio before modification by the transmitter.
  • a method of identifying a point in a PCM audio stream losslessly received over a communication channel comprising the steps of:
  • a specific chosen point can be identified in a received PCM audio stream with sufficient accuracy.
  • the received message containing the hash allows the point to be so determined, such that a related command may then be applied at a precise position in the audio requested by the transmitter.
  • the method comprises the further steps of:
  • the message is received over an asynchronous communication channel to the PCM audio stream.
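  • the hash may have the property that for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$.
  • the hash $h(x_{t-n+1}, \ldots, x_t)$ may have the property that it is a function of the preceding hash value $h(x_{t-n}, \ldots, x_{t-1})$ together with the two newest sample values entering the block and the two oldest sample values leaving it.
  • the hash may also have the stronger property that $h(x_{t-n+1}, \ldots, x_t)$ is a function of $h(x_{t-n}, \ldots, x_{t-1})$, $x_t$ and $x_{t-n}$. Additionally, the hash may have the property that [expression not recoverable from the source] is a bit rotation of [expression not recoverable from the source].
  • the hash may comprise a cyclic redundancy check (CRC) and the order of the polynomial defining the CRC divides the size of the data block in bits.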
  • the method of the second aspect may comprise the further step of computing the hash of a successive block of the PCM audio stream using the hash value computed for the preceding overlapping block of the PCM audio stream. In this way, the audio stream can be scanned for a block which hashes to the received value in a computationally efficient way, especially if the hash function has some of the properties disclosed above.
  • the method may comprise the further steps of retrieving information from the message directing how the block of the PCM audio stream is to be altered, and altering the block of the PCM audio stream whose hash matches the hash value. In this way, a modification made by the transmitter as described above, can be undone to give end-to-end lossless operation.
  • a transmitter adapted to specify a point in a PCM audio stream losslessly transmitted over a communication channel by performing the method of the first aspect.
  • a receiver adapted to identify a point in a PCM audio stream losslessly received over a communication channel by performing the method of the second aspect.
  • a codec comprising a transmitter according to the third aspect in combination with a receiver according to the fourth aspect.
  • a computer program product comprising instructions that when executed by a processor cause said processor to perform the method of the first or second aspects.
  • the present invention provides techniques and devices for identifying and communicating the location of a specific point in a PCM audio stream between a transmitter and a receiver with sufficient accuracy and in a computationally efficient manner. This allows a command to be related to the point so that it may subsequently be applied at the point in the PCM audio stream. Further variations and embellishments will become apparent to the skilled person in light of this disclosure.
  • Figure 1 shows a transmitter and receiver where a PCM audio stream is communicated over a communications link but commands are communicated over an asynchronous communications link with unknown and possibly variable delay relative to the audio path;
  • FIG. 2 shows a transmitter and receiver according to the invention where a PCM audio stream is communicated over a communications link, as is a message containing a hash of a block of the audio stream.
  • a particular point in the audio can (with high probability) be uniquely defined by a sufficiently large block of audio samples - for example the n preceding samples, where n ≥ 8 to ensure that even quiet audio contains sufficient randomness to make repetition of identical blocks unlikely.
  • Figure 2 shows a transmitter 1 and receiver 2 where a PCM audio stream 3 is communicated over a communications link 4. It is desired to identify a specific point 5 in the audio stream 3 that the transmitter 1 wishes to communicate to the receiver 2 by means of a message 6 sent across an asynchronous communications link 7.
  • the specific point 5 might be the point where a configuration change is specified to occur or, preferably, the message might specify the value of a sample counter at that point. Having established a common sample counter synchronised to the audio stream on both sides of the link, any subsequent messages requesting configuration changes need only specify at what value of the sample counter they should take effect.
  • this is done by computing a hash 10 of a block of audio with a known relationship to the specific point 5, and including the hash value in the message 6.
  • a hash function maps a block of data (containing audio sample values) onto a smaller domain - perhaps a 32 bit integer. No cryptographic properties are required of the hash function, but a good choice of hash function for this purpose will satisfy certain properties, as discussed below.
  • the specific point is found by looking for the block of audio that hashes to the same value 21. This is efficiently done by repeatedly updating 20 a hash value 22 using a new value of the audio and a delayed value 23 of the audio.
  • the hash function's first desirable property is updateability.
  • In order to find the block of audio which hashes to the value received in the message, the receiver needs to examine all plausible blocks of audio and compute their hash. It is onerous for the receiver to compute each hash from scratch, and far more computationally efficient if it can compute each hash by updating the preceding hash to take into account the change as the block moves by one sample, bringing in a new sample and dropping an old sample.
  • a simple hash function that illustrates this property is to represent each PCM audio sample value at time t in a 32 bit word as $x_t$ and then to sum them over the block. If we denote the hash function at time t as $H_t$, then: $H_t = \sum_{i=0}^{n-1} x_{t-i}$
  • the hash of each block can be updated from the hash of the last one by adding the new value and subtracting the old one that no longer contributes to the result: $H_t = H_{t-1} + x_t - x_{t-n}$
  • the updateability property can be expressed as requiring that there exists a function f() such that: $H_t = f(H_{t-1}, x_t, x_{t-n})$
  • the receiver can compute the next hash value from just 3 values: the previous hash value, the new audio value that comes into the block and the old audio value that drops out of the block.
  • the receiver uses the updateability property to efficiently identify the block of audio with the correct hash value.
  • the hash is updated 20 using just 3 values: the incoming audio sample, a delayed 23 audio sample and the previous hash value.
  • the updated hash value can then be compared to the hash value in the received message to find a match and thus identify the specific point 5 in the audio received by the receiver 2.
  • summing over the block is not a good choice of hash function because if the audio is quiet, exercising few values, then there will not be much variability in the sum over a block either. Consequently, hash collisions (where two blocks of audio hash to the same value despite not being identical) are going to be quite common.
  • ROTATE is a w-bit bit rotation (either left or right) and w divides n. This can be seen to be randomising, by noting that in the first w terms of the summation there is one that maps the bit $2^b$ to every bit in the hash word. If we now consider the binary representation of (H - h(X)) modulo $2^w$, for every set bit we can add $2^b$ to that audio value whilst leaving the others unchanged. This generates the required hash value H, proving that the hash is randomising.
  • Another example is to replace the summation by an XOR. This is also randomising (using a similar proof) and updateable with
  • This method of receiver operation is illustrated in Figure 2, where the hash value is shown to be updated by a function taking the previous value of the hash function, the input audio and a delayed version of the input audio corresponding to the data dropping out of the hash block. Each updated value is then compared with the hash value received in a message to identify the point in time where a match occurs.
  • Some processors include built in instructions for computing cyclic redundancy checks (CRCs) and we can attempt to build an updateable hash function out of a CRC.
  • CRCs: cyclic redundancy checks
  • a d bit CRC is defined by a generator polynomial
  • Such a polynomial P(y) is said to have an order m defined as the smallest positive integer m such that $y^m \equiv 1 \pmod{P(y)}$
  • the hash is updated by computing some function of 5 values: the previous hash value, the newest 2 sample values coming into the hash block and the oldest two sample values falling out of the hash block.
  • the updateability property has been weakened to accommodate the hash block length in bits not being an integer multiple of the size of each sample value.
  • hashing a block of audio fails to uniquely identify a position in the stream. This is because the audio is exactly repetitive and so nearby blocks of audio have the same hash value because they are identical. This issue can be addressed by various techniques.
  • the transmitter to make a small perturbation to the audio over which the hash is computed.
  • a good choice would be to XOR the least significant bits of some, or all, of the block of n samples with a pseudorandom sequence. This reduces the probability of a hash collision to an insignificant level at the expense of introducing a tiny amount of audio noise for a very short duration.
  • the transmitted message could also contain a field directing the receiver to alter the audio in such a way as to invert the transmitter's alterations, thus restoring overall lossless operation.
  • a suitable field might contain a seed for the pseudorandom bit generator, so that the receiver can regenerate the pseudorandom sequence and XOR the LSBs back to their original values.
  • Audio samples may also be grouped together, so that x t is a value representing a group of samples, and so the sample rate of x t is a submultiple of that of the underlying audio. This is natural if the receiver has some other means of establishing group boundaries. Further variations and embellishments to the invention will become apparent to the skilled person in light of this disclosure and the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method of specifying a point in a PCM audio stream transmitted over a communication channel comprises transmitting a message containing a hash of a block of samples of the audio stream related to the point. A corresponding value of a sample counter can also be included in the message allowing synchronisation of the counter. Conversely a method of identifying a point in a PCM audio stream received over a communication channel comprises successively updating a hash of overlapping blocks in the audio stream until the updated hash matches one received in a message. A transmitter and a receiver for performing the methods are also proposed.

Description

SAMPLE SYNCHRONISATION
Field of the Invention
The invention relates to an efficient method of identifying a specific point in an audio stream, particularly for applying commands thereto.
Background to the Invention
Common data links, for example USB and Bluetooth, are sometimes used as a means of transporting audio from one device to another. An example of such a system would be a personal computer acting as a media server connected to a digital to analogue converter (DAC), which in turn may be connected to a number of loudspeakers. Due to the variety of software and hardware components such a system may take, latency along the audio link, whether by USB or otherwise, is often both unknown and variable compared to control messages which may be communicated by different paths, or different protocols on the same wire.
Audio processing, such as volume control, may be performed at various points along the system. Figure 1 illustrates this situation, where both audio 4 and control 7 data are sent from a transmitter 1 to a receiver 2, which performs some action 8 on the audio governed by the control data. However, the relative delay between the audio 4 and control 7 paths is unknown and may be variable.
Sometimes altering the processing configuration will be in response to a user request, and the precise point in relation to the audio where the change is applied does not really matter.
However, there are also situations where an early device in the chain wishes a change to be precisely timed, and a later device in the chain implements that change. An example would be if a DAC was to change volume level in response to ReplayGain information. The change should be implemented precisely on the track boundary, as it can involve a substantial change in volume, and if the timing of the change is approximate then it could result in the end of the first track or the beginning of the second being erroneously loud or quiet. However, the location of the track boundary is not known in the continuous audio stream received by the DAC.
There is therefore a need for a practical method to effect sample accurate configuration changes across such a link, for example volume changes over a USB connection.
Summary of the Invention
According to a first aspect of the present invention there is provided a method of specifying a point in a PCM audio stream losslessly transmitted over a communication channel, the method comprising the steps of:
choosing the point in the PCM audio stream; and,
transmitting a message containing a hash of a block of n ≥ 8 samples of the PCM audio stream having a predetermined relationship to the point.
In this way, a specific point in the PCM audio stream can be identified and its location transmitted with sufficient accuracy and in a computationally efficient manner. The hash contained in the message allows the point to be subsequently determined, such that a receiver may apply a related command to a precise position in the audio specified by the transmitter.
In some embodiments, the method comprises the further step of including the command in the message, and the point would be the position where the command is to take effect. In this way both the identifying information about the point and the related command are conveyed in a single message.
In some embodiments the method comprises the further step of maintaining a sample counter, and including the value of the sample counter at the point in the message. In this way, the transmitter and receiver establish a synchronised sample counter, allowing subsequent messages to specify precise positions in the audio where commands are to take effect by referencing the value of the sample counter at that position.
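The transmitter side described above can be illustrated with a minimal Python sketch. It is not the patented implementation: CRC-32 from `zlib` stands in for the hash, the 32-bit little-endian sample packing is an assumption, and the names `block_hash`, `make_message` and the message field names are hypothetical. The block is taken to be the n samples immediately preceding the chosen point, which is one possible "predetermined relationship".

```python
import struct
import zlib

N = 8  # block length in samples; the text requires n >= 8


def block_hash(block):
    """Stand-in 32-bit hash of a block of integer PCM samples.

    Assumption: samples fit in signed 32-bit words and CRC-32 is used
    purely as a convenient non-cryptographic hash.
    """
    data = struct.pack(f"<{len(block)}i", *block)
    return zlib.crc32(data) & 0xFFFFFFFF


def make_message(samples, point, command=None):
    """Build a message that identifies `point` in `samples`.

    `point` is the index of the first sample after the hashed block, so the
    block is the N samples preceding the point. The message also carries the
    sample counter value at the point and, optionally, a command.
    """
    if point < N or point > len(samples):
        raise ValueError("the point must have N preceding samples")
    block = samples[point - N:point]
    return {
        "hash": block_hash(block),
        "sample_counter": point,
        "command": command,
    }
```

The message stays small however large n is made, which is the reason for sending a hash rather than a copy of the block.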
In some embodiments the message may be transmitted over an asynchronous communication channel to the PCM audio stream. In some preferred embodiments the hash has the property that for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$. In this way, a hash collision which might lead to the receiver mistakenly identifying the point is unlikely even for repetitive audio.
In some embodiments the hash $h(x_{t-n+1}, \ldots, x_t)$ has the property that it is a function of the preceding hash value $h(x_{t-n}, \ldots, x_{t-1})$ together with the two newest sample values entering the block and the two oldest sample values leaving it. In this way, the computational cost for the receiver scanning the audio to find a block with the correct hash is kept low and independent of the block size n. Preferably, the hash has the stronger property that $h(x_{t-n+1}, \ldots, x_t)$ is a function of $h(x_{t-n}, \ldots, x_{t-1})$, $x_t$ and $x_{t-n}$. In some of these embodiments, the hash has the property that [expression not recoverable from the source] is a bit rotation of [expression not recoverable from the source]. Linear hash functions of this form lead to extremely simple and computationally efficient update formulae for the receiver.
In some embodiments the hash comprises a cyclic redundancy check (CRC) and the order of the polynomial defining the CRC divides the size of the data block in bits. The method may comprise the further step of ensuring that the chosen point does not correspond to a block of constant samples. Constant audio can easily be identified by the transmitter, which can decline to signal positions where the block is mostly zero or clipped and so does not have adequate randomness. In this way, the transmitter can wait until a block occurs which is neither silent nor clipped.
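A transmitter-side check of this kind might look like the following Python sketch. The function name `block_is_usable`, the 16-bit default word size and the 50% threshold used to interpret "mostly zero or clipped" are all assumptions made for illustration; the source does not specify them.

```python
def block_is_usable(block, bits=16, max_fraction=0.5):
    """Return True if `block` looks random enough to signal a position.

    A block is rejected if every sample is identical, or if more than
    `max_fraction` of its samples are zero or at positive/negative full
    scale (taken here as a sign of silence or clipping).
    """
    full_pos = (1 << (bits - 1)) - 1   # e.g. 32767 for 16-bit audio
    full_neg = -(1 << (bits - 1))      # e.g. -32768 for 16-bit audio
    if len(set(block)) <= 1:
        return False
    degenerate = sum(1 for x in block if x in (0, full_pos, full_neg))
    return degenerate <= max_fraction * len(block)
```

A transmitter could call this on each candidate block and simply defer sending its message until the check passes.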
The method may comprise the further step of checking if the block of n samples of the PCM audio stream matches another block of n samples. In the event of a match, the method may further comprise choosing a different point. In this way the transmitter can avoid ambiguity which might give rise to the receiver misidentifying the point. In some embodiments, the method may further comprise altering the block of n samples of the PCM audio stream before transmitting the message. In this way the transmitter can minimise the chance of the receiver mistakenly identifying the point even on exactly repeating audio, typically by making a small pseudorandom alteration. Preferably the transmitter also includes in the message information directing how the receiver is to alter the block of n samples of the PCM audio stream. In this way the transmitter can specify the inverse alteration so the receiver can restore the exact original audio before modification by the transmitter.
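One way such a reversible alteration could work is to XOR the least significant bit of each sample in the block with a seeded pseudorandom bit, and to carry the seed in the message. The Python sketch below assumes this scheme; the function name `perturb_lsbs` is hypothetical, and Python's `random.Random` is only a stand-in for whatever pseudorandom bit generator both ends would actually share.

```python
import random


def perturb_lsbs(block, seed):
    """XOR the least significant bit of each sample with a pseudorandom bit.

    The operation is its own inverse: applying it again with the same seed
    restores the original samples, so the receiver can undo the change.
    """
    rng = random.Random(seed)
    return [x ^ rng.getrandbits(1) for x in block]


# Transmitter: alter the block, then hash the altered audio and send the seed.
original = [120, -7, 33, 0, -512, 64, 5, -1]
seed = 1234  # hypothetical message field
altered = perturb_lsbs(original, seed)

# Receiver: after locating the block, regenerate the sequence and undo it.
restored = perturb_lsbs(altered, seed)
```

Each sample changes by at most one LSB, so the audible effect is confined to a very small amount of noise over the n samples of the block.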
According to a second aspect of the present invention there is provided a method of identifying a point in a PCM audio stream losslessly received over a communication channel, the method comprising the steps of:
receiving a message containing a hash value; and,
computing a hash of successive overlapping blocks of the PCM audio stream until a block is found whose hash matches the received hash value.
In this way, a specific chosen point can be identified in a received PCM audio stream with sufficient accuracy. The received message containing the hash allows the point to be so determined, such that a related command may then be applied at a precise position in the audio requested by the transmitter.
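The receiver method above can be sketched in its simplest form, recomputing the hash of every overlapping block from scratch. As before, CRC-32 is only a stand-in hash, and the names `block_hash` and `find_point` are hypothetical; the block is assumed to be the n samples immediately preceding the point.

```python
import struct
import zlib


def block_hash(block):
    """Stand-in 32-bit hash (CRC-32 over 32-bit little-endian samples)."""
    data = struct.pack(f"<{len(block)}i", *block)
    return zlib.crc32(data) & 0xFFFFFFFF


def find_point(samples, target_hash, n=8):
    """Scan successive overlapping n-sample blocks for `target_hash`.

    Returns the index just after the first matching block (the identified
    point), or None if no block in `samples` matches.
    """
    for end in range(n, len(samples) + 1):
        if block_hash(samples[end - n:end]) == target_hash:
            return end
    return None
```

This brute-force form costs work proportional to n for every sample position, which is what motivates hashes that can be updated incrementally as the block slides along the stream.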
In some embodiments the method of the second aspect comprises the further steps of:
decoding a command from the received message; and,
implementing the command at a point in the PCM audio stream having a predetermined relationship to the block whose hash matches the received hash value.
In some embodiments the method comprises the further steps of:
decoding a value for a sample counter from the received message; and,
setting a sample counter to the decoded value at a point in the PCM audio stream having a predetermined relationship to the block whose hash matches the received hash value.
In some embodiments the message is received over an asynchronous communication channel to the PCM audio stream.
The hash may have the property that for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$. Alternatively, the hash $h(x_{t-n+1}, \ldots, x_t)$ may have the property that it is a function of the preceding hash value $h(x_{t-n}, \ldots, x_{t-1})$ together with the two newest sample values entering the block and the two oldest sample values leaving it. In this case the hash may also have the stronger property that $h(x_{t-n+1}, \ldots, x_t)$ is a function of $h(x_{t-n}, \ldots, x_{t-1})$, $x_t$ and $x_{t-n}$. Additionally, the hash may have the property that [expression not recoverable from the source] is a bit rotation of [expression not recoverable from the source].
In some embodiments, the hash may comprise a cyclic redundancy check (CRC) and the order of the polynomial defining the CRC divides the size of the data block in bits. The method of the second aspect may comprise the further step of computing the hash of a successive block of the PCM audio stream using the hash value computed for the preceding overlapping block of the PCM audio stream. In this way, the audio stream can be scanned for a block which hashes to the received value in a computationally efficient way, especially if the hash function has some of the properties disclosed above.
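The incremental scan can be sketched in Python with a rotate-and-XOR hash, in the spirit of the XOR variant mentioned elsewhere in this document. The exact formula below is an assumption, since the source's own equations are not available here: the newest sample is used unrotated and each older sample is rotated one further bit within a w = 32 bit word. The names `rotl`, `xor_rotate_hash`, `update` and `scan` are hypothetical.

```python
W = 32                   # hash word width in bits (assumed)
MASK = (1 << W) - 1


def rotl(v, r):
    """Rotate the W-bit value v left by r bits."""
    r %= W
    return ((v << r) | (v >> (W - r))) & MASK


def xor_rotate_hash(block):
    """Hash from scratch: XOR of samples, each rotated by its age in samples."""
    h = 0
    for age, x in enumerate(reversed(block)):
        h ^= rotl(x & MASK, age)
    return h


def update(prev_hash, new_sample, old_sample, n):
    """Slide the block by one sample using only three values.

    Rotating the previous hash ages every sample by one; the new sample is
    XORed in and the sample that has reached age n is XORed out. When W
    divides n, rotl(old, n) is simply old.
    """
    return rotl(prev_hash, 1) ^ (new_sample & MASK) ^ rotl(old_sample & MASK, n)


def scan(samples, target, n):
    """Return the index just after the first n-sample block hashing to target."""
    if len(samples) < n:
        return None
    h = xor_rotate_hash(samples[:n])
    end = n
    while h != target:
        if end == len(samples):
            return None
        h = update(h, samples[end], samples[end - n], n)
        end += 1
    return end
```

Because rotation distributes over XOR, the update is exact, and the cost per sample position does not depend on the block size n.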
In some embodiments, the method may comprise the further steps of retrieving information from the message directing how the block of the PCM audio stream is to be altered, and altering the block of the PCM audio stream whose hash matches the hash value. In this way, a modification made by the transmitter as described above, can be undone to give end-to-end lossless operation.
According to a third aspect of the present invention there is provided a transmitter adapted to specify a point in a PCM audio stream losslessly transmitted over a communication channel by performing the method of the first aspect.
According to a fourth aspect of the present invention there is provided a receiver adapted to identify a point in a PCM audio stream losslessly received over a communication channel by performing the method of the second aspect.
According to a fifth aspect of the present invention there is provided a codec comprising a transmitter according to the third aspect in combination with a receiver according to the fourth aspect.
According to a sixth aspect of the present invention there is provided a computer program product comprising instructions that when executed by a processor cause said processor to perform the method of the first or second aspects.
As will be appreciated by those skilled in the art, the present invention provides techniques and devices for identifying and communicating the location of a specific point in a PCM audio stream between a transmitter and a receiver with sufficient accuracy and in a computationally efficient manner. This allows a command to be related to the point so that it may subsequently be applied at the point in the PCM audio stream. Further variations and embellishments will become apparent to the skilled person in light of this disclosure.
Brief Description of the Drawings
Examples of the present invention will be described in detail with reference to the accompanying drawings, in which:
Figure 1 shows a transmitter and receiver where a PCM audio stream is communicated over a communications link but commands are communicated over an asynchronous communications link with unknown and possibly variable delay relative to the audio path; and
Figure 2 shows a transmitter and receiver according to the invention where a PCM audio stream is communicated over a communications link, as is a message containing a hash of a block of the audio stream.
Detailed Description
Except in certain circumstances (particular test signals, artificial complete silence and clipping), audio signals carry some degree of noise. Even though audio waveforms are often repetitive, it is thus unlikely that two nearby blocks of audio exactly match one another, and increasingly unlikely as the block size increases.
Thus, a particular point in the audio can (with high probability) be uniquely defined by a sufficiently large block of audio samples - for example the n preceding samples, where n ≥ 8 to ensure that even quiet audio contains sufficient randomness to make repetition of identical blocks unlikely. However, it is excessively verbose to include a verbatim copy of a block of audio in a message to specify exactly where volume is to change.
Figure 2 shows a transmitter 1 and receiver 2 where a PCM audio stream 3 is communicated over a communications link 4. It is desired to identify a specific point 5 in the audio stream 3 that the transmitter 1 wishes to communicate to the receiver 2 by means of a message 6 sent across an asynchronous communications link 7. The specific point 5 might be the point where a configuration change is specified to occur or, preferably, the message might specify the value of a sample counter at that point. Having established a common sample counter synchronised to the audio stream on both sides of the link, any subsequent messages requesting configuration changes need only specify at what value of the sample counter they should take effect.
According to the invention, this is done by computing a hash 10 of a block of audio with a known relationship to the specific point 5, and including the hash value in the message 6. A hash function maps a block of data (containing audio sample values) onto a smaller domain - perhaps a 32 bit integer. No cryptographic properties are required of the hash function, but a good choice of hash function for this purpose will satisfy certain properties, as discussed below. In the receiver, the specific point is found by looking for the block of audio that hashes to the same value 21. This is efficiently done by repeatedly updating 20 a hash value 22 using a new value of the audio and a delayed value 23 of the audio.
The hash function's first desirable property is updateability. In order to find the block of audio which hashes to the value received in the message, the receiver needs to examine all plausible blocks of audio and compute their hash. It is onerous for the receiver to compute each hash from scratch, and far more computationally efficient if it can compute each hash by updating the preceding hash to take into account the change as the block moves by one sample, bringing in a new sample and dropping an old sample.
A simple hash function that illustrates this property is to represent each PCM audio sample value at time t in a 32 bit word as $x_t$ and then to sum them over the block. If we denote the hash function at time t as $H_t$, then:
$$H_t = \sum_{i=0}^{n-1} x_{t-i} = H_{t-1} + x_t - x_{t-n}$$
So the hash of each block can be updated from the hash of the last one by adding the new value and subtracting the old one that no longer contributes to the result.
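As a minimal sketch of this update, assuming samples are held as 32 bit integers and the sum is kept modulo $2^{32}$ (the names `sum_hash` and `rolling_sum_hashes` are hypothetical and not taken from the specification):

```python
from collections import deque

MASK32 = 0xFFFFFFFF  # keep the hash in a 32 bit word


def sum_hash(block):
    """Hash a block from scratch: sum of its samples modulo 2**32."""
    return sum(block) & MASK32


def rolling_sum_hashes(samples, n):
    """Yield (t, H_t) for every full block of n samples ending at index t."""
    window = deque()
    h = 0
    for t, x in enumerate(samples):
        window.append(x)
        h = (h + x) & MASK32                      # add the new sample
        if len(window) > n:
            h = (h - window.popleft()) & MASK32   # subtract the sample that dropped out
        if len(window) == n:
            yield t, h
```

Each step costs one addition and one subtraction regardless of the block length n, which is the point of the updateability property.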
More generally, the updateability property can be expressed as requiring that there exists a function f() such that:
$$H_t = f(H_{t-1}, x_t, x_{t-n})$$
If the hash function admits of such a formulation then, no matter how big the block is, the receiver can compute the next hash value from just 3 values: the previous hash value, the new audio value that comes into the block and the old audio value that drops out of the block.
Referring back to Figure 2, we see how the receiver uses the updateability property to efficiently identify the block of audio with the correct hash value. As every audio sample comes in, it maintains the hash value 22 corresponding to the block ending at that point. No matter what the length of the block is, the hash is updated 20 using just 3 values: the incoming audio sample, a delayed 23 audio sample and the previous hash value. The updated hash value can then be compared to the hash value in the received message to find a match and thus identify the specific point 5 in the audio received by the receiver 2.
However, although it is updateable, summing over the block is not a good choice of hash function because if the audio is quiet, exercising few values, then there will not be much variability in the sum over a block either. Consequently, hash collisions (where two blocks of audio hash to the same value despite not being identical) are going to be quite common.
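The receiver loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the specified implementation: the function names are hypothetical, the delay line is pre-filled with zeros, and the weak summing hash is used only because it is the simplest updateable example.

```python
from collections import deque


def find_block_end(samples, n, target, update, initial=0):
    """Return the index of the last sample of the first full block whose
    hash equals target, or None if no block matches."""
    # Delay line holding the previous n samples; zeros stand in for audio
    # before the start of the stream (an assumption of this sketch).
    delay = deque([0] * n, maxlen=n)
    h = initial
    for t, x in enumerate(samples):
        old = delay[0]            # the sample from n steps ago
        h = update(h, x, old)     # only 3 values are needed per step
        delay.append(x)           # maxlen discards the oldest sample
        if t >= n - 1 and h == target:
            return t
    return None


def sum_update(h, new, old):
    """Update rule for the simple summing hash, modulo 2**32."""
    return (h + new - old) & 0xFFFFFFFF
```

Any other updateable hash can be substituted by passing a different `update` function, provided its value for an all-zero block equals `initial`.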
We also need the hash function to do a good job of mixing up the bits from the audio samples in the block so that two similar blocks that differ in low level noise have minimal chance of a hash collision, ideally not much more than $2^{-w}$, where w is the word width of the hash value in bits.
We shall call this criterion "randomising", and define a hash function to be randomising if given any block X of audio values, bitplane $2^b$ in the PCM audio sample values and hash value H then there exists a block X' of PCM audio values which hashes to H where every audio value in X' is equal to the corresponding value in X plus 0 or $\pm 2^b$.
The rationale behind this definition is that with a randomising hash, we take the bitplane $2^b$ to match the least significant active bit in the audio, which will change with the smallest noise. The randomising property tells us that small amounts of noise in this lsb will exercise all possible hash values.
Although this is not sufficient to prove that the probability of a hash collision is $2^{-w}$, it should suffice to give reasonable confidence that it does not greatly exceed this ideal value.
A simple example of a randomising hash is:
$$h(X) = \sum_{i=0}^{n-1} \mathrm{ROTATE}(x_{t-i}, i)$$
where ROTATE is a w-bit bit rotation (either left or right) and w divides n. This can be seen to be randomising, by noting that in the first w terms of the summation there is one that maps the bit $2^b$ to every bit in the hash word. If we now consider the binary representation of (H - h(X)) modulo $2^w$, for every set bit we can add $2^b$ to that audio value whilst leaving the others unchanged. This generates the required hash value H, proving that the hash is randomising.
This hash is also updateable, since:
Figure imgf000012_0001
Another example is to replace the summation by an XOR. This is also randomising (using a similar proof) and updateable with
$$H_t = \mathrm{ROTATE}(H_{t-1}, 1) \oplus x_t \oplus x_{t-n}$$
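The XOR variant can be sketched as below, assuming a 32 bit hash word, a left rotation, and the newest sample taken unrotated; these conventions and the function names are assumptions made for illustration. Because rotation distributes over XOR, the one-step update reproduces the from-scratch hash exactly whenever w divides n.

```python
W = 32                       # hash word width in bits
MASK = (1 << W) - 1


def rotl(v, r=1):
    """Rotate a W-bit word left by r bits."""
    r %= W
    return ((v << r) | (v >> (W - r))) & MASK


def xor_rotate_hash(block):
    """From scratch: XOR over i of ROTATE(x_{t-i}, i), with i = 0 for the newest sample."""
    h = 0
    for i, x in enumerate(reversed(block)):
        h ^= rotl(x & MASK, i)
    return h


def xor_rotate_update(h_prev, new, old):
    """One-step update; valid only when W divides the block length n,
    so that rotating the oldest sample n times leaves it unchanged."""
    return rotl(h_prev) ^ (new & MASK) ^ (old & MASK)
```

A block length such as n = 64 satisfies the divisibility condition for a 32 bit hash word.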
This method of receiver operation is illustrated in Figure 2, where the hash value is shown to be updated by a function taking the previous value of the hash function, the input audio and a delayed version of the input audio corresponding to the data dropping out of the hash block. Each updated value is then compared with the hash value received in a message to identify the point in time where a match occurs.
Whilst the above hash function satisfied our test for randomisability, one might be concerned that the hash update function is overly simple. Perhaps a more complex update function would give better scrambling of the audio bits, and so make better use of all of the bits in the hash word for avoiding collisions.
Some processors include built in instructions for computing cyclic redundancy checks (CRCs) and we can attempt to build an updateable hash function out of a CRC.
A d bit CRC is defined by a generator polynomial $P(y)$ of degree d over the binary field GF(2) (we use y as the indeterminate to avoid confusion with our audio sample values $x_t$). Such a polynomial P(y) is said to have an order m defined as the smallest positive integer m such that $y^m \equiv 1$ modulo P(y).
Supposing each audio sample to be represented as a 32 bit word, we can define the hash function in a similar way to a CRC of the data block, as follows:
$$H_t = \sum_{i=0}^{n-1} x_{t-i}\, y^{32i} \bmod P(y)$$
where all arithmetic in the above expression is in the ring of polynomials over GF(2).
Now if we express $H_t$ in terms of $H_{t-1}$ we get:
$$H_t = y^{32} H_{t-1} + x_t - y^{32n} x_{t-n} \bmod P(y)$$
If the order m of P(y) divides 32n, then $y^{32n} \equiv 1$ and this simplifies to:
$$H_t = y^{32} H_{t-1} + x_t - x_{t-n} \bmod P(y)$$
which would be extremely simple to implement if the processor provides a built in operation for cycling a CRC by 32 bits (the addition and subtraction in the ring are both implemented by computer XOR instructions).
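A scaled-down sketch of such a CRC-style hash is given below. To keep the block short it assumes 8 bit samples and the degree 8 polynomial $y^8 + y^4 + y^3 + y^2 + 1$, neither of which comes from the specification; the 32 bit polynomials discussed in the text are not reproduced here. Polynomials over GF(2) are stored as integers whose bit k is the coefficient of $y^k$, and all function names are hypothetical.

```python
from math import gcd

W = 8        # illustrative sample width in bits (the text uses 32)
P = 0x11D    # y^8 + y^4 + y^3 + y^2 + 1, an assumed example polynomial


def poly_mod(a, p=P):
    """Reduce polynomial a modulo p over GF(2)."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def order(p=P):
    """Smallest m > 0 with y^m = 1 (mod p); p must have a non-zero constant term."""
    v, m = poly_mod(2, p), 1
    while v != 1:
        v, m = poly_mod(v << 1, p), m + 1
    return m


def block_length(p=P, w=W):
    """Smallest n such that order(p) divides w * n."""
    m = order(p)
    return m // gcd(m, w)


def crc_hash(block, p=P):
    """From scratch: sum of x_{t-i} * y^(W*i) mod p, newest sample at i = 0."""
    h = 0
    for x in block:                       # oldest first (Horner's rule)
        h = poly_mod((h << W) ^ x, p)
    return h


def crc_update(h_prev, new, old, p=P):
    """H_t = y^W * H_{t-1} + x_t - x_{t-n}; addition and subtraction are both XOR."""
    return poly_mod((h_prev << W) ^ new ^ old, p)
```

The update is only valid for blocks of `block_length()` samples (or a multiple of it), which is the software counterpart of requiring the order of P(y) to divide the block size in bits.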
There are degree 32 polynomials whose order is divisible by 32. An example is:
Figure imgf000013_0005
which has order 1056 = 32 × 33.
However, it is very highly composite with:
Figure imgf000013_0006
which suggests that we may not be getting as much shuffling benefit as we hoped for in moving to using a CRC, and so we may prefer a polynomial whose order is not a multiple of 32.
An example of such a polynomial is:
Figure imgf000013_0007
which has order 1127 = 35 × 32 + 7.
Using this polynomial, we would want the data over which the hash is calculated to contain 1127 bits (or a multiple of 1127 bits) to make the hash easy to update, but this is not a whole number of 32 bit samples. This has the effect of complicating the update formula, which needs to add in the new 32 bit word $x_t$ but take away 32 bits of data, 7 bits of which come from $x_{t-36}$ and 25 from $x_{t-35}$. Alternatively, it could take away 32 bits of data from $x_{t-36}$ and add in 7 bits from $x_t$ combined with 25 bits from $x_{t-1}$.
Neither of these modes of operation satisfies the desirable updateability test, but they do satisfy a slightly weakened updatability test:
$$H_t = f(H_{t-1}, x_t, x_{t-1}, x_{t-n}, x_{t-n-1})$$
Here the hash is updated by computing some function of 5 values: the previous hash value, the newest 2 sample values coming into the hash block and the oldest two sample values falling out of the hash block. The updateability property has been weakened to accommodate the hash block length in bits not being an integer multiple of the size of each sample value.
We have discussed how a block of audio may be uniquely identified on both sides of a communication link by means of an updateable hash. The purpose of doing so is so that messages between the two ends of the communication link can identify exactly where in the audio some event relating to the audio (such as a volume change) is to occur.
This can be done by sending a message conveying that an event is to occur when the audio hashes to a particular value, but more generally useful is to split the process into two stages. The first is to establish a common sample counter using a message conveying that when the audio hashes to a particular value, then the sample counter should have a specified value. The second is to say that an event is to occur at a particular value of the sample counter.
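The first stage, establishing a common sample counter, might look like the following sketch. The message layout (`SyncMessage`), the `Receiver` class and the choice of the rotate-and-XOR hash are all assumptions made for illustration, and the sketch further assumes that the message arrives before the matching block does.

```python
from collections import deque
from dataclasses import dataclass

W = 32
MASK = (1 << W) - 1


def rotl1(v):
    """Rotate a W-bit word left by one bit."""
    return ((v << 1) | (v >> (W - 1))) & MASK


def block_hash(block):
    """Rotate-and-XOR hash of a block computed from scratch (oldest sample first)."""
    h = 0
    for x in block:
        h = rotl1(h) ^ (x & MASK)
    return h


@dataclass
class SyncMessage:            # hypothetical message layout
    hash_value: int           # hash of the chosen block
    counter: int              # sample counter value at the last sample of that block


class Receiver:
    def __init__(self, n):
        assert n % W == 0     # needed for the one-step update below
        self.delay = deque([0] * n, maxlen=n)
        self.h = 0
        self.seen = 0         # samples received so far
        self.counter = None   # unknown until synchronised
        self.pending = None

    def on_message(self, msg):
        self.pending = msg

    def on_sample(self, x):
        """Consume one sample and return the sample counter (None before sync)."""
        x &= MASK
        self.h = rotl1(self.h) ^ x ^ self.delay[0]
        self.delay.append(x)
        self.seen += 1
        if self.counter is not None:
            self.counter += 1
        full = self.seen >= len(self.delay)
        if self.pending is not None and full and self.h == self.pending.hash_value:
            self.counter = self.pending.counter   # counters now agree on both sides
            self.pending = None
        return self.counter
```

Once the counter is set, a later message only needs to carry a counter value and a command; no further hashing is required for the second stage.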
On various test signals (e.g. an undithered sine wave whose sampling rate divides the audio sampling rate) or completely silent sections where the audio is precisely zero, or where heavy clipping occurs and the audio spends a significant amount of time at the maximum (or minimum) representable values, then hashing a block of audio fails to uniquely identify a position in the stream. This is because the audio is exactly repetitive and so nearby blocks of audio have the same hash value because they are identical. This issue can be addressed by various techniques.
Firstly, in many situations it may not matter. When the audio is completely silent, it may be unimportant exactly when a change occurs. And operation on an artificial test signal is an artificial situation which may not be important in real world operation.
However, constant audio can easily be identified in the transmitter, which can decline to signal positions where the block is mostly zero or clipped and so does not have adequate randomness. Instead the transmitter can wait till a block occurs which is neither silent nor clipped.
Exactly repetitive signals are harder to spot in the transmitter, but the transmitter could keep a record of recent hash values to ensure that a hash it proposes to transmit does not collide with another nearby section of prior audio. After transmitting a hash value it could continue to update the hash and look for a subsequent collision. If it finds one, then it could take remedial action - in the case where a sample clock is synchronised, suitable remedial action would be to send another synchronisation message to correct any mistaken synchronisation the receiver established.
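The two transmitter-side safeguards described above could be sketched as follows. The thresholds, the 24 bit full-scale value, the history length and the names `block_is_usable` and `CollisionGuard` are all assumptions chosen for illustration.

```python
from collections import deque


def block_is_usable(block, full_scale=2**23 - 1, max_fraction=0.5):
    """Reject blocks that are mostly zero or mostly clipped.

    full_scale assumes 24 bit signed samples; max_fraction is an arbitrary threshold."""
    bad = sum(1 for x in block if x == 0 or abs(x) >= full_scale)
    return bad <= max_fraction * len(block)


class CollisionGuard:
    """Remembers recent block hashes so a repeated hash can be detected."""

    def __init__(self, history=48000):
        self.recent = deque(maxlen=history)

    def observe(self, h):
        """Record the hash of the block ending at the current sample."""
        self.recent.append(h)

    def is_unique(self, h):
        """True if h occurs at most once among the recorded hashes."""
        return self.recent.count(h) <= 1
```

A transmitter would call `observe` for every sample, check `is_unique` before sending a hash, and keep checking afterwards so that a later collision can trigger a corrective synchronisation message.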
Another possibility is for the transmitter to make a small perturbation to the audio over which the hash is computed. A good choice would be to XOR the least significant bits of some, or all, of the block of n samples with a pseudorandom sequence. This reduces the probability of a hash collision to an insignificant level at the expense of introducing a tiny amount of audio noise for a very short duration.
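A minimal sketch of such a perturbation is shown below, assuming Python's standard `random.Random` as the pseudorandom generator and a hypothetical function name `perturb_lsbs`; a real system would need a generator specified identically at both ends.

```python
import random


def perturb_lsbs(block, seed):
    """XOR the least significant bit of each sample with a seeded pseudorandom bit."""
    rng = random.Random(seed)
    return [x ^ rng.getrandbits(1) for x in block]
```

Each sample changes by at most one lsb, and because XOR is its own inverse, applying the function a second time with the same seed returns the original samples.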
On the face of it, such a strategy would not appear suitable to situations where lossless operation is required, but the transmitted message could also contain a field directing the receiver to alter the audio in such a way as to invert the transmitter's alterations, thus restoring overall lossless operation. In the above example, a suitable field might contain a seed for the pseudorandom bit generator, so that the receiver can regenerate the pseudorandom sequence and XOR the lsbs back to their original values.
It will be appreciated that the use of 32 bits for the size of the sample value and the size of the hash value is purely illustrative and other choices of sizes may be chosen. It will also be appreciated that there are many ways a 32 bit value representing a sample can be constructed from a multichannel sample of perhaps 24 bits per sample. One channel could be selected and zero/sign extended to 32 bits or multiple channels could be added or XORed, perhaps with bit shifts/rotations. Different choices will have different properties but the choice does not affect the nature of the invention.
Audio samples may also be grouped together, so that $x_t$ is a value representing a group of samples, and so the sample rate of $x_t$ is a submultiple of that of the underlying audio. This is natural if the receiver has some other means of establishing group boundaries.
Further variations and embellishments to the invention will become apparent to the skilled person in light of this disclosure and the appended claims.

Claims
1. A method of specifying a point in a PCM audio stream losslessly transmitted over a communication channel, the method comprising the steps of:
choosing the point in the PCM audio stream; and,
transmitting a message containing a hash of a block of n ≥ 8 samples of the PCM audio stream having a predetermined relationship to the point.
2. A method according to claim 1, the method comprising the further step of: including a command in the message to take effect at the point.
3. A method according to claim 1 or claim 2, the method comprising the further steps of:
maintaining a sample counter; and,
including a value of the sample counter at the point in the message.
4. A method according to any of claims 1 to 3, wherein the message is transmitted over an asynchronous communication channel to the PCM audio stream.
5. A method according to any of claims 1 to 4, wherein the hash has the following property:
for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$.
6. A method according to any of claims 1 to 5, wherein the hash $H_t$ has the following property:
$H_t$ is a function of $H_{t-1}$, $x_t$, $x_{t-1}$, $x_{t-n}$ and $x_{t-n-1}$.
7. A method according to claim 6, wherein the hash $H_t$ has the following property:
$H_t$ is a function of $H_{t-1}$, $x_t$ and $x_{t-n}$.
8. A method according to claim 6 or claim 7, wherein the hash has the following property:
Figure imgf000018_0001
is a bit rotation of
Figure imgf000018_0002
Figure imgf000018_0003
9. A method according to claim 6, wherein the hash comprises a cyclic redundancy check 'CRC' and the order of the polynomial defining the CRC divides the size of the data block in bits.
10. A method according to any of claims 1 to 9, the method comprising the further step of ensuring that the chosen point does not correspond to a block of constant samples.
11. A method according to any of claims 1 to 10, the method comprising the further step of:
checking if the block of n samples of the PCM audio stream matches another block of n samples.
12. A method according to claim 11, the method comprising the further step of: in the event of a match, choosing a different point.
13. A method according to any of claims 1 to 12, the method comprising the further step of:
altering the block of the n samples of the PCM audio stream before transmitting the message.
14. A method according to claim 13, the method comprising the further step of: including in the message information directing how the receiver is to alter the block of n samples of the PCM audio stream.
15. A method of identifying a point in a PCM audio stream losslessly received over a communication channel, the method comprising the steps of:
receiving a message containing a hash value; and,
computing a hash of successive overlapping blocks of the PCM audio stream until a block is found whose hash matches the received hash value.
16. A method according to claim 15, the method comprising the further steps of:
decoding a command from the received message; and,
implementing the command at a point in the PCM audio stream having a predetermined relationship to the block whose hash matches the received hash value.
17. A method according to claim 15 or claim 16, the method comprising the further steps of:
decoding a value for a sample counter from the received message; and,
setting a sample counter to the decoded value at a point in the PCM audio stream having a predetermined relationship to the block whose hash matches the received hash value.
18. A method according to any of claims 15 to 17, wherein the message is received over an asynchronous communication channel to the PCM audio stream.
19. A method according to any of claims 15 to 18, wherein the hash has the following property:
for any hash value H, block of audio sample values X, and value $2^b$ corresponding to a bitplane in the PCM audio, there exists a block of audio sample values X' such that the hash of X' computes as H and each audio sample value in X' is equal to the corresponding value in X plus one of 0, $+2^b$ and $-2^b$.
20. A method according to any of claims 15 to 18, wherein the hash $H_t$ has the following property:
$H_t$ is a function of $H_{t-1}$, $x_t$, $x_{t-1}$, $x_{t-n}$ and $x_{t-n-1}$.
21. A method according to claim 20, wherein the hash $H_t$ has the following property:
$H_t$ is a function of $H_{t-1}$, $x_t$ and $x_{t-n}$.
22. A method according to claim 20 or claim 21, wherein the hash
Figure imgf000020_0001
has the following property:
is a bit rotation of
Figure imgf000020_0002
Figure imgf000020_0003
23. A method according to claim 20, wherein the hash comprises a cyclic redundancy check 'CRC' and the order of the polynomial defining the CRC divides the size of the data block in bits.
24. A method according to any one of claims 15 to 23, the method comprising the further step of:
computing the hash of a successive block of the PCM audio stream using the hash value computed for the preceding overlapping block of the PCM audio stream.
25. A method according to any one of claims 15 to 24, the method comprising the further steps of:
retrieving information from the message directing how the block of the PCM audio stream is to be altered; and,
altering the block of the PCM audio stream whose hash matches the hash value.
26. A transmitter adapted to specify a point in a PCM audio stream losslessly transmitted over a communication channel by performing the method of any one of claims 1 to 14.
27. A receiver adapted to identify a point in a PCM audio stream losslessly received over a communication channel by performing the method of any of claims 15 to 25.
28. A codec comprising a transmitter according to claim 26 in combination with a receiver according to claim 27.
29. A computer program product comprising instructions that when executed by a processor cause said processor to perform the method of any one of claims 1 to 25.
PCT/GB2017/052129 2016-07-20 2017-07-19 Sample synchronisation WO2018015752A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1612560.1A GB2552349B (en) 2016-07-20 2016-07-20 Sample synchronisation
GB1612560.1 2016-07-20

Publications (1)

Publication Number Publication Date
WO2018015752A1 true WO2018015752A1 (en) 2018-01-25

Family

ID=56890443

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2017/052129 WO2018015752A1 (en) 2016-07-20 2017-07-19 Sample synchronisation

Country Status (2)

Country Link
GB (1) GB2552349B (en)
WO (1) WO2018015752A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261194A (en) * 2020-04-29 2020-06-09 浙江百应科技有限公司 Volume analysis method based on PCM technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050034005A1 (en) * 2000-06-15 2005-02-10 Doug Lauder Method and system for synchronizing data
US20100034393A1 (en) * 2008-08-06 2010-02-11 Samsung Electronics Co., Ltd. Ad-hoc adaptive wireless mobile sound system
US8341662B1 (en) * 1999-09-30 2012-12-25 International Business Machine Corporation User-controlled selective overlay in a streaming media
US20140029701A1 (en) * 2012-07-29 2014-01-30 Adam E. Newham Frame sync across multiple channels
US20160005411A1 (en) * 2013-02-13 2016-01-07 Meridian Audio Limited Versatile music distribution

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3338797B2 (en) * 1999-06-04 2002-10-28 日本電気株式会社 Apparatus and method for coping with wireless reception data deviation
US20050226601A1 (en) * 2004-04-08 2005-10-13 Alon Cohen Device, system and method for synchronizing an effect to a media presentation
US7827467B2 (en) * 2006-01-04 2010-11-02 Nokia Corporation Method for checking of video encoder and decoder state integrity
US7661121B2 (en) * 2006-06-22 2010-02-09 Tivo, Inc. In-band data recognition and synchronization system

Also Published As

Publication number Publication date
GB2552349B (en) 2019-05-22
GB201612560D0 (en) 2016-08-31
GB2552349A (en) 2018-01-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17745471

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17745471

Country of ref document: EP

Kind code of ref document: A1