SE1930073A1

SE1930073A1 - Method and system to minimize the latency in audio signal processing

Info

Publication number: SE1930073A1
Application number: SE1930073A
Authority: SE
Inventors: Angelis Agostino De; Carlo Fischione; Nitin Kulkarni; Sharan Yagneswar; Stefano Zambon
Original assignee: Modern Ancient Instr Networked Ab
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2020-08-29

Abstract

A method to minimize the latency of audio signal processing by using only a single serial communication line, and a system to perform such method, are here disclosed. A unique combination of operations enacted by MCUs and by CPUs is disclosed as a more efficient way to process a digital audio signal. The solution is built on a single serial communication line between the MCU and the CPU, and the CPU runs on an independent internal clock synchronized by means of a Delay Locked Loop (DLL) to the audio clock using timing drift information sent by the MCU on the same serial communication line used for digital signal transfer. A dedicated algorithm, running on the MCU with deterministic timing, contributes to the latency reduction to 2*N + M sampling periods, with N being the buffer size and M being significantly smaller than N. The computational load on the CPU is also lowered as a consequence.

Description

METHOD AND SYSTEM TO MINIMIZE THE LATENCY IN AUDIO SIGNAL PROCESSING

Field of Invention

The invention refers to the field of (real time) audio signal processing systems, and specifically those systems implemented with a general purpose Central Processing Unit (CPU) connected through a digital serial connection to a Microcontroller Unit (MCU), such MCU being itself connected to one or more analog-to-digital (ADC) or digital-to-analog (DAC) converters.

Background

In the field of audio signal processing systems, a common problem faced by the operator is latency, i.e. the short delay (usually measured in milliseconds) between the time when an analog audio signal enters a system and the time when it emerges from the system. Potential contributors to latency in an audio system include, but are not limited to, analog-to-digital conversion, buffering, digital signal processing, transmission time, digital-to-analog conversion and the speed of sound in the transmission medium. Reducing latency is very important in a variety of fields where audio signals are processed, for instance in sound reinforcement systems, foldback systems, in-ear monitors, live radio, digital musical instruments, audio broadcast through a network, telecommunication applications, and computer systems with highly interactive audio.

It is with reference to this problem that the present invention provides a new and innovative solution which makes it possible to achieve minimized latencies within systems using general purpose CPUs.

Prior art

The prior art background is found in a variety of audio signal processing systems or devices (US 9473869 B2; US 9552828 B2; US 8817996 B2). In particular, document US2002/0034273A1 describes a method for adaptive synchronization of devices connected via a Universal Serial Bus (USB). The method described in US2002/0034273A1 targets real-time applications where it is required that the data received by the receiving device is synchronous with the data sent by the transmitting device, in order to avoid data corruption. The method described in US2002/0034273A1 is based on a dynamic correction of the USB clock frequency in relation to the clock of the transmitting device, such correction being performed at software level on the basis of the analysis of the transmitted data.

Document US2010/0077247A1 presents an alternative method to synchronize the data transmitted between devices connected via USB, such devices being for instance a computer and an audio interface. The method described in US2010/0077247A1 specifically targets audio content transmissions, and relies on circuitry that generates a clock for both the transmitting device and the USB connection.

The solutions of US2002/0034273A1 and US2010/0077247A1 share the limitation that both rely on the USB hardware connection and on its corresponding protocol to transmit data between MCU and CPU. By contrast, the solution disclosed in the present invention can operate in the absence of a USB protocol, or of additional circuitry (as in the case of document US2010/0077247A1), because it can leverage any kind of serial protocol. Additionally, the proposed solution has the technical advantage of achieving a higher synchronization accuracy with a significantly lower development effort compared to software written for the USB protocol.

Another prior art teaching (document US2002/0034273A1) is limited by the fact that large corrections of the clock mismatch are made at long and variable time intervals; by contrast, the present invention is based on periodic and continuous small corrections of the clock mismatch.

Another advantage of the proposed solution is the possibility to avoid using a USB controller built into the CPU or, alternatively, an external chip used for the same purpose. Therefore, hardware manufacturers have wider production choices in terms of hardware components and can use cheaper CPUs (where a USB controller is not included).

Document US20160034417A1 discloses the prior art which is the closest to the present invention. Some background information is necessary here. General purpose CPUs are able to run complex algorithms such as Digital Signal Processing, using both fixed and floating point numerical computation, but such CPUs often lack the required interfaces for connecting multichannel audio converters. Conversely, MCUs are very versatile and have deterministic computation timing, but they have limited computational power compared to CPUs; MCUs are generally used to perform a variety of computationally simpler tasks in a deterministic way, such as handling General Purpose Input Output and other I/O hardware connections. In most digital audio systems where the signal processing algorithms run on general purpose CPUs, such a combination of MCU and CPU is generally employed. Even in situations where there is apparently no separate physical MCU on the board, its role is taken by logical blocks inside a System on a Chip (SoC) or in the analog/digital converters.

According to US20160034417A1 the communication between the MCU and the CPU is performed in a synchronous way; however, such structuring represents a main disadvantage, as it causes transmission delay and wastes CPU performance on handling such communication. Prior art limitations can be better understood by looking at Figure 1 and Figure 2, where the behaviour of prior art systems is illustrated according to the following logical steps:

a. Both an MCU (101) and a CPU (102) are configured to work with buffers of size N and a fixed sampling rate FS. Typical values are 44100 Hz or 48000 Hz for FS and 16 <= N <= 2048.

b. The MCU (101) is directly connected to the audio converters (103) (104) and a precise master clock generator. At regular time intervals equal to N / FS seconds, the MCU (101) raises a hardware Interrupt Request Line (IRQ) (201) to signal the CPU (102) that it needs to fill a new set of buffers (202).

c. The operating system (OS) running on the CPU processes the incoming IRQ (201), usually giving control to an audio driver (203) that performs the full-duplex transfers of ingoing and outgoing buffers that were computed at previous iterations. These transfers are typically performed without CPU intervention using Direct Memory Access (DMA).

d. After initiating the transfer, the audio driver (203) signals a different task to start the work on the computation of another copy of buffers (204). This double-buffering scheme computation (204) is necessary because, with DMA systems, it is a typical requirement that the CPU does not change the content of data being transferred. Such computation (204) must be terminated before the next interrupt is raised after N / FS seconds in order to guarantee the integrity of the audio signal.

Overall, prior art teachings show several limitations in the attempt to solve the issue of latency: the best solution provided by the prior art reaches, at the lowest, a latency of (3*N)/FS seconds, with N being the buffer size, plus the overhead given by the serial communication interface and the analog/digital conversion. This figure is due to the described double-buffering scheme (204): a total delay of 3 buffers results as the sum of input buffering, processing, and output buffering.

It is with reference to the issue described above and the limitations of the prior art that the present invention teaches an innovative method and a system to reduce the total latency to (2*N + M)/FS seconds, plus analog/digital conversion, with M being a small constant in the range of 2-4 samples.
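As a worked illustration of the two latency figures above, the following Python sketch evaluates (3*N)/FS and (2*N + M)/FS. The function names and the sample values N = 64, FS = 48000 Hz and M = 4 are assumptions chosen for illustration only; the serial-interface and converter overheads mentioned in the text are not modelled.

```python
def prior_art_latency_s(n: int, fs: float) -> float:
    """Latency of the synchronous double-buffered scheme: 3 buffers of N samples."""
    return 3 * n / fs


def disclosed_latency_s(n: int, m: int, fs: float) -> float:
    """Latency stated for the disclosed method: 2*N + M sampling periods."""
    return (2 * n + m) / fs


if __name__ == "__main__":
    n, fs, m = 64, 48000.0, 4  # illustrative values, not taken from the patent
    print(f"prior art: {prior_art_latency_s(n, fs) * 1e3:.3f} ms")
    print(f"disclosed: {disclosed_latency_s(n, m, fs) * 1e3:.3f} ms")
```

With these assumed values the two expressions evaluate to 4.000 ms and 2.750 ms respectively, i.e. the saving is (N - M)/FS seconds.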

Another advantage of the present invention consists in the fact that a dedicated Hardware Interrupt Request Line between the MCU and the CPU for time synchronization is unnecessary, as the system relies on software-generated interrupts. Further advantages are: i) a simpler hardware design for audio systems; ii) a wider range of applications, such as those where it is not possible to have a dedicated Hardware Interrupt Request Line; iii) an overall saving of CPU time, due to the fact that software-generated interrupts are generally significantly faster to process on modern CPUs than HW IRQs, which are usually multiplexed (requiring the CPU to perform several operations).

A preferred embodiment of the invention is described herein.

Summary of the invention

A method to minimize the latency of audio signal processing by using only a single serial communication line, and a system to perform such method, are here disclosed. A unique combination of operations enacted by MCUs and by CPUs is disclosed as a more efficient way to process a digital audio signal. The solution is built on a single serial communication line between the MCU and the CPU, and the CPU runs on an independent internal clock synchronized by means of a Delay Locked Loop (DLL) to the audio clock using timing drift information sent by the MCU on the same serial communication line used for digital signal transfer. A dedicated algorithm, running on the MCU with deterministic timing, contributes to the latency reduction to 2*N + M sampling periods, with M being significantly smaller than N. The computational load on the CPU is also lowered as a consequence.

Description of the figures

Notwithstanding any other forms which may fall within the scope of the present invention, a preferred embodiment of the invention will now be described, by way of example only, with reference to the accompanying figures:

- Figure 1 illustrates the limitations of the prior art
- Figure 2 illustrates the limitations of the prior art
- Figure 3 illustrates an overview of the disclosed system in its logic interaction between the MCU and the CPU
- Figure 4 illustrates the logic components of the buffer system
- Figure 5 illustrates the steps executed by the disclosed method
- Figure 6 illustrates an example of Delay Locked Loop

Figure 3 shows the overall functioning of the disclosed method, where a CPU (102) and an MCU (101) are used in conjunction. In particular, the CPU (102) runs digital signal processing (DSP) algorithms (320) on the input signal (310), generating an electric audio signal (330). At the same time, multiple converters (340) are connected to the MCU (101), since MCUs are very versatile regarding digital interfaces.

A first advantage of the system here disclosed consists of using an asynchronous communication scheme as a way to make more efficient use of the respective operational advantages of the MCUs and of the CPUs running on different clocks, therefore compensating for transmission delay and timing inaccuracies.

A second advantage of the system here disclosed consists in the fact that only a single serial communication line (350) is necessary between the MCU (101) and the CPU (102), and that such CPU (102) runs on an independent internal clock (360), such internal clock being synchronized by means of a Delay Locked Loop (DLL) (370) to the audio clock using timing drift information (380) sent by the MCU (101) on the same serial communication line (350) used for digital signal transfer. Therefore, by using the system here disclosed, a separate interrupt request line is not needed, and the hardware and software implementation is made simpler and can be adapted to cases where such an interrupt line cannot be implemented.

Another advantage of the disclosed invention consists in the fact that the application of a separate algorithm (320), running on the MCU (101) with deterministic timing behaviour, makes it possible to reduce the latency to 2*N + M sampling periods, with M being significantly smaller than N. As a consequence, the computational load on the CPU is lowered and the latency issue minimized.

Reference now is made to Figure 4, where the logic operations of the buffer system are described:

a) The CPU (102) operates on a double set of buffers having size N samples for both input and output, exactly as in the synchronous use of the prior art described in Figure 1.

b) Differently from the prior art, the MCU (101) implements its buffers using a circular First In, First Out (FIFO) queue (401) of size N.

c) With respect to the input signal path (110), at each sampling period (i.e. every 1 / FS seconds) the MCU (101) puts one audio sample from the ADC (103) into the Tx buffer and, vice versa, sends one audio sample from the Rx buffer to the DAC (104).

d) Differently from the prior art approach described in Figure 1, the MCU (101) does not wait for the Tx buffer to be full to initiate a transfer. Data to/from the CPU (102) is transferred from the same FIFO (401) according to the procedure described in point c) above, but with a slightly different read/write pointer into the queue, which is M samples ahead of the ADC/DAC pointers.
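The pointer arrangement of points b) to d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `LeadingFifo`, the method `tick` and the choice to model the serial side as moving exactly one sample per sampling period are all hypothetical simplifications (a real serial interface runs at its own rate).

```python
class LeadingFifo:
    """Circular Tx/Rx buffers of size N with a serial pointer M samples ahead."""

    def __init__(self, size: int, lead: int) -> None:
        if not 0 < lead < size:
            raise ValueError("lead M must satisfy 0 < M < N")
        self.size = size          # N
        self.lead = lead          # M
        self.tx = [0] * size      # ADC -> CPU direction
        self.rx = [0] * size      # CPU -> DAC direction
        self.conv_idx = 0         # pointer used by the ADC/DAC side

    def tick(self, adc_sample, sample_from_cpu):
        """Advance one sampling period; returns (dac_sample, sample_to_cpu)."""
        i = self.conv_idx
        j = (i + self.lead) % self.size   # serial pointer, M samples ahead
        dac_sample = self.rx[i]           # Rx buffer -> DAC
        self.tx[i] = adc_sample           # ADC -> Tx buffer
        sample_to_cpu = self.tx[j]        # Tx buffer -> serial line
        self.rx[j] = sample_from_cpu      # serial line -> Rx buffer
        self.conv_idx = (i + 1) % self.size
        return dac_sample, sample_to_cpu
```

In this simplified model a sample arriving from the CPU reaches the DAC M ticks later, because the converter pointer needs M sampling periods to catch up with the slot the serial pointer has just written.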

The value of the constant M takes into account the difference in speed between the converters and the serial interface between the MCU and the CPU. If we refer to the bandwidth of such serial interface as V_SER, in bits per second, a condition for M in the case of K audio channels being encoded with D bits (typically 16-32) is:

$$M \geq N \left(1 - \frac{D \cdot K \cdot F_S}{V_{SER}}\right)$$

In systems with typical values and where, as is usually the case, the bandwidth V_SER can be slowed down to be as close as possible to the audio data rate, the minimum value of M as prescribed by the previous equation is in the range of 2 to 4 audio samples.
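A short Python sketch of the condition on M, assuming it reads M >= N * (1 - (D * K * FS) / V_SER). The function name `min_lead_samples` and the example figures (N = 64, D = 16 bits, K = 2 channels, FS = 48000 Hz, V_SER = 1.6 Mbit/s) are assumptions for illustration; rounding up to a whole number of samples is also an assumption, since the text does not state how fractional values are handled.

```python
import math


def min_lead_samples(n: int, bits: int, channels: int, fs: float, v_ser: float) -> int:
    """Smallest integer M satisfying M >= N * (1 - D*K*FS / V_SER)."""
    data_rate = bits * channels * fs  # audio data rate in bits per second
    if v_ser < data_rate:
        raise ValueError("serial bandwidth is below the audio data rate")
    return math.ceil(n * (1.0 - data_rate / v_ser))


if __name__ == "__main__":
    # Audio data rate here is 16 * 2 * 48000 = 1.536 Mbit/s.
    print(min_lead_samples(n=64, bits=16, channels=2, fs=48000.0, v_ser=1.6e6))
```

For these assumed values the bound is 64 * 0.04 = 2.56, so the sketch returns 3, which falls in the 2 to 4 sample range mentioned above. A serial line running much faster than the audio data rate pushes the bound towards N.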

Reference now is made to Figure 5, where the sequence of operations between the MCU (101) and the CPU (102) is described:

a) The MCU (101) starts in idle mode (500) waiting for the CPU (102) to initiate the first transmission, discarding all the samples coming from the ADC and sending null values to the DAC;

b) The CPU (102) does an internal setup (600), configuring the internal clock (360) to fire a timer at periodic intervals set with a period of N / FS, generating Software Interrupts to be handled periodically;

c) When the timer fires for the first time, the CPU initiates a duplex transfer (700) to send/receive the content of its serial controller buffers to the MCU;

d) The MCU aligns (800) its internal buffers (810) so that the first value received by the CPU is set M samples ahead of the value sent to the DAC in its internal FIFO;

e) The MCU computes (900) a timing error information (910) to be sent to the CPU during the next transfer (920);

f) In the following interrupts, the CPU:

i. initiates the transfer (1000);

ii. adjusts (1100) its internal timer by using the information received from the MCU regarding the current timing error and by running a program that implements a Delay Locked Loop (DLL) control system (120);

iii. performs DSP algorithms (1200) on its internal buffers, which correspond to the transfer buffers of the previous interrupt;

iv. when the transfer is completed, switches (1300) the roles of transfer and internal processing buffers.
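The CPU-side part of step f) can be sketched as a small Python class. This is a sequential simplification under stated assumptions: the names `CpuDoubleBuffer`, `on_timer` and `dsp` are hypothetical, the duplex transfer is modelled as a plain copy rather than a DMA transfer running concurrently with the DSP work, and the timer adjustment of step ii is left out.

```python
from typing import Callable, List

Buffer = List[float]


class CpuDoubleBuffer:
    """Two sets of N-sample buffers whose roles are swapped at every interrupt."""

    def __init__(self, n: int, dsp: Callable[[Buffer], Buffer]) -> None:
        self.dsp = dsp
        self.transfer_in: Buffer = [0.0] * n
        self.transfer_out: Buffer = [0.0] * n
        self.work_in: Buffer = [0.0] * n
        self.work_out: Buffer = [0.0] * n

    def on_timer(self, from_mcu: Buffer) -> Buffer:
        # i. duplex transfer: send the outgoing buffer, receive a new input buffer
        to_mcu = list(self.transfer_out)
        self.transfer_in = list(from_mcu)
        # ii. timer adjustment via the DLL is omitted in this sketch
        # iii. DSP on the internal buffers (transfer buffers of the previous interrupt)
        self.work_out = self.dsp(self.work_in)
        # iv. swap the roles of transfer and internal processing buffers
        self.transfer_in, self.work_in = self.work_in, self.transfer_in
        self.transfer_out, self.work_out = self.work_out, self.transfer_out
        return to_mcu
```

In this model a block received at one interrupt is returned, processed, two interrupts later, which matches the two CPU-side buffer periods in the 2*N + M figure.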

Reference now is made to Figure 6, where an example of Delay Locked Loop is provided, with k being the index for discrete sampling time:

a) The CPU (102) computes its internal time t_CPU(k) (2300) by correcting the output of an internal oscillator, the latter being implemented with a unit digital delay (2100) and a multiplicative coefficient a (2000), with the filtered error ē(k) (2900). The contribution η(k) (2200) takes into account an external noise that combines errors due to drift relative to the reference master clock or to CPU scheduling jitter;

b) The MCU computes the error e(k) (2600) between a reference master clock signal t_MC(k) (2500) and the CPU time t_CPU(k) (2300). The measurement of e(k) can be influenced by a quantization error b(k) (2400), which can be assumed to have zero average;

c) The MCU reports the computed error to the CPU, with a constant integer delay in samples z^(-D) (2700) which depends on the serial communication interface and the double-buffering scheme employed for processing and is known in advance;

d) An error filter H (2800) computes ē(k) (2900) in such a way that the overall system is numerically stable and that the effective error e(k) (2600) converges quickly to an amplitude smaller than the sampling period T = 1 / FS.

There are several possible implementations for computing such a filter H so that the requirements of point d) above are satisfied. A common implementation employs a digital lowpass filter to attenuate the rapidly changing components of the error due to jitter and other noise, so that the computed error takes into account only the slowly varying drift between the two clocks. In typical scenarios with the accuracy of CPU internal clocks and reference master clocks, the significant frequency range for the clock drift is well below D / FS. Therefore, using D or a higher value for the time constant of H, when H is implemented as a digital lowpass filter, achieves the desired result in terms of stability and accuracy.

In a more general context, the problem can be formalized with a stochastic difference equation and a general condition on H can be derived to ensure the result.
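The behaviour of such a loop can be explored with a small Python simulation. This is a simplified sketch, not the block diagram of Figure 6: it assumes a one-pole lowpass filter for H, a purely proportional correction with a hypothetical gain `gain`, a constant relative drift `drift`, a reporting delay of `delay` timer intervals, and no jitter or quantization noise. All names and default values are assumptions for illustration.

```python
from collections import deque
from typing import List


def simulate_dll(n: int = 64, fs: float = 48000.0, drift: float = 1e-4,
                 gain: float = 0.1, alpha: float = 0.5, delay: int = 2,
                 initial_offset_s: float = 1e-4, steps: int = 400) -> List[float]:
    """Return the error e(k) = t_MC(k) - t_CPU(k) at each timer interval."""
    period = n / fs                    # nominal timer period N / FS
    t_mc = 0.0                         # reference master clock time
    t_cpu = initial_offset_s           # CPU time, starting with an offset
    filtered = 0.0                     # output of the lowpass error filter
    in_flight = deque([0.0] * delay)   # errors still travelling MCU -> CPU
    history = []
    for _ in range(steps):
        error = t_mc - t_cpu           # measured on the MCU side
        history.append(error)
        in_flight.append(error)
        reported = in_flight.popleft() # seen by the CPU `delay` intervals late
        filtered += alpha * (reported - filtered)   # one-pole lowpass
        t_mc += period
        t_cpu += period * (1.0 + drift) + gain * filtered  # corrected timer
    return history


if __name__ == "__main__":
    errors = simulate_dll()
    print(f"first error: {errors[0] * 48000.0:.3f} sampling periods")
    print(f"last error:  {errors[-1] * 48000.0:.3f} sampling periods")
```

Because the correction in this sketch is only proportional, a constant drift leaves a residual offset of magnitude period * drift / gain, which for the assumed values is well below one sampling period. A loop with an integral term would remove that residual; which structure the patent's filter H uses is not specified here.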

Claims


1.) A method of processing audio signal comprising the steps of:

a. transforming an input electric signal into a sequence of bytes at a regular time interval through one Analog-to-Digital converter (ADC) (103);

b. transforming a sequence of bytes into an output electric signal through one Digital-to-Analog converter (DAC) (104);

c. transferring encoded audio samples from DAC (104) and ADC (103) to a general purpose CPU (102) and from a general purpose CPU (102) to DAC (104) and ADC (103), with the aid of a separate Microcontroller Unit (MCU) (101) at intervals of at least one sampling period;

d. synchronizing the above mentioned step 1.c) with a clock synchronizer algorithm (320), said algorithm implemented by propagating a clock error from the MCU (101) to the CPU (102) using the same serial communication line (350) used for transferring audio samples and employing a signal processing algorithm to ensure that the error stays below the audio sampling period.

2.) An audio signal processing system as described in claim 1 wherein:

a) Said CPU (102) operates on a double set of buffers having size N samples for both input and output.

b) Said MCU (101) implements its buffers using a bi-directional First In, First Out (FIFO) queue (401) of size N, implemented by any of the following alternative means: a hardware full-duplex serial interface, a couple of hardware half-duplex interfaces in each direction, a full-duplex communication interface in the case of software using shared memory between the CPU and the MCU to implement;

c) Said MCU (101) puts one audio sample from the ADC (103) into the Tx buffer and, vice versa, sends one audio sample from the Rx buffer to the DAC (104) at each sampling period (i.e. every 1 / FS seconds) of the input signal path (110).

d) Said serial communication interfaces are replaced with a shared Random Access Memory (RAM) between the MCU and the CPU.

e) Said MCU (101) does not wait for the Tx buffer to be full to initiate a transfer.

3.) An audio signal processing system as described in claim 1 and/or claim 2 wherein more than one of the components ADC, DAC and MCU are employed, as long as they are all driven by a single shared master clock.

4.) An audio signal processing system as described in claim 1 and/or claim 2 wherein the components are integrated in at least one System on Chip (SoC).

5.) A method of processing audio signal comprising the steps of:

a) MCU (101) starting in idle mode (500) waiting for CPU (102) to initiate the first transmission, discarding all the samples coming from the ADC and sending null values to the DAC;

b) The CPU (102) performing an internal setup (600), configuring the internal clock (360) to fire a timer at periodic intervals set with a period of N / FS, generating Software Interrupts to be handled periodically;

c) CPU initiating a duplex transfer (700) to send/receive the content of its serial controller buffers to the MCU, at the moment when the timer fires for the first time;

d) MCU aligning (800) its internal buffers (810) in a way that the first value received by the CPU is set M samples ahead of the value sent to the DAC in its internal FIFO;

e) MCU computing (900) a timing error information (910) to be sent to the CPU during the next transfer (920);

f) CPU performing the following circuit:

i. initiating the transfer (1000);

ii. adjusting (1100) its internal timer by using the information received from the MCU regarding the current timing error and by running a program that implements a Delay Locked Loop (DLL) control system (120);

iii. performing DSP algorithms (1200) on its internal buffers, which correspond to the transfer buffers of the previous interrupt;

iv. switching (1300) the roles of transfer and internal processing buffers at the moment when transfer is completed.