WO2025059892A1 - Stereo expansion using nullformers and panning - Google Patents
- Publication number
- WO2025059892A1 (PCT/CN2023/119911)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signals
- nullformer
- channel
- audio
- processors
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/20—Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
- H04R2430/25—Array processing for suppression of unwanted side-lobes in directivity characteristics, e.g. a blocking matrix
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/11—Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/007—Two-channel systems in which the audio signals are in digital form
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/07—Synergistic effects of band splitting and sub-band processing
Definitions
- Audio playback devices are devices that can reproduce one or more audio signals, whether digital or analog signals.
- An audio playback device may include circuitry that allows a user to interact with the audio signals, such as by amplifying the signals to drive a transducer that generates the sound waves that can be perceived by a listener’s ears. Audio playback can be incorporated into a wide variety of devices.
- audio playback devices may comprise stand-alone audio devices, mobile telephones, cellular or satellite radio telephones, personal digital assistants (PDAs), panels or tablets, gaming devices, or computing devices.
- a user may desire to modify the spatial information within an audio signal.
- for a stereo recording, the user may desire to expand the acoustic soundstage by moving the acoustic sources further left or right in the soundstage. That is, sources in the stereo recording that are acoustically located slightly or far to the left and the right of the center may be moved to the very left and the very right of the stereo soundstage, respectively.
- the movement of the spatial sources may arrange any source primarily in the left channel entirely to the left channel for reproduction at the left ear, and likewise for the right ear of the listener.
- Audio sources may be moved within an acoustic scene by separately processing the channels of the audio source through nullformers and similarity-based processing (e.g., using correlation-based algorithms) to form modified channels containing representations of the same sounds but with the sounds moved within the soundstage.
- examples of similarity-based processing include, but are not limited to: (1) least-squares fit with allowable time adjustment, in the time domain or the frequency domain; (2) correlation and covariance approaches; and (3) higher-order approaches such as cross-cumulants, empirical Kullback-Leibler divergence, or the Itakura-Saito distance. A minimal sketch of the first measure appears below.
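- The following Python sketch illustrates that first measure, a least-squares/correlation fit with an allowable time adjustment; the lag range, normalization, and function name are illustrative assumptions rather than the disclosure's implementation.

```python
import numpy as np

def best_lag_similarity(a, b, max_lag=32):
    """Search integer lags and return the normalized correlation at the
    best-matching alignment of the two signals (time-domain sketch of a
    least-squares fit with allowable time adjustment)."""
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        denom = np.linalg.norm(x) * np.linalg.norm(y)
        if denom > 0:
            best = max(best, float(np.dot(x, y)) / denom)
    return best
```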
- the processing may be applied to stereo sounds to increase a source-separation factor and to increase the quality of the perception of the sources located in the center, the left, and the right by introducing additional space between sources at the left and the right. That is, a user may more easily distinguish between audio sources on the left and right of the soundstage. For instance, expanding the soundstage can increase speech intelligibility in an environment with multiple, simultaneously active acoustic sources.
- the nullformers may reduce the contribution of a first source to the second channel and reduce contribution of a second source to the first channel.
- the nullformer may obtain this reduction in contribution of an acoustic source to a channel by applying a null to an audio signal in the direction of the acoustic source.
- the null may be applied by subtracting one channel from the other channel.
- the steering direction of the nullformer is the direction of the null.
- a beamformer constructively overlaps soundwaves coming from the steering direction, whereas a nullformer destructively overlaps these soundwaves due to the subtraction of the captured, delayed, and weighted soundwaves. Consequently, processing involving a nullformer differs from processing involving a beamformer, in which the beamformer involves summing or a weighted summing of the channels.
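- To make the subtraction-based null concrete, the following is a minimal time-domain sketch of a delay-and-subtract nullformer, under the assumption of a two-channel signal and an integer-sample inter-channel delay; the function and signal names are hypothetical, not the disclosure's implementation.

```python
import numpy as np

def delay_and_subtract_nullformer(left, right, delay_samples, weight=1.0):
    """Place a spatial null by subtracting a delayed, weighted copy of the
    right channel from the left channel. Sounds arriving with an
    inter-channel delay of `delay_samples` align after the shift and cancel
    in the subtraction; sounds from other directions do not align and pass
    through with little attenuation."""
    shifted = np.roll(right, delay_samples)
    shifted[:delay_samples] = 0.0  # zero the samples wrapped around by the roll
    return left - weight * shifted

# Example: null a 440 Hz source whose wavefront reaches the right
# microphone 3 samples earlier than the left microphone.
fs = 48_000
t = np.arange(fs) / fs
nulled_src = np.sin(2 * np.pi * 440 * t)   # source to suppress
kept_src = np.sin(2 * np.pi * 1000 * t)    # source to keep
left = nulled_src + kept_src
right = np.roll(nulled_src, -3) + 0.2 * kept_src
out = delay_and_subtract_nullformer(left, right, delay_samples=3)
```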
- a method for signal processing includes receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
- an apparatus includes one or more processors and a memory coupled to the one or more processors.
- the one or more processors is configured to perform operations including receive a first plurality of audio signals comprising at least a first channel and a second channel; apply at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and apply similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
- an apparatus includes means for receiving a first plurality of audio signals comprising at least a first channel and a second channel; means for applying a directional null to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the null applying means reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and means for applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
- a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause the processor to perform operations.
- the operations include receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
- Audio signal processing may be applied to audio data captured by one or more microphones of the signal processing device.
- Audio signal processing devices (devices that can play back, record, and/or process one or more audio recordings) can be incorporated into a wide variety of devices.
- audio signal processing devices may comprise stand-alone audio devices, such as entertainment devices and personal media players, wireless communication device handsets such as mobile telephones, cellular or satellite radio telephones, personal digital assistants (PDAs) , tablets, gaming devices, computing devices such as webcams, video surveillance cameras, or other devices with audio recording or audio capabilities.
- the audio signal processing techniques described herein may involve devices having microphones and processing circuitry (e.g., application-specific integrated circuits (ASICs), digital signal processors (DSPs), graphics processing units (GPUs), or central processing units (CPUs)).
- a device may include a digital signal processor or a processor (e.g., an application processor) including specific functionality for audio processing.
- the methods and techniques described herein may be entirely performed by the digital signal processor or the processor, or various operations may be split between the digital signal processor and the processor, and in some aspects split across additional processors.
- the methods and techniques disclosed herein may be adapted using input from a neural signal processor (NSP) in which one or more parameters of the signal processing are controlled based on output from a machine learning (ML) model executed by the NSP.
- a device configured for audio signal processing and/or audio capture.
- the apparatus includes means for recording audio.
- Example means may include a dynamic microphone, a condenser microphone, a ribbon microphone, a carbon microphone, or a crystal microphone.
- the microphone may be constructed as a microelectromechanical system (MEMS). These components may be controlled to capture first and/or second sound recordings, which may correspond to left and right channels of a recording.
- the microphones may include analog and/or digital microphones.
- Analog microphones provide a sensor signal, which in some embodiments is conditioned or filtered.
- Analog microphones in a digital system include an external analog-to-digital converter (ADC) to interface with digital circuitry.
- Digital microphones include the ADC and other digital elements to convert the sensor signal into a digital data stream, such as a pulse-density modulated (PDM) stream or a pulse-code modulated (PCM) stream.
- the method may be embedded in a computer-readable medium as computer program code comprising instructions that cause a processor to perform the steps of the method.
- the processor may be part of a mobile device including a first network adaptor configured to transmit data, such as images or videos (with associated or embedded sounds) in a recording or as streaming data, over a first network connection of a plurality of network connections; and a processor coupled to the first network adaptor and the memory.
- the processor may cause the transmission of audio signals described herein over a wireless communications network such as a 5G NR communication network.
- while aspects and implementations are described in this application by illustration to some examples, those skilled in the art will understand that additional implementations and use cases may come about in many different arrangements and scenarios. Innovations described herein may be implemented across many differing platform types, devices, systems, shapes, sizes, and packaging arrangements. For example, aspects and/or uses may come about via integrated chip implementations and other non-module-component based devices (e.g., end-user devices, vehicles, communication devices, computing devices, industrial equipment, retail/purchasing devices, medical devices, artificial intelligence (AI)-enabled devices, etc.). While some examples may or may not be specifically directed to use cases or applications, a wide assortment of applicability of described innovations may occur.
- Implementations may range in spectrum from chip-level or modular components to non-modular, non-chip-level implementations and further to aggregate, distributed, or original equipment manufacturer (OEM) devices or systems incorporating one or more aspects of the described innovations.
- devices incorporating described aspects and features may also necessarily include additional components and features for implementation and practice of claimed and described aspects. It is intended that innovations described herein may be practiced in a wide variety of devices, chip-level components, systems, distributed arrangements, end-user devices, etc. of varying sizes, shapes, and constitution.
- Figure 1 shows a block diagram of a system-on-chip (SoC) configured for performing signal processing according to one or more aspects of this disclosure.
- Figure 2 is a block diagram illustrating an example data flow path for audio signal processing in a multimedia device according to one or more aspects of the disclosure.
- Figure 3 shows a flow chart of an example method for processing audio data to perform acoustic source separation and soundstage expansion according to one or more aspects of this disclosure.
- Figure 4 is a block diagram illustrating stereo processing of audio signals according to one or more aspects of the disclosure.
- Figure 5 is a block diagram illustrating stereo processing of audio signals using mu-law panning according to one or more aspects of the disclosure.
- the present disclosure provides systems, apparatus, methods, and computer-readable media that support signal processing, including techniques for expanding the acoustic scene of a stereo recording.
- aspects of this disclosure may expand the acoustic scene of a stereo recording by moving acoustic sources that are slightly or far to the left and the right of the center to the very left and the very right of the stereo soundstage, respectively, corresponding to the left ear and the right ear of the listener.
- the techniques may be applied to expand the acoustic scene in other dimensions, such as top-bottom or front-back.
- the present disclosure provides techniques for increasing the source-separation factor and increasing the quality of the perception of the sources located in the center, the left, and the right.
- the expansion introduces more virtual space between the sources at the left and the right, especially when the sources are close to the center.
- the processing described in some aspects of the disclosure provides a low-power solution that delivers excellent performance in spatial expansion.
- An example device for recording sounds and/or processing sound signals using one or more microphones may include a configuration of one, two, three, four, or more microphones at different locations on the device.
- the example device may include one or more digital signal processors (DSPs) , AI engines, or other suitable circuitry for processing signals captured by the microphones.
- the one or more digital signal processors (DSPs) may output signals representing sounds through a bus for storage in a memory, for reproduction by an audio system, and/or for further processing by other components (such as an applications processor) .
- the processing circuitry may perform further processing, such as for encoding, storage, transmission, or other manipulation of the audio signals.
- the example device may include audio circuitry including an audio amplifier (e.g., a class-D amplifier) for driving a transducer to reproduce the sounds represented by the audio signals.
- a speaker may be integrated with the device and coupled to the audio amplifier to be driven by the audio amplifier for reproducing the sounds.
- a connection may be provided by a jack or other connector on the device to couple an external transducer (e.g., an external speaker or headphones) to the audio amplifier to be driven by the audio circuitry to reproduce the sounds.
- the jack may instead output a digital signal for conversion and amplification by an external device, such as when the jack is configured to be coupled to a digital device through a Universal Serial Bus (USB) Type-C (USB-C) connection and some or all of the audio circuitry is bypassed.
- FIG. 1 shows a block diagram of a system-on-chip (SoC) configured for performing signal processing according to one or more aspects of this disclosure.
- the SoC 100 may include several components coupled together through a bus 102, which may be a network-on-a-chip (NoC) or a plurality of NoCs interconnecting various components.
- although Figure 1 illustrates several components coupled to the bus 102, the components may be coupled to different busses, with additional busses connecting the different busses to provide a path for communication between the components.
- the DSP 112 may process audio signals received from microphones 130A, 130B, and 130C of microphone array 130.
- the DSP 112 may include hardware customized for performing a limited set of operations on specific kinds of data.
- a DSP may include transistors coupled together to perform operations on streaming data and use memory architectures and/or access techniques to fetch multiple data or instructions concurrently. Such configurations may allow the DSP 112 to operate on real-time data, such as video data, audio data, or modem data, in a power-efficient manner.
- the SoC 100 also includes a central processing unit (CPU) 104 and a memory 106 storing instructions 108 (e.g., a memory storing processor-readable code or a non-transitory computer-readable medium storing instructions) that may be executed by a processor of the SoC 100.
- the CPU 104 may be a single central processing unit (CPU) or a CPU cluster comprising two or more cores such as core 104A.
- the CPU 104 may include hardware capable of performing generic operations on many kinds of data, such as hardware capable of executing instructions from an Advanced RISC Machines instruction set such as ARMv8 or ARMv9.
- a CPU 104 may include transistors coupled together to perform operations for supporting executing an operating system and user applications (e.g., a camera application, a multimedia application, a gaming application, a productivity application, a messaging application, a videocall application, an audio recording application, a video recording application) .
- the CPU 104 may execute instructions 108 retrieved from the memory 106.
- the CPU 104 executing an operating system may coordinate execution of instructions by various components within the SoC 100. For example, the CPU 104 may retrieve instructions 108 from memory 106 and execute the instructions on the DSP 112.
- the SoC 100 may further include a neural signal processor (NSP) 124 for executing machine learning (ML) models relating to multimedia applications.
- the NSP 124 may include hardware configured to perform and accelerate convolution operations involved in executing machine learning algorithms. For example, the NSP 124 may improve performance when executing predictive models such as artificial neural networks (ANNs), including multilayer feedforward neural networks (MLFFNNs), recurrent neural networks (RNNs), and/or radial basis function (RBF) networks.
- the ANN executed by the NSP 124 may access predefined training weights stored in the memory 106 for performing operations on user data.
- the SoC 100 may be coupled to a display 114 for interacting with a user.
- the SoC 100 may also include a graphics processing unit (GPU) 126 for rendering images on the display 114.
- the CPU 104 may perform rendering to the display 114 without a GPU 126.
- the GPU 126 may be configured to execute instructions for performing operations unrelated to rendering images, such as for processing large volumes of datasets in parallel.
- Processing algorithms, techniques, and methods that are described herein may be executed by one or more processors of the SoC 100, which may include execution of all steps on one of the processors (e.g., DSP 112, CPU 104, NSP 124, GPU 126) or may include execution of steps across a combination of one or more of the processors (e.g., DSP 112, CPU 104, NSP 124, GPU 126).
- the DSP 112 or the CPU 104 executes instructions to perform various operations described herein, including spatial expanding of audio signals to move acoustic sources within a soundstage.
- execution of the instructions by the CPU 104 as part of a multimedia application may instruct the DSP 112 to begin or end capturing audio from one or more microphones 130A-C.
- the operations of the CPU 104 may be based on user input.
- a voice recorder application executing on CPU 104 may receive a user command to begin a voice recording upon which audio comprising one or more channels is captured and processed for playback and/or storage. Audio processing to determine “output” or “corrected” signals, such as according to techniques described herein, may be applied to one or more segments of audio in the recording sequence.
- Input/output components may be coupled to the SoC 100 through an input/output (I/O) hub 116.
- An example of a hub 116 is an interconnect to a peripheral component interconnect express (PCIe) bus.
- Example components coupled to hub 116 may be components used for interacting with a user, such as a touch screen interface and/or physical buttons.
- Some components coupled to hub 116 may also include network interfaces for communicating with other devices, including a wide area network (WAN) adaptor (e.g., WAN adaptor 152) , a local area network (LAN) adaptor (e.g., LAN adaptor 153) , and/or a personal area network (PAN) adaptor (e.g., PAN adaptor 154) .
- a WAN adaptor 152 may be a 4G LTE or a 5G NR wireless network adaptor.
- a LAN adaptor 153 may be an IEEE 802.11 Wi-Fi wireless network adaptor.
- a PAN adaptor 154 may be a Bluetooth wireless network adaptor.
- Each of the WAN adaptor 152, LAN adaptor 153, and/or PAN adaptor 154 may be coupled to an antenna that may be shared by each of the adaptors 152, 153, and 154, or coupled to multiple antennas configured for primary and diversity reception and/or configured for receiving specific frequency bands.
- the WAN adaptor 152, LAN adaptor 153, and/or PAN adaptor 154 may share circuitry, such as portions of a radio frequency front end (RFFE) .
- Audio circuitry 156 may be integrated in SoC 100 as dedicated circuitry for coupling the SoC 100 to a speaker 120 external to the SoC 100, which may be a transducer such as a speaker (either internal to or external to a device incorporating the SoC 100) or headphones.
- the audio circuitry 156 may include coder/decoder (CODEC) functionality for processing digital audio signals.
- the audio circuitry 156 may further include one or more amplifiers (e.g., a class-D amplifier) for driving a transducer coupled to the SoC 100 for outputting sounds generated during execution of applications by the SoC 100.
- Functionality related to audio signals described herein may be performed by a combination of the audio circuitry 156 and/or other processors of the SoC (e.g., CPU 104, DSP 112, GPU 126, NSP 124) .
- the SoC 100 may couple to external devices outside the package of the SoC 100.
- the SoC 100 may be coupled to a power supply 118, such as a battery or an adaptor to couple the SoC 100 to an energy source.
- the signal processing described herein may be adapted to and achieve power efficiency to support operation of the SoC 100 from a limited-capacity power supply 118 such as a battery.
- operations may be performed on a portion of the SoC 100 configured for performing the operation at a lowest power consumption.
- operations themselves are performed in a manner that reduces a number of computations to perform the operation, such that the algorithm is optimized for extending the operational time of a device while powered by a limited-capacity power supply 118.
- the operations described herein may be configured based on a type of power supply 118 providing energy to the SoC 100. For example, a first set of operations may be executed to perform a function when the power supply 118 is a wall adaptor. As another example, a second set of operations may be executed to perform a function when the power supply 118 is a battery.
- the SoC 100 may also include or be coupled to additional features or components that are not shown in Figure 1. Although components are shown integrated as a single SoC 100, which may include all components built on a single semiconductor die with a common semiconductor substrate, other arrangements of the illustrated blocks across different numbers of dies, substrates, and/or packages may be used to accomplish the same functionality described in this disclosure.
- the memory 106 may include a non-transient or non-transitory computer readable medium storing computer-executable instructions as instructions 108 to perform all or a portion of one or more operations described in this disclosure.
- the instructions 108 may include a multimedia application (or other suitable application such as a messaging application) to be executed by the SoC 100 that records, processes, or outputs audio signals.
- the instructions 108 may also include other applications or programs executed by the SoC 100, such as an operating system and applications other than for multimedia processing.
- the memory 106 may also store audio data.
- the SoC 100 may be coupled to an external memory and configured to access the memory for writing output audio files for later playback or long-term storage.
- the SoC 100 may be coupled to a flash storage device comprising NAND memory for storing video files (e.g., MP4-container formatted files) including audio tracks and/or storing audio recordings (e.g., MPEG-1 Layer 3 files, also referred to as MP3 files) .
- Portions of the video or audio files may be transferred to memory 106 for processing by the SoC 100, with the resulting signals after processing encoded as video or audio files in the memory 106 for transfer to the long-term storage.
- while the SoC 100 is referred to in the examples herein for performing aspects of the present disclosure, some device components are not shown in Figure 1 to prevent obscuring aspects of the present disclosure. Additionally, other components, numbers of components, or combinations of components may be included in a suitable device for performing aspects of the present disclosure. As such, the present disclosure is not limited to a specific device or configuration of components, including the SoC 100.
- the SoC of Figure 1 may be operated to obtain improved audio recordings and/or improved user experience through higher quality audio playback by applying spatial expanding to increase differences between left and right channels in a stereo audio signal.
- One example method of performing multimedia operations is shown in Figure 2 and described below.
- FIG. 2 is a block diagram illustrating an example data flow path for audio signal processing in a multimedia device according to one or more aspects of the disclosure.
- SoC 100 of device 200 may execute multimedia control 210, such as part of an operating system or driver, to control the capture of sounds from microphones or other audio sources and/or to control the configuration of audio processing circuitry 156.
- the audio configuration applied by multimedia control 210 to either output devices (e.g., speakers) or input devices (e.g., microphones) may include parameters that specify, for example, a bit depth, a sampling rate, a data rate, a magnitude, or other parameters.
- Multimedia control 210 may be managed by or provide services to a multimedia application 204.
- the multimedia application 204 may also execute on the SoC 100.
- the multimedia application 204 provides settings accessible to a user such that a user can specify individual playback settings or select a profile with corresponding playback settings.
- the multimedia application 204 may be, for example, a video recording application, a screen sharing application, a virtual conferencing application, an audio playback application, a messaging application, a video communications application, or other application that processes audio data.
- the multimedia application 204 may include spatial expander 206 to improve the quality of audio presented to the user during execution of multimedia application 204.
- the spatial expander 206 may perform one or more or a combination of the techniques described herein.
- the device 200 of Figure 2 may be configured to perform the operations described with reference to Figure 3 to determine an audio signal.
- Figure 3 shows a flow chart 300 of an example method for processing audio data to perform acoustic source separation and soundstage expansion according to one or more aspects of this disclosure.
- the operations of Figure 3 may result in an audio signal with improved representation of sounds, which results in an improved user experience.
- Each of the operations described with reference to Figure 3 may be performed by one or a combination of the processors of the SoC 100.
- first and second audio signals are received, in which the first and second audio signals may correspond to left and right channels of an audio representation.
- the audio data may be received, for example, from microphones.
- the audio data may alternatively be received from a wireless microphone, in which the audio data is received through one or more of the WAN adaptor 152, the LAN adaptor 153, and/or the PAN adaptor 154.
- the audio data may alternatively be received from a memory location or a network storage location, such as when the audio signal was previously captured and is now retrieved from memory 106 and/or from a remote location through one or more of the WAN adaptor 152, the LAN adaptor 153, and/or the PAN adaptor 154.
- the capture or retrieval of audio signals may be initiated by multimedia application 204 executing on the SoC 100.
- Audio data comprising the audio signals may be retrieved at block 302 and further processed by the SoC 100 according to the operations described in one or more of the following blocks.
- spatio-temporal filtering is applied to the first plurality of audio signals received at block 302 to determine a second plurality of audio signals (e.g., filtered audio signals) .
- the filtering may apply one or more nullformers to the first plurality of audio signals or signals derived from the first plurality of audio signals.
- the outputs y_i of filtering an input audio signal x with a plurality i of nullformers h_i may be expressed, for example in a delay-and-subtract form consistent with the symbols below (shown here for N_m = 2 microphones), as

$$y_i(k, n) = \mathbf{h}_i^{T}(k)\, \mathbf{x}(k, n), \qquad \mathbf{h}_i(k) = \begin{bmatrix} e^{-j 2\pi \frac{k}{N_k} f_s \frac{p_1}{c}} \\ -\, e^{-j 2\pi \frac{k}{N_k} f_s \frac{p_2}{c}} \end{bmatrix},$$

where k and n denote the frequency index and the time frame, respectively, and where:
- f_s is a sampling frequency
- c is the speed of sound
- p_m is a distance between an m-th microphone and an acoustic source
- p_c is a distance between a center of a coordinate system and a center of a microphone array recording the first plurality of audio signals
- p_si is the distance between the center of the coordinate system and source i
- N_k is a number of frequencies
- k is an index of one of the N_k number of frequencies
- N_m is the number of microphones
- (·)^T is the transpose of a vector or matrix
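- A frequency-domain sketch of the delay-and-subtract nullformer for a two-microphone array follows; the distances, bin count, and weight layout are illustrative assumptions consistent with the symbols above, not values from the disclosure.

```python
import numpy as np

fs = 48_000   # sampling frequency f_s
c = 343.0     # speed of sound (m/s)
N_k = 512     # number of frequency bins
k = np.arange(N_k)

# Distances from microphones 1 and 2 to source i (p_1 and p_2 above).
p1, p2 = 1.00, 1.02
tau_i = (p1 - p2) / c  # time-difference of arrival of source i

def nullform(X):
    """X: STFTs of the two channels, shape (2, N_k, n_frames).
    Aligns channel 2 to channel 1 for source i and subtracts, so that
    source i cancels while sources from other directions remain."""
    align = np.exp(-1j * 2 * np.pi * k * fs * tau_i / N_k)
    return X[0] - align[:, None] * X[1]
```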
- the filtered output may be mixed with the microphone signals to obtain a combination signal that may be expressed, for example, as

$$y_1(k, n) = \alpha\, y_{\mathrm{LF}}(k, n) + (1 - \alpha)\, x(k, n)$$

for the low-frequency band, with an analogous mix at ratio β to 1-β (and an optional gain γ on the nullformer path) for the high-frequency band, consistent with the mixing ratios described with reference to Figure 4.
- a panning algorithm (such as correlation-based panning, including mu-law panning, or another algorithm for identifying similarities and differences between signals) is applied to the spatio-temporal filtered audio signals computed at block 304.
- the operation of block 306 may be applied to the resulting combination signal y 1 .
- the result of the correlation-based panning of block 306 is a third plurality of audio signals with an expanded soundstage.
- the nullformers and panning may be configured through appropriate parameters to move acoustic sources within a certain threshold range of the center of a soundstage to a further distance from the center of the soundstage.
- the signal processing may be configured to move signals within a range of 5 degrees to 15 degrees left of the center of the soundstage (centered at zero degrees) to a location a predefined amount (e.g., 25 degrees) further left in the soundstage.
- the predefined amount may be a constant value or may be a variable amount proportional to the distance the acoustic source is located from the center.
- the signal processing may also reduce or eliminate acoustic sources present within a threshold range around the center, such as to reduce or eliminate center channel acoustic sources that are within +5 to -5 degrees around the center of the soundstage.
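- As an illustration of this remapping (a hypothetical helper, since the disclosure describes only the behavior, not a function), sources within the threshold band can be pushed outward by a constant or proportional amount:

```python
def expand_angle(theta_deg, inner=5.0, outer=15.0, shift=25.0, proportional=False):
    """Move a source located between `inner` and `outer` degrees from the
    center of the soundstage a further `shift` degrees outward, either by
    a constant amount or proportionally to its distance from center."""
    mag = abs(theta_deg)
    if inner <= mag <= outer:
        mag += shift * (mag / outer) if proportional else shift
    return mag if theta_deg >= 0 else -mag

# A source 10 degrees left of center moves to 35 degrees left of center.
assert expand_angle(-10.0) == -35.0
```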
- One implementation of the correlation-based panning of block 306 includes computing statistics for the estimated cross-covariance function R_x1x2 of the two channels, the estimated variance σ_x1 of the first channel, and the estimated variance σ_x2 of the second channel, such as from recursive averaging equations of the form:

$$R_{x_1 x_2}(k, n) = \lambda\, R_{x_1 x_2}(k, n-1) + (1 - \lambda)\, x_1(k, n)\, x_2^{*}(k, n)$$
$$\sigma_{x_1}^2(k, n) = \lambda\, \sigma_{x_1}^2(k, n-1) + (1 - \lambda)\, \left| x_1(k, n) \right|^2$$

with σ_x2 computed analogously, where λ is a smoothing constant.
- Panning masks may be defined using the panning function for each channel, for example as functions of an inter-channel similarity measure:

$$\Psi(k, n) = \frac{2\, \left| R_{x_1 x_2}(k, n) \right|}{\sigma_{x_1}^2(k, n) + \sigma_{x_2}^2(k, n)}, \qquad M_1(k, n) = f_1\!\left( \Psi(k, n) \right), \quad M_2(k, n) = f_2\!\left( \Psi(k, n) \right),$$

where f_1 and f_2 map left- and right-dominant energy into the respective channels (e.g., through the mu-law function described with reference to Figure 5).
- output audio signals with the expanded soundstage may be represented as:

$$\hat{x}_1(k, n) = M_1(k, n)\, x_1(k, n), \qquad \hat{x}_2(k, n) = M_2(k, n)\, x_2(k, n)$$
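- A compact Python sketch of this panning stage follows; the smoothing constant, the clipping, and the mu-law mask shape are assumptions consistent with the equations above rather than the disclosure's exact formulation.

```python
import numpy as np

def correlation_panning(X1, X2, lam=0.9, mu=255.0):
    """X1, X2: complex STFTs of the two channels, shape (n_bins, n_frames).
    Returns masked outputs (Y1, Y2) with left/right content pushed apart."""
    n_bins, n_frames = X1.shape
    R12 = np.zeros(n_bins)
    v1 = np.full(n_bins, 1e-12)
    v2 = np.full(n_bins, 1e-12)
    Y1 = np.empty_like(X1)
    Y2 = np.empty_like(X2)
    for n in range(n_frames):
        # Recursive estimates of the cross-covariance and channel variances.
        R12 = lam * R12 + (1 - lam) * np.abs(X1[:, n] * np.conj(X2[:, n]))
        v1 = lam * v1 + (1 - lam) * np.abs(X1[:, n]) ** 2
        v2 = lam * v2 + (1 - lam) * np.abs(X2[:, n]) ** 2
        # Pan index: positive where channel 1 dominates, negative where
        # channel 2 dominates, near zero for center-panned bins.
        delta = np.clip(R12 / v2 - R12 / v1, -1.0, 1.0)
        # Mu-law expansion steepens the index near zero so that
        # near-center sources are pushed outward.
        m = np.sign(delta) * np.log1p(mu * np.abs(delta)) / np.log1p(mu)
        M1 = 0.5 * (1.0 + m)
        Y1[:, n] = M1 * X1[:, n]
        Y2[:, n] = (1.0 - M1) * X2[:, n]
    return Y1, Y2
```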
- the operations described with reference to blocks 302, 304, and 306 of Figure 3 may be performed on a digital signal processor (DSP) , such as DSP 112 of the SoC 100 illustrated in Figure 1.
- the operations may alternatively be performed by one or more of the processors of Figure 1, including one or more of the CPU 104, the DSP 112, the GPU 126, or the NSP 124.
- the CPU 104 may record audio signals from the microphone array 130 to memory 106 as part of the operations of block 302.
- the DSP 112 may then perform the operations of blocks 304 and 306 on the audio signals stored in memory 106, after which output signals determined by the DSP 112 may be stored in memory 106, output to audio circuitry 156 for reproduction, and/or transmitted to another device through one or more of the WAN 152, LAN 153, and/or PAN 154.
- the processor performing the operations of blocks 302, 304, and/or 306 may be dedicated logic circuitry for performing certain operations.
- FIG. 4 is a block diagram illustrating stereo processing of audio signals according to one or more aspects of the disclosure.
- the device 200 includes two microphones 130A, 130B, which may be configured as described with reference to the examples of Figure 1 or Figure 2.
- the microphones 130A, 130B may record sounds from soundstage 432, which includes acoustic source 432C at a center of the soundstage 432, acoustic source 432L slightly to the left of the soundstage 432, and acoustic source 432R slightly to the right of the soundstage 432.
- the two microphones 130A, 130B output a first plurality of audio signals 402.
- the number of signals in the first plurality of audio signals 402 may match the number of microphones 130A, 130B.
- the example of Figure 4 illustrates a two-channel stereo configuration with left and right channels.
- the example of Figure 4 may be extended to include additional microphones and additional signals, such as a center channel, a high channel, a low channel, a high-right channel, a high-left channel, and/or additional channels.
- the first plurality of audio signals 402 may be applied to a spatio-temporal filter 410, which may include, among other circuitry, nullformers for adding directionality to the microphones 130A, 130B.
- the directionality may be configured to reduce the strength of left-channel acoustic sources in the right channel and to reduce the strength of right-channel acoustic sources in the left channel.
- the spatio-temporal filter 410 may be configured to reduce the contribution of acoustic source 432L to the right-channel of the first plurality of audio signals 402 and to reduce the contribution of acoustic source 432R to the left-channel of the first plurality of audio signals 402.
- Such a configuration for the spatio-temporal filter 410 may include one or more nullformers.
- nullformers 412A, 412B may be configured to process the audio signals 402 based on one or more nullformer parameters to generate null audio signals 404 having a null in a direction corresponding to the one or more nullformer parameters.
- the null applied by the nullformer may be configured to reduce sensitivity of a left-channel microphone to audio sources at the right of the soundstage 432 and to reduce sensitivity of a right-channel microphone to audio sources at the left of the soundstage 432.
- the nullformer may apply a null in at least one direction for each of the microphones 130A-B.
- the nullformer may apply a null in the left direction of a first channel of a plurality of audio signals and a null in the right direction of a second channel of a plurality of audio signals.
- the null may be accomplished by subtracting one of the channels from the other of the channels.
- the nullformer may subtract the second channel from the first channel to determine a first output and may subtract the first channel from the second channel to determine a second output.
- two or more nullformers may be coupled in parallel to receive and process the first plurality of audio signals 402.
- the parallel nullformers 412A, 412B may process different frequencies using different or same parameters.
- a cut-off frequency may be used to divide the first plurality of audio signals 402 into a low-frequency (LF) portion and a high-frequency (HF) portion.
- An example cut-off frequency is 2 kHz.
- One implementation of such a cut-off filter is a combination of a low-pass filter 408A filtering input to the low-frequency nullformer (LF NF) 412A and a high-pass filter 408B filtering input to the high-frequency nullformer (HF NF) 412B, as sketched below.
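- A minimal sketch of this band split (the filter order and design are illustrative choices; the disclosure specifies only a cut-off frequency such as 2 kHz):

```python
from scipy.signal import butter, sosfilt

fs = 48_000
f_cut = 2_000.0
sos_lp = butter(4, f_cut, btype="lowpass", fs=fs, output="sos")
sos_hp = butter(4, f_cut, btype="highpass", fs=fs, output="sos")

def split_bands(x):
    """x: array of shape (n_channels, n_samples). Returns the (LF, HF)
    portions routed to the LF and HF nullformers, respectively."""
    return sosfilt(sos_lp, x, axis=-1), sosfilt(sos_hp, x, axis=-1)
```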
- the output of the nullformers 412A, 412B may be further processed in the spatio-temporal filter 410.
- the spatio-temporal filter 410 may combine the second plurality of audio signals 404 with the first plurality of audio signals 402 before being output from the spatio-temporal filter 410 (e.g., before applying correlation-based panning 420 to the second plurality of audio signals 404) .
- the combining of the two pluralities of audio signals may be performed according to weights, with a different weight assigned to the mixing of the low-frequency audio signals than to the high-frequency audio signals.
- the low-frequency audio signal of the second plurality of audio signals 404 may be mixed at a ratio of ⁇ to 1- ⁇ with the corresponding signal of the first plurality of audio signals 402.
- the high-frequency audio signal of the second plurality of audio signals 404 may be mixed at a ratio of ⁇ to 1- ⁇ with the corresponding signal of the first plurality of audio signals 402.
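- A sketch of this weighted recombination (the parameter names α, β follow the text, with the γ gain from the discussion below; the default values are arbitrary illustrations):

```python
def mix_bands(nf_lf, nf_hf, x_lf, x_hf, alpha=0.7, beta=0.7, gamma=1.0):
    """Recombine nullformer outputs with the unprocessed band signals."""
    low = alpha * nf_lf + (1.0 - alpha) * x_lf           # LF mix at alpha : 1 - alpha
    high = beta * (gamma * nf_hf) + (1.0 - beta) * x_hf  # HF mix at beta : 1 - beta, gain gamma on the nullformer path
    return low + high
```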
- adaptive filters 422A, 422B may receive the output of the spatio-temporal filter 410 in parallel with the correlation-based panning algorithm 420.
- the adaptive filters 422A and 422B may apply configuration parameters to the panning algorithm 420 that are determined to reduce or minimize correlation between both nullformer signals to enhance the stereo separation achieved by the correlation-based panning algorithm 420.
- the adaptive filters 422A, 422B operate on different frequency bands corresponding to the low-frequency and high-frequency null audio signals output from nullformers 412A, 412B, respectively.
- a single adaptive filter may be used to configure the panning algorithm 420 based on, for example, one or the other of the low or high frequency signals or a combination of the low and high frequency signals.
- Control over the spatio-temporal filter 410 may be provided through exposing parameters of the operation for configuration by other components.
- the ⁇ and ⁇ for configuring the mixing of the first plurality of audio signals 402 and the second plurality of audio signals 404 may be adjustable.
- a gain factor at an output of one or more of the nullformers may be adjustable.
- a gain factor ⁇ may be applied to the output of the high-frequency nullformer 412B.
- Each of the gain factors ⁇ , ⁇ , and ⁇ may be controllable through user input and/or predefined settings in the multimedia application 204 and/or multimedia control 210.
- an audio/visual (AV) spatial locator (SL) 414 may be configured to receive information about the soundstage 432 and output control parameters to configure the nullformers 412A, 412B and/or other components of the system.
- the AVSL 414 may control the gain factors ⁇ , ⁇ , and ⁇ .
- the AVSL 414 may receive the first plurality of audio signals 402 and use information in the audio signals to determine positions of audio sources and configure the nullformers 412A, 412B to assign an audio source to one of the channels by applying a null in the direction of the audio source for one of the signals.
- the AVSL 414 may additionally or alternatively receive other information for determining a location of the audio sources and assigning audio sources to the channels.
- the AVSL 414 may receive visual information recorded by a camera of the device 200, and that visual information may be used to adjust parameters of the nullformers 412A, 412B to direct a null of one or more of the microphones 130A, 130B towards an audio source.
- the AVSL 414 may apply location thresholds to the locations of the sources in the soundstage 432. For example, a 15% threshold may be set around a center of the soundstage 432 in which sources within 15% of the center are moved further into the left or right channels to expand the soundstage.
- the output of the spatio-temporal filter 410 may be input to the correlation-based panning algorithm 420.
- the correlation-based panning block 420 applies an algorithm to the output of the spatio-temporal filter 410 to determine a third plurality of audio signals 406 that have improved stereo separation in soundstage 434 as compared to the soundstage 432. For example, the acoustic source 434L, which corresponds to the acoustic source 432L, is moved farther to the left of the soundstage 434. Likewise, the acoustic source 434R, which corresponds to the acoustic source 432R, is moved farther to the right of the soundstage 434.
- FIG. 5 is a block diagram illustrating stereo processing of audio signals using mu-law panning according to one or more aspects of the disclosure.
- the audio signals 402 comprising sounds from acoustic sources 432L, 432C, and 432R are both separately input to a low-pass filter 508A and a high-pass filter 508B.
- Each of the filters 508A, 508B reduces the amplitude of sounds in a certain frequency range such that filtered audio signals are input to the spatio-temporal filter.
- the low-pass filter 508A reduces or eliminates sounds above a first cut-off frequency and the high-pass filter 508B reduces or eliminates sounds below a second cut-off frequency.
- the first and second cut-off frequencies may be the same frequency.
- the spatio-temporal filter 510 applies nullforming to add directionality to the signals similar to that of the spatio-temporal filter 410.
- the filter 510 receives the low-pass filtered audio signals and the high-pass filtered audio signals, each of the sets of signals comprising two channels (for stereo audio) or more channels.
- the filter 510 includes nullformers 512A, 512B similar to nullformers 412A, 412B of Figure 4.
- the outputs of nullformers 512A, 512B include a second plurality of audio signals 504.
- the audio signals 504 may be recombined with the filtered audio signals input to the spatio-temporal filter 510 based on mixing parameters (e.g., ⁇ , ⁇ ) .
- the output of filter 510 may be input to correlation-based mu-law panning algorithm 520.
- the mu-law panning algorithm is a companding algorithm that reduces the dynamic range of an audio signal through non-linear quantization.
- although mu-law panning is described for the panning algorithm 520, other algorithms beyond or in addition to mu-law panning may be applied to the output of filter 510 to achieve a similar effect.
- other companding algorithms such as an A-law panning algorithm may be used in place of mu-law panning algorithm 520.
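- For reference, a standard mu-law companding curve is sketched below; in the panning algorithm 520, a mapping of this shape would be applied to a pan index rather than to raw samples, and the value of μ is an illustrative assumption.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Standard mu-law companding curve; x is assumed normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

# The curve is steep near zero, so values just off center map far from
# center -- the property exploited to widen near-center sources.
print(mu_law_compress(np.array([0.05, 0.5, 1.0])))  # ~[0.47, 0.88, 1.0]
```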
- an output signal with an expanded stereo soundstage 534 is obtained with audio sources 534L and 534R moved further apart in the soundstage 534.
- the resulting audio signal may have audio source 534R entirely located in one channel and audio source 534L entirely located in another channel.
- although a center channel 534C is shown in the soundstage 534, the channel 534C may be removed in some embodiments.
- the source separation may be applied to an environment comprising multiple individuals in separate telecommunications sessions.
- for example, an automobile (e.g., a car or other vehicle) may include a multimedia system in an enclosed environment with multiple microphones in different locations that receive input from users differently.
- the multimedia system of the vehicle may also include multiple speakers for reproducing audio from the telecommunications sessions (e.g., telephonic discussions or virtual meetings) .
- the spatial expander of Figure 4 or Figure 5 may move (e.g., pan) the signals emitted from the zone of the spatially-invariant target source (e.g., a passenger) and captured with a microphone array (of two or more microphones) to one audio channel, while the signals from all other zones are omitted from the other channel, or from all channels, through the application of nullformers to reduce the contribution of the different zones.
- a user interface to the vehicle multimedia system may be provided (e.g., on the car’s instrument or front panel or dashboard) to select a zone during a telecommunications session for target-speech enhancement and noise and interference attenuation (e.g., to customize cross-talk attenuation between zones) .
- each passenger may connect to the vehicle’s digital interface with a mobile device (e.g., a smart phone or computer) through, e.g., a personal area network (PAN) connection, select the passenger’s zone, and use the in-car system to enhance their speech during a telecommunications session.
- supporting signal processing may include additional aspects, such as any single aspect or any combination of aspects described below or in connection with one or more other processes or devices described elsewhere herein.
- supporting signal processing may include a method comprising receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.
- the method may be performed by an apparatus and the method may be embodied in a non-transitory computer-readable medium as instructions that cause performance of the operations of the method.
- the method may include, or the apparatus may perform or operate according to, one or more aspects as described below.
- the apparatus includes a wireless device, such as a UE.
- the apparatus may include one or more processors, and a memory coupled to the one or more processors.
- the one or more processors may be configured to perform operations described herein with respect to the apparatus.
- the apparatus may include a non-transitory computer-readable medium having program code recorded thereon and the program code may be executable by the apparatus for causing the one or more processors to perform operations described herein with reference to the apparatus.
- the apparatus may include one or more means configured to perform operations described herein.
- applying the at least one nullformer comprises: applying at least one first nullformer to each of the first plurality of audio signals; and applying at least one second nullformer to each of the first plurality of audio signals, wherein the at least one first nullformer processes a first frequency range and the at least one second nullformer processes a second frequency range different from the first frequency range.
- the first frequency range corresponds to first frequencies below a cutoff frequency and the second frequency range corresponds to second frequencies above the cutoff frequency.
- the method also includes, or the apparatus may perform, determining a location of a source of a sound in the first plurality of audio signals; and controlling a null direction of the at least one nullformer based on the location of the source.
- the location of the source may be determined adaptively or may be set as a predefined, static location.
- the method may include, or the apparatus may perform, combining the second plurality of audio signals with the first plurality of audio signals before applying the panning algorithm to the second plurality of audio signals.
- the method may include, or the apparatus may perform, adjusting a mixing ratio of the second plurality of audio signals and the first plurality of audio signals based on a configuration parameter when combining the second plurality of audio signals with the first plurality of audio signals.
- applying the panning algorithm comprises applying a mu-law panning algorithm to the second plurality of audio signals.
- applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which any sound acoustically located on a first side of a soundstage is removed from a second side of the soundstage.
- applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which acoustic sources within a threshold region of a center of the soundstage are removed.
- the method may include, or the apparatus may perform, outputting the third plurality of audio signals to a transducer.
- a multimedia device, in combination with one or more of the first aspect through the eleventh aspect, includes an array of microphones comprising at least a first microphone and a second microphone, with a memory and one or more processors, in which the one or more processors is coupled to the memory, the first microphone, and the second microphone, and the one or more processors performs operations by executing processor-readable code to perform: receiving a first plurality of audio signals comprising at least a first channel corresponding to the first microphone and a second channel corresponding to the second microphone; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying the panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.
- the multimedia device of the twelfth aspect may perform any one or more of the operations described with reference to the first aspect through the eleventh aspect.
- a method may include obtaining an audio scene centered at zero degrees that includes a left audio signal and a right audio signal; processing the left audio signal to move it further away from the audio center by a first threshold to determine a moved left audio signal; and processing the right audio signal to move it further away from the audio center by a second threshold to determine a moved right audio signal.
- a nullformer is used to generate the moved left audio signal based on forming a null beam toward the left audio signal and applying panning to the left audio signal.
- a nullformer is used to generate the moved right audio signal based on forming a null beam toward the right audio signal and applying panning to the right audio signal.
- the obtaining is part of a recorder operation.
- the obtaining is part of a playback operation.
- a single block may be described as performing a function or functions.
- the function or functions performed by that block may be performed in a single component or across multiple components, and/or may be performed using hardware, software, or a combination of hardware and software.
- various illustrative components, blocks, modules, circuits, and steps are described below generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
- the example devices may include components other than those shown, including well-known components such as a processor, memory, and the like.
- a device may be any electronic device with one or more parts that may implement at least some portions of the disclosure. While the description and examples herein use the term “device” to describe various aspects of the disclosure, the term “device” is not limited to a specific configuration, type, or number of objects.
- an apparatus may include a device or a portion of the device for performing the described operations.
- processing circuitry (e.g., application specific integrated circuits (ASICs), digital signal processors (DSPs), graphics processing units (GPUs), central processing units (CPUs), or neural signal processors (NSPs)) configured to perform the recited function through hardware, software, or a combination of hardware configured by software.
- examples of processors include microprocessors, electronic devices, hardware devices, electronic components, logical circuits, memories, software codes, firmware codes, among other examples, or any combination thereof.
- Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, and/or functions, among other examples, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
- features discussed herein may be implemented via specialized processor circuitry, via executable instructions, or combinations thereof.
- one or more blocks (or operations) described with reference to Figure 3 may be combined with one or more blocks (or operations) described with reference to another of the figures.
- one or more blocks (or operations) of Figure 3 may be combined with one or more blocks (or operations) of Figure 1 or Figure 2.
- one or more blocks associated with Figure 3 may be combined with one or more blocks (or operations) associated with Figure 4 or Figure 5.
- one or more blocks associated with Figure 4 or Figure 5 may be combined with one or more blocks (or operations) associated with Figure 1 or Figure 2.
- aspects of embodiments of Figure 4 may be combined with Figure 5 or vice versa.
- the operations described may be implemented in hardware, digital electronic circuitry, computer software, or firmware, including the structures disclosed in this specification and their structural equivalents, or in any combination thereof. Implementations of the subject matter described in this specification also may be implemented as one or more computer programs, that is, one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, a data processing apparatus.
- Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another.
- a storage medium may be any available medium that may be accessed by a computer.
- Such computer-readable media may include random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer.
- also, any connection may be properly termed a computer-readable medium.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, wherein disks usually reproduce data magnetically and discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- drawings may schematically depict one or more example processes in the form of a flow diagram. However, other operations that are not depicted may be incorporated in the example processes that are schematically illustrated. For example, one or more additional operations may be performed before, after, simultaneously, or between any of the illustrated operations. In certain circumstances, multitasking and parallel processing may be advantageous.
- the term “or,” when used in a list of two or more items, means that any one of the listed items may be employed by itself, or any combination of two or more of the listed items may be employed. For example, if a composition is described as containing components A, B, or C, the composition may contain A alone; B alone; C alone; A and B in combination; A and C in combination; B and C in combination; or A, B, and C in combination.
- the term “substantially” is defined as largely, but not necessarily wholly, what is specified (and includes what is specified; for example, substantially 90 degrees includes 90 degrees and substantially parallel includes parallel), as understood by a person of ordinary skill in the art. In any disclosed implementations, the term “substantially” may be substituted with “within [a percentage] of” what is specified, where the percentage includes 0.1, 1, 5, or 10 percent.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
This disclosure provides systems, methods, and devices for audio signal processing that support expanding a stereo sound stage of an audio recording. In a first aspect, a method of signal processing includes receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying correlation-based panning to the second plurality of audio signals to determine a third plurality of audio signals. Other aspects and features are also claimed and described.
Description
Aspects of the present disclosure relate generally to audio signal processing, and more particularly, to expanding the stereo sound field captured with microphones. Some features may enable and provide improved audio signal processing, including improved audio quality and speech intelligibility.
INTRODUCTION
Stereophonic sound, commonly called stereo, reproduces sounds using two audio channels corresponding to a user’s left ear and a user’s right ear. A user listening to sound generated by the left speaker and the right speaker can perceive a location of certain sounds based on how much of the sound is reproduced by the left speaker versus the right speaker. For example, a sound originating entirely from the left speaker and not the right speaker will sound to the user as a sound on their left side. The left and right channels of stereo sound can be recorded using two or more microphones placed at a distance from each other such that each microphone records soundwaves propagating through the environment at different times. The movement of a sound from one channel to another channel provides cues that a listener can perceive as spatial information. A user listening to stereo sound reproduced from speakers arranged similarly to the recording microphones will perceive spatial information similar to that captured by the two microphones.
Audio playback devices are devices that can reproduce one or more audio signals, whether digital or analog signals. An audio playback device may include circuitry that allows a user to interact with the audio signals, such as by amplifying the signals to drive a transducer that generates the sound waves that can be perceived by a listener’s ears. Audio playback can be incorporated into a wide variety of devices. By way of example, audio playback devices may comprise stand-alone audio devices, mobile telephones, cellular or satellite radio telephones, personal digital assistants (PDAs), panels or tablets, gaming devices, or computing devices.
BRIEF SUMMARY OF SOME EXAMPLES
The following summarizes some aspects of the present disclosure to provide a basic understanding of the discussed technology. This summary is not an extensive overview
of all contemplated features of the disclosure and is intended neither to identify key or critical elements of all aspects of the disclosure nor to delineate the scope of any or all aspects of the disclosure. Its sole purpose is to present some concepts of one or more aspects of the disclosure in summary form as a prelude to the more detailed description that is presented later.
In some aspects, a user may desire to modify the spatial information within an audio signal. For example, the user may desire to expand the acoustic soundstage of a stereo recording by moving the acoustic sources further left or right in the soundstage. That is, sources in the stereo recording that are acoustically located slightly or far to the left and the right of the center may be moved to the very left and the very right of the stereo soundstage, respectively. In some embodiments, the movement of the spatial sources may place any source located primarily in the left channel entirely in the left channel for reproduction at the left ear, and likewise for the right channel and the right ear of the listener.
Audio sources may be moved within an acoustic scene by separately processing the channels of the audio source through nullformers and similarity-based processing (e.g., using correlation-based algorithms) to form modified channels containing representations of the same sounds but with the sounds moved within the soundstage. It is noted that different examples of similarity-based processing include, but are not limited to, (1) least-squares fitting with allowable time adjustment, in the time domain or the frequency domain; (2) correlation and covariance approaches; and (3) higher-order approaches such as cross-cumulants, empirical Kullback-Leibler divergence, or the Itakura-Saito distance. In some embodiments, the processing may be applied to stereo sounds to increase a source-separation factor and to increase the quality of the perception of the sources located in the center, the left, and the right by introducing additional space between sources at the left and the right. That is, a user may more easily distinguish between audio sources on the left and right of the soundstage. For instance, expanding the soundstage can increase speech intelligibility in an environment with multiple, simultaneously active acoustic sources.
In a stereo configuration with two channels, the nullformers may reduce the contribution of a first source to the second channel and reduce contribution of a second source to the first channel. The nullformer may obtain this reduction in contribution of an acoustic source to a channel by applying a null to an audio signal in the direction of the acoustic source. The null may be applied by subtracting one channel from the other channel. The steering direction of the nullformer is the direction of the null. A beamformer
constructively overlaps soundwaves coming from the steering direction, whereas a nullformer destructively overlaps these soundwaves due to the subtraction of the captured, delayed, and weighted soundwaves. Consequently, processing involving a nullformer differs from processing involving a beamformer, in which the beamformer involves summing or a weighted summing of the channels.
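By way of illustration only (a hedged sketch, not the claimed implementation), the following Python fragment contrasts the two operations for a two-channel signal; the integer delay tau and the equal weighting are hypothetical choices.

```python
import numpy as np

def delay(x, samples):
    """Delay a signal by an integer number of samples, zero-padding the front."""
    return np.concatenate([np.zeros(samples), x[:len(x) - samples]])

def delay_and_sum_beamformer(ch1, ch2, tau):
    # Soundwaves from the steering direction align after the delay, so the
    # weighted sum overlaps them constructively.
    return 0.5 * (ch1 + delay(ch2, tau))

def delay_and_subtract_nullformer(ch1, ch2, tau):
    # The same aligned soundwaves cancel under subtraction, placing a null
    # toward the steering direction.
    return ch1 - delay(ch2, tau)
```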
In one aspect of the disclosure, a method for signal processing includes receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
In an additional aspect of the disclosure, an apparatus includes one or more processors and a memory coupled to the one or more processors. The one or more processors is configured to perform operations including receive a first plurality of audio signals comprising at least a first channel and a second channel; apply at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and apply similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
In an additional aspect of the disclosure, an apparatus includes means for receiving a first plurality of audio signals comprising at least a first channel and a second channel; means for applying a directional null to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the null applying means reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and means for applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
In an additional aspect of the disclosure, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to perform operations. The operations include receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying similarity-based algorithms (e.g., correlation-based panning) to the second plurality of audio signals to determine a third plurality of audio signals.
Methods of audio signal processing described herein may be performed by a signal processing device. The audio signal processing may be applied to audio data captured by one or more microphones of the signal processing device. Audio signal processing devices (devices that can play back, record, and/or process one or more audio recordings) can be incorporated into a wide variety of devices. By way of example, audio signal processing devices may comprise stand-alone audio devices, such as entertainment devices and personal media players, wireless communication device handsets such as mobile telephones, cellular or satellite radio telephones, personal digital assistants (PDAs), tablets, gaming devices, computing devices such as webcams, video surveillance cameras, or other devices with audio recording or audio capabilities.
The audio signal processing techniques described herein may involve devices having microphones and processing circuitry (e.g., application specific integrated circuits (ASICs), digital signal processors (DSPs), graphics processing units (GPUs), or central processing units (CPUs)).
In some aspects, a device may include a digital signal processor or a processor (e.g., an application processor) including specific functionality for audio processing. The methods and techniques described herein may be entirely performed by the digital signal processor or the processor, or various operations may be split between the digital signal processor and the processor, and in some aspects split across additional processors. In some embodiments, the methods and techniques disclosed herein may be adapted using input from a neural signal processor (NSP) in which one or more parameters of the signal processing are controlled based on output from a machine learning (ML) model executed by the NSP.
In an additional aspect of the disclosure, a device configured for audio signal processing and/or audio capture is disclosed. The apparatus includes means for recording audio. Example means may include a dynamic microphone, a condenser microphone, a ribbon microphone, a carbon microphone, or a crystal microphone. The microphone may be constructed as a microelectromechanical system (MEMS) microphone. These components may be
controlled to capture first and/or second sound recordings, which may correspond to left and right channels of a recording.
For any of these types of microphones, the microphones may include analog and/or digital microphones. Analog microphones provide a sensor signal, which in some embodiments is conditioned or filtered. Analog microphones in a digital system include an external analog-to-digital converter (ADC) to interface with digital circuitry. Digital microphones include the ADC and other digital elements to convert the sensor signal into a digital data stream, such as a pulse-density modulated (PDM) stream or a pulse-code modulated (PCM) stream.
Other aspects, features, and implementations will become apparent to those of ordinary skill in the art, upon reviewing the following description of specific, exemplary aspects in conjunction with the accompanying figures. While features may be discussed relative to certain aspects and figures below, various aspects may include one or more of the advantageous features discussed herein. In other words, while one or more aspects may be discussed as having certain advantageous features, one or more of such features may also be used in accordance with the various aspects. In similar fashion, while exemplary aspects may be discussed below as device, system, or method aspects, the exemplary aspects may be implemented in various devices, systems, and methods.
The method may be embedded in a computer-readable medium as computer program code comprising instructions that cause a processor to perform the steps of the method. In some embodiments, the processor may be part of a mobile device including a memory; a first network adaptor configured to transmit data, such as images or videos (with associated or embedded sounds) in a recording or as streaming data, over a first network connection of a plurality of network connections; and a processor coupled to the first network adaptor and the memory. The processor may cause the transmission of audio signals described herein over a wireless communications network such as a 5G NR communication network.
The foregoing has outlined, rather broadly, the features and technical advantages of examples according to the disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter. The conception and specific examples disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Such equivalent constructions do not depart from the scope of the appended claims. Characteristics of the concepts disclosed herein, both their organization and method of operation, together with associated advantages will be better understood
from the following description when considered in connection with the accompanying figures. Each of the figures is provided for the purposes of illustration and description, and not as a definition of the limits of the claims.
While aspects and implementations are described in this application by illustration to some examples, those skilled in the art will understand that additional implementations and use cases may come about in many different arrangements and scenarios. Innovations described herein may be implemented across many differing platform types, devices, systems, shapes, sizes, and packaging arrangements. For example, aspects and/or uses may come about via integrated chip implementations and other non-module-component based devices (e.g., end-user devices, vehicles, communication devices, computing devices, industrial equipment, retail/purchasing devices, medical devices, artificial intelligence (AI) -enabled devices, etc. ) . While some examples may or may not be specifically directed to use cases or applications, a wide assortment of applicability of described innovations may occur. Implementations may range in spectrum from chip-level or modular components to non-modular, non-chip-level implementations and further to aggregate, distributed, or original equipment manufacturer (OEM) devices or systems incorporating one or more aspects of the described innovations. In some practical settings, devices incorporating described aspects and features may also necessarily include additional components and features for implementation and practice of claimed and described aspects. It is intended that innovations described herein may be practiced in a wide variety of devices, chip-level components, systems, distributed arrangements, end-user devices, etc. of varying sizes, shapes, and constitution.
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the appended figures, similar components or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If just the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
Figure 1 shows a block diagram of a system-on-chip (SoC) configured for performing signal processing according to one or more aspects of this disclosure.
Figure 2 is a block diagram illustrating an example data flow path for audio signal processing in a multimedia device according to one or more aspects of the disclosure.
Figure 3 shows a flow chart of an example method for processing audio data to perform acoustic source separation and soundstage expansion according to one or more aspects of this disclosure.
Figure 4 is a block diagram illustrating stereo processing of audio signals according to one or more aspects of the disclosure.
Figure 5 is a block diagram illustrating stereo processing of audio signals using mu-law panning according to one or more aspects of the disclosure.
Like reference numbers and designations in the various drawings indicate like elements.
The present disclosure provides systems, apparatus, methods, and computer-readable media that support signal processing, including techniques for expanding the acoustic scene of a stereo recording. For example, aspects of this disclosure may expand the acoustic scene of a stereo recording by moving acoustic sources that are slightly or far to the left and the right of the center to the very left and the very right of the stereo soundstage, respectively, corresponding to the left ear and the right ear of the listener. Although embodiments are described with respect to stereo sound sources, the techniques may be applied to expand the acoustic scene in other dimensions, such as top-bottom or front-back.
Particular implementations of the subject matter described in this disclosure may be implemented to realize one or more of the following potential advantages or benefits. In some aspects, the present disclosure provides techniques for increasing the source-separation factor and increasing the quality of the perception of the sources located in the center, the left, and the right. The expansion introduces more virtual space between the sources at the left and the right, especially when the sources are close to the center. The processing described in some aspects of the disclosure provides a low-power solution with excellent performance in spatial expansion.
The detailed description set forth below, in connection with the appended drawings to which the text refers, is intended as a description of various embodiments and is not intended to limit the scope of the disclosure. Rather, the detailed description includes specific details for the purpose of providing a thorough understanding of the subject matter of this disclosure. It will be apparent to those skilled in the art that these specific
details are not required in every case and that, in some instances, well-known structures and components are shown in block diagram form for clarity of presentation.
In the description of embodiments herein, numerous specific details are set forth, such as examples of specific components, circuits, and processes to provide a thorough understanding of the present disclosure. The term “coupled” as used herein means connected directly to or connected through one or more intervening components or circuits. Also, in the following description and for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present disclosure. However, it will be apparent to one skilled in the art that these specific details may not be required to practice the teachings disclosed herein. In other instances, well known circuits and devices are shown in block diagram form to avoid obscuring teachings of the present disclosure.
Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. In the present disclosure, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system.
An example device for recording sounds and/or processing sound signals using one or more microphones, such as a MEMS microphone, may include a configuration of one, two, three, four, or more microphones at different locations on the device. The example device may include one or more digital signal processors (DSPs) , AI engines, or other suitable circuitry for processing signals captured by the microphones. The one or more digital signal processors (DSPs) may output signals representing sounds through a bus for storage in a memory, for reproduction by an audio system, and/or for further processing by other components (such as an applications processor) . The processing circuitry may perform further processing, such as for encoding, storage, transmission, or other manipulation of the audio signals. In some embodiments, the example device may include audio circuitry including an audio amplifier (e.g., a class-D amplifier) for driving a transducer to reproduce the sounds represented by the audio signals. A speaker may be integrated with the device and coupled to the audio amplifier to be driven by the audio amplifier for reproducing the sounds. A connection may be provided by a jack or other
connector on the device to couple an external transducer (e.g., an external speaker or headphones) to the audio amplifier to be driven by the audio circuitry to reproduce the sounds. In some embodiments, the jack may instead output a digital signal for conversion and amplification by an external device, such as when the jack is configured to be coupled to a digital device through a Universal Serial Bus (USB) Type-C (USB-C) connection and some or all of the audio circuitry is bypassed.
Figure 1 shows a block diagram of a system-on-chip (SoC) configured for performing signal processing according to one or more aspects of this disclosure. The SoC 100 may include several components coupled together through a bus 102, which may be a network-on-a-chip (NoC) or a plurality of NoCs interconnecting various components. For example, although Figure 1 illustrates several components coupled to the bus 102, the several components may be coupled to different busses with additional busses connecting the different busses to provide a path for communication between the components.
One example component in the SoC 100 is a digital signal processor 112 for signal processing. The DSP 112 may process audio signals received from microphones 130A, 130B, and 130C of microphone array 130. The DSP 112 may include hardware customized for performing a limited set of operations on specific kinds of data. For example, a DSP may include transistors coupled together to perform operations on streaming data and use memory architectures and/or access techniques to fetch multiple data or instructions concurrently. Such configurations may allow the DSP 112 to operate on real-time data, such as video data, audio data, or modem data, in a power-efficient manner.
The SoC 100 also includes a central processing unit (CPU) 104 and a memory 106 storing instructions 108 (e.g., a memory storing processor-readable code or a non-transitory computer-readable medium storing instructions) that may be executed by a processor of the SoC 100. The CPU 104 may be a single central processing unit (CPU) or a CPU cluster comprising two or more cores such as core 104A. The CPU 104 may include hardware capable of performing generic operations on many kinds of data, such as hardware capable of executing instructions from the Advanced RISC Machines (ARM) instruction set, such as ARMv8 or ARMv9. For example, a CPU 104 may include transistors coupled together to perform operations for supporting executing an operating system and user applications (e.g., a camera application, a multimedia application, a gaming application, a productivity application, a messaging application, a videocall application, an audio recording application, a video recording application). The CPU 104
may execute instructions 108 retrieved from the memory 106. In some embodiments, the CPU 104 executing an operating system may coordinate execution of instructions by various components within the SoC 100. For example, the CPU 104 may retrieve instructions 108 from memory 106 and execute the instructions on the DSP 112.
The SoC 100 may further include a neural signal processor (NSP) 124 for executing machine learning (ML) models relating to multimedia applications. The NSP 124 may include hardware configured to perform and accelerate convolution operations involved in executing machine learning algorithms. For example, the NSP 124 may improve performance when executing predictive models such as artificial neural networks (ANNs), including multilayer feedforward neural networks (MLFFNNs), recurrent neural networks (RNNs), and/or radial basis function (RBF) networks. The ANN executed by the NSP 124 may access predefined training weights stored in the memory 106 for performing operations on user data.
The SoC 100 may be coupled to a display 114 for interacting with a user. The SoC 100 may also include a graphics processing unit (GPU) 126 for rendering images on the display 114. In some embodiments, the CPU 104 may perform rendering to the display 114 without a GPU 126. In some embodiments, the GPU 126 may be configured to execute instructions for performing operations unrelated to rendering images, such as for processing large volumes of datasets in parallel.
Processing algorithms, techniques, and methods that are described herein may be executed by one or more processors of the SoC 100, which may include execution of all steps on one of the processors (e.g., DSP 112, CPU 104, NSP 124, GPU 126) or may include execution of steps across a combination of one or more of the processors (e.g., DSP 112, CPU 104, NSP 124, GPU 126). In some embodiments, at least one of the DSP 112 or the CPU 104 executes instructions to perform various operations described herein, including spatial expanding of audio signals to move acoustic sources within a soundstage. For example, execution of the instructions by the CPU 104 as part of a multimedia application (e.g., a voice recorder, a sound recording, or a video recorder) may instruct the DSP 112 to begin or end capturing audio from one or more microphones 130A-C. The operations of the CPU 104 may be based on user input. For example, a voice recorder application executing on CPU 104 may receive a user command to begin a voice recording upon which audio comprising one or more channels is captured and processed for playback and/or storage. Audio processing to determine “output” or “corrected”
signals, such as according to techniques described herein, may be applied to one or more segments of audio in the recording sequence.
Input/output components may be coupled to the SoC 100 through an input/output (I/O) hub 116. An example of a hub 116 is an interconnect to a peripheral component interconnect express (PCIe) bus. Example components coupled to hub 116 may be components used for interacting with a user, such as a touch screen interface and/or physical buttons. Some components coupled to hub 116 may also include network interfaces for communicating with other devices, including a wide area network (WAN) adaptor (e.g., WAN adaptor 152) , a local area network (LAN) adaptor (e.g., LAN adaptor 153) , and/or a personal area network (PAN) adaptor (e.g., PAN adaptor 154) . A WAN adaptor 152 may be a 4G LTE or a 5G NR wireless network adaptor. A LAN adaptor 153 may be an IEEE 802.11 WiFi wireless network adapter. A PAN adaptor 154 may be a Bluetooth wireless network adaptor. Each of the WAN adaptor 152, LAN adaptor 153, and/or PAN adaptor 154 may be coupled to an antenna that may be shared by each of the adaptors 152, 153, and 154, or coupled to multiple antennas configured for primary and diversity reception and/or configured for receiving specific frequency bands. In some embodiments, the WAN adaptor 152, LAN adaptor 153, and/or PAN adaptor 154 may share circuitry, such as portions of a radio frequency front end (RFFE) .
Audio circuitry 156 may be integrated in SoC 100 as dedicated circuitry for coupling the SoC 100 to a speaker 120 external to the SoC 100, which may be a transducer such as a speaker (either internal to or external to a device incorporating the SoC 100) or headphones. The audio circuitry 156 may include coder/decoder (CODEC) functionality for processing digital audio signals. The audio circuitry 156 may further include one or more amplifiers (e.g., a class-D amplifier) for driving a transducer coupled to the SoC 100 for outputting sounds generated during execution of applications by the SoC 100. Functionality related to audio signals described herein may be performed by a combination of the audio circuitry 156 and/or other processors of the SoC (e.g., CPU 104, DSP 112, GPU 126, NSP 124) .
The SoC 100 may couple to external devices outside the package of the SoC 100. For example, the SoC 100 may be coupled to a power supply 118, such as a battery or an adaptor to couple the SoC 100 to an energy source. The signal processing described herein may be adapted to and achieve power efficiency to support operation of the SoC 100 from a limited-capacity power supply 118 such as a battery. For example, operations may be performed on a portion of the SoC 100 configured for performing the operation at a lowest
power consumption. As another example, operations themselves are performed in a manner that reduces a number of computations to perform the operation, such that the algorithm is optimized for extending the operational time of a device while powered by a limited-capacity power supply 118. In some embodiments, the operations described herein may be configured based on a type of power supply 118 providing energy to the SoC 100. For example, a first set of operations may be executed to perform a function when the power supply 118 is a wall adaptor. As another example, a second set of operations may be executed to perform a function when the power supply 118 is a battery.
The SoC 100 may also include or be coupled to additional features or components that are not shown in Figure 1. Although components are shown integrated as a single SoC 100, which may include all components built on a single semiconductor die with a common semiconductor substrate, other arrangements of the illustrated blocks across a different number of dies, substrates, and/or packages may be used to accomplish the same functionality described in this disclosure.
The memory 106 may include a non-transient or non-transitory computer readable medium storing computer-executable instructions as instructions 108 to perform all or a portion of one or more operations described in this disclosure. The instructions 108 may include a multimedia application (or other suitable application such as a messaging application) to be executed by the SoC 100 that records, processes, or outputs audio signals. The instructions 108 may also include other applications or programs executed by the SoC 100, such as an operating system and applications other than for multimedia processing.
In addition to instructions 108, the memory 106 may also store audio data. The SoC 100 may be coupled to an external memory and configured to access the memory for writing output audio files for later playback or long-term storage. For example, the SoC 100 may be coupled to a flash storage device comprising NAND memory for storing video files (e.g., MP4-container formatted files) including audio tracks and/or storing audio recordings (e.g., MPEG-1 Layer 3 files, also referred to as MP3 files) . Portions of the video or audio files may be transferred to memory 106 for processing by the SoC 100, with the resulting signals after processing encoded as video or audio files in the memory 106 for transfer to the long-term storage.
While the SoC 100 is referred to in the examples herein for performing aspects of the present disclosure, some device components may not be shown in Figure 1 to prevent obscuring aspects of the present disclosure. Additionally, other components, numbers of
components, or combinations of components may be included in a suitable device for performing aspects of the present disclosure. As such, the present disclosure is not limited to a specific device or configuration of components, including the SoC 100.
The SoC of Figure 1 may be operated to obtain improved audio recordings and/or improved user experience through higher quality audio playback by applying spatial expanding to increase differences between left and right channels in a stereo audio signal. One example method of performing multimedia operations is shown in Figure 2 and described below.
Figure 2 is a block diagram illustrating an example data flow path for audio signal processing in a multimedia device according to one or more aspects of the disclosure. SoC 100 of device 200 may execute multimedia control 210, such as part of an operating system or driver, to control the capture of sounds from microphones or other audio sources and/or to control the configuration of audio processing circuitry 156. The audio configuration applied by multimedia control 210 to either output devices (e.g., speakers) or input devices (e.g., microphones) may include parameters that specify, for example, a bit depth, a sampling rate, a data rate, a magnitude, or other parameters.
Multimedia control 210 may be managed by or provide services to a multimedia application 204. The multimedia application 204 may also execute on the SoC 100. The multimedia application 204 provides settings accessible to a user such that a user can specify individual playback settings or select a profile with corresponding playback settings. The multimedia application 204 may be, for example, a video recording application, a screen sharing application, a virtual conferencing application, an audio playback application, a messaging application, a video communications application, or other application that processes audio data. The multimedia application 204 may include spatial expander 206 to improve the quality of audio presented to the user during execution of multimedia application 204. The spatial expander 206 may perform one or more or a combination of the techniques described herein.
The device 200 of Figure 2 may be configured to perform the operations described with reference to Figure 3 to determine an audio signal. Figure 3 shows a flow chart 300 of an example method for processing audio data to perform acoustic source separation and soundstage expansion according to one or more aspects of this disclosure. The operations of Figure 3 may result in an audio signal with improved representation of sounds, which results in an improved user experience. Each of the operations described with reference to Figure 3 may be performed by one or a combination of the processors of the SoC 100.
At block 302, first and second audio signals are received, in which the first and second audio signals may correspond to left and right channels of an audio representation. The audio data may be received, for example, from microphones. The audio data may alternatively be received from a wireless microphone, in which the audio data is received through one or more of the WAN adaptor 152, the LAN adaptor 153, and/or the PAN adaptor 154. The audio data may alternatively be received from a memory location or a network storage location, such as when the audio signal was previously captured and is now retrieved from memory 106 and/or from a remote location through one or more of the WAN adaptor 152, the LAN adaptor 153, and/or the PAN adaptor 154. In some embodiments, the capture or retrieval of audio signals may be initiated by multimedia application 204 executing on the SoC 100. Audio data, comprising the audio signals, may be retrieved at block 302 and further processed by the SoC 100 according to the operations described in one or more of the following blocks.
At block 304, spatio-temporal filtering is applied to the first plurality of audio signals received at block 302 to determine a second plurality of audio signals (e.g., filtered audio signals). The filtering may apply one or more nullformers to the first plurality of audio signals or signals derived from the first plurality of audio signals. The outputs $y_i$ of filtering an input audio signal $x$ with a plurality $i$ of nullformers $\mathbf{h}_i$ may be expressed as:

$$y_i[k, n] = \mathbf{h}_i^{T}[k, n]\,\mathbf{x}[k, n],$$

where $k$ and $n$ denote the frequency index and the time frame, respectively.
Although the equation may be applied generally to any number of channels in an audio signal, the nullformer $\mathbf{h}_i$ in a two-channel system (e.g., a stereo system with a left channel and a right channel) may be expressed as:

$$\mathbf{h}_i[k, n] = \left[\,h_{i,1}[k, n],\; -h_{i,2}[k, n]\,\right]^{T},$$

with $m = \{1, 2\}$ and $N_m = 2$, such that:

$$h_{i,m}[k, n] = \frac{1}{N_m}\exp\!\left(-j\,2\pi\,\frac{k}{N_k}\,f_s\,\frac{p_m - \left(p_{s_i} - p_c\right)}{c}\right),$$

in which $f_s$ is a sampling frequency, $c$ is the speed of sound, $p_m$ is a distance between an $m$-th microphone and an acoustic source, $p_c$ is a distance between a center of a coordinate system and a center of a microphone array recording the first plurality of audio signals, $p_{s_i}$ is the distance between the center of the coordinate system and source $i$, $N_k$ is a number of frequencies, $k$ is an index of one of the $N_k$ frequencies, $N_m$ is the number of microphones, and $(\cdot)^{T}$ is the transpose of a vector or matrix. In some embodiments, the filtered output may be mixed with the microphone signals to obtain a signal expressed as:

$$\tilde{y}_i[k, n] = \alpha_i\, y_i[k, n] + (1 - \alpha_i)\, x_i[k, n], \qquad i = 1, \ldots, N_i,$$

in which $N_i = N_m$ and $\alpha_i$ is a mixing weight.
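As a rough numerical sketch of the filtering expressed by the equations above (assuming a two-microphone STFT-domain input; the helper names and the use of a precomputed relative delay are assumptions, not the claimed design):

```python
import numpy as np

def apply_nullformer(X, h):
    """Compute y_i[k, n] = h_i^T[k] x[k, n] per frequency bin k and frame n.

    X: complex STFT of the input, shape (num_mics, num_bins, num_frames).
    h: nullformer weights, shape (num_mics, num_bins), with the subtraction
       sign convention h_i = [h_{i,1}, -h_{i,2}]^T already folded in.
    """
    return np.einsum('mk,mkn->kn', h, X)

def steering_weights(num_bins, fs, tau, num_mics=2):
    """Hypothetical delay-based weights nulling a source whose relative
    propagation delay between the two microphones is tau seconds."""
    k = np.arange(num_bins)
    phase = np.exp(-1j * 2.0 * np.pi * k * fs * tau / num_bins)
    w1 = np.ones(num_bins) / num_mics          # reference channel weight
    w2 = -phase / num_mics                     # delayed, negated second channel
    return np.stack([w1, w2])
```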
At block 306, a panning algorithm (such as correlation-based panning, including mu-law panning, or another algorithm for identifying similarities and differences between signals) is applied to the spatio-temporal filtered audio signals computed at block 304. In embodiments in which the microphone signal is mixed with the filtered signals, the operation of block 306 may be applied to the resulting combination signal y1. The result of the correlation-based panning of block 306 is a third plurality of audio signals with an expanded soundstage. The nullformers and panning may be configured through appropriate parameters to move acoustic sources within a certain threshold range of the center of a soundstage to a further distance from the center of the soundstage. For example, the signal processing may be configured to move signals within a range of 5 degrees to 15 degrees left of the soundstage (centered at zero degrees) to a location a predefined amount (e.g., 25 degrees) further left of the soundstage. The predefined amount may be a constant value or may be a variable amount proportional to the distance the acoustic source is located from the center. In some embodiments, the signal processing may also reduce or eliminate acoustic sources present within a threshold range around the center, such as to reduce or eliminate center channel acoustic sources that are within +5 to -5 degrees around the center of the soundstage.
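One way to read the thresholding in the preceding paragraph is as a remapping of source angles, sketched below in Python; the specific angle values are the examples from the text, and the proportional option is one possible realization rather than the claimed rule.

```python
def expand_angle(theta_deg, inner=5.0, outer=15.0, shift=25.0, proportional=False):
    """Remap a source angle (degrees, negative = left of center, zero = center).

    Sources between `inner` and `outer` degrees of the center are moved `shift`
    degrees further out; sources within +/- `inner` degrees are flagged as
    center sources (returned as 0.0) and may be attenuated or removed.
    """
    magnitude = abs(theta_deg)
    sign = 1.0 if theta_deg >= 0 else -1.0
    if magnitude <= inner:
        return 0.0                            # center region: candidate for removal
    if magnitude <= outer:
        extra = shift * (magnitude / outer) if proportional else shift
        return sign * (magnitude + extra)     # push the source further out
    return theta_deg                          # already far from center
```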
One application of the correlation-based panning of block 306 includes computing statistics for the estimated cross-covariance function $R_{x_1 x_2}$ of the two channels, the estimated variance $\sigma_{x_1}^{2}$ of the first channel, and the estimated variance $\sigma_{x_2}^{2}$ of the second channel, such as from the following equations:

$$R_{x_1 x_2}[k, n] = \lambda\, R_{x_1 x_2}[k, n-1] + (1 - \lambda)\, x_1[k, n]\, x_2^{*}[k, n],$$

$$\sigma_{x_1}^{2}[k, n] = \lambda\, \sigma_{x_1}^{2}[k, n-1] + (1 - \lambda)\, \lvert x_1[k, n] \rvert^{2},$$ and

$$\sigma_{x_2}^{2}[k, n] = \lambda\, \sigma_{x_2}^{2}[k, n-1] + (1 - \lambda)\, \lvert x_2[k, n] \rvert^{2},$$

in which $\lambda$ is a forgetting factor. These statistics may be used in a panning function defined as:

$$\Gamma[k, n] = \big(1 - \Psi[k, n]\big)\,\operatorname{sgn}\!\big(\Psi_1[k, n] - \Psi_2[k, n]\big),$$

in which

$$\Psi[k, n] = \frac{2\, \lvert R_{x_1 x_2}[k, n] \rvert}{\sigma_{x_1}^{2}[k, n] + \sigma_{x_2}^{2}[k, n] + \epsilon},$$ and

$$\Psi_1[k, n] = \frac{\lvert R_{x_1 x_2}[k, n] \rvert}{\sigma_{x_1}^{2}[k, n] + \epsilon},$$ and

$$\Psi_2[k, n] = \frac{\lvert R_{x_1 x_2}[k, n] \rvert}{\sigma_{x_2}^{2}[k, n] + \epsilon},$$

and $\epsilon$ is a small number to avoid division by zero. Panning masks may be defined using the panning function for each channel as:

$$M_1[k, n] = \begin{cases} 1, & \Gamma[k, n] \geq \tau_1 \\ 0, & \text{otherwise,} \end{cases}$$ and

$$M_2[k, n] = \begin{cases} 1, & \Gamma[k, n] \leq -\tau_2 \\ 0, & \text{otherwise,} \end{cases}$$

in which $\tau_i$ is a panning/masking threshold for the $i$-th nullformer output, such that output audio signals with the expanded soundstage may be represented as:

$$z_1[k, n] = M_1[k, n]\, y_1[k, n]$$ and $$z_2[k, n] = M_2[k, n]\, y_2[k, n].$$
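For concreteness, the statistics and masks above can be sketched per frequency bin as follows; the recursions and the panning index follow standard correlation-based panning, and the threshold handling is an assumption rather than necessarily the claimed form. Each mask then gates the corresponding nullformer output, $z_i[k, n] = M_i[k, n]\, y_i[k, n]$.

```python
import numpy as np

def correlation_panning_masks(x1, x2, lam=0.9, eps=1e-8, tau1=0.0, tau2=0.0):
    """Track cross-covariance and variances for one frequency bin over frames
    and derive binary panning masks for the two channels."""
    R = 0.0 + 0.0j   # cross-covariance estimate R_{x1 x2}
    v1 = v2 = 0.0    # variance estimates sigma_{x1}^2 and sigma_{x2}^2
    m1 = np.zeros(len(x1))
    m2 = np.zeros(len(x2))
    for n in range(len(x1)):
        R = lam * R + (1 - lam) * x1[n] * np.conj(x2[n])
        v1 = lam * v1 + (1 - lam) * abs(x1[n]) ** 2
        v2 = lam * v2 + (1 - lam) * abs(x2[n]) ** 2
        psi = 2 * abs(R) / (v1 + v2 + eps)          # overall similarity
        delta = abs(R) / (v1 + eps) - abs(R) / (v2 + eps)
        gamma = (1 - psi) * np.sign(delta)          # panning function
        m1[n] = 1.0 if gamma >= tau1 else 0.0       # bins panned toward channel 1
        m2[n] = 1.0 if gamma <= -tau2 else 0.0      # bins panned toward channel 2
    return m1, m2
```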
The operations described with reference to blocks 302, 304, and 306 of Figure 3 may be performed on a digital signal processor (DSP), such as DSP 112 of the SoC 100 illustrated in Figure 1. However, the operations may alternatively be performed by one or more of the processors of Figure 1, including one or more of the CPU 104, the DSP 112, the GPU 126, or the NSP 124. For example, the CPU 104 may record audio signals from the microphone array 130 to memory 106 as part of the operations of block 302. The DSP 112 may then perform the operations of blocks 304 and 306 on the audio signals stored in memory 106, after which output signals determined by the DSP 112 may be stored in memory 106, output to audio circuitry 156 for reproduction, and/or transmitted to another device through one or more of the WAN adaptor 152, the LAN adaptor 153, and/or the PAN adaptor 154. In another example, the processor performing the operations of blocks 302, 304, and/or 306 may be dedicated logic circuitry for performing certain operations.
Aspects of the signal processing described in Figure 3 are applied in example devices, such as the example device of Figure 4. Figure 4 is a block diagram illustrating stereo processing of audio signals according to one or more aspects of the disclosure. The device 200 includes two microphones 130A, 130B, which may be configured as described with reference to the examples of Figure 1 or Figure 2. The microphones 130A, 130B may record sounds from soundstage 432, which includes acoustic source 432C at a center of the soundstage 432, acoustic source 432L slightly to the left of the soundstage 432, and acoustic source 432R slightly to the right of the soundstage 432. The two microphones
130A, 130B output a first plurality of audio signals 402. The number of signals in the first plurality of audio signals 402 may match the number of microphones 130A, 130B. The example of Figure 4 illustrates a two-channel stereo configuration with left and right channels. The example of Figure 4 may be extended to include additional microphones and additional signals, such as a center channel, a high channel, a low channel, a high-right channel, a high-left channel, and/or additional channels.
The first plurality of audio signals 402 may be applied to a spatio-temporal filter 410, which may include, among other circuitry, nullformers for adding directionality to the microphones 130A, 130B. The directionality may be configured to reduce the strength of left-channel acoustic sources in the right channel and to reduce the strength of right-channel acoustic sources in the left channel. For example, the spatio-temporal filter 410 may be configured to reduce the contribution of acoustic source 432L to the right channel of the first plurality of audio signals 402 and to reduce the contribution of acoustic source 432R to the left channel of the first plurality of audio signals 402. Such a configuration for the spatio-temporal filter 410 may include one or more nullformers. For example, nullformers 412A, 412B may be configured to process the audio signals 402 based on one or more nullformer parameters to generate null audio signals 404 having a null in a direction corresponding to the one or more nullformer parameters. The null applied by the nullformer may be configured to reduce sensitivity of a left-channel microphone to audio sources at the right of the soundstage 432 and to reduce sensitivity of a right-channel microphone to audio sources at the left of the soundstage 432.
Different numbers of nullformers may be applied to the audio signals 402. For example, one nullformer may apply a null in at least one direction for each of the microphones 130A-B. The nullformer may apply a null in the left direction of a first channel of a plurality of audio signals and a null in the right direction of a second channel of a plurality of audio signals. The null may be accomplished by subtracting one of the channels from the other of the channels. For example, the nullformer may subtract the second channel from the first channel to determine a first output and may subtract the first channel from the second channel to determine a second output. In some embodiments, such as shown in Figure 4, two or more nullformers may be coupled in parallel to receive and process the first plurality of audio signals 402. The parallel nullformers 412A, 412B may process different frequencies using different or the same parameters. For example, a cut-off frequency may be used to divide the first plurality of audio signals 402 into a low-frequency (LF) portion and a high-frequency (HF) portion. An example cut-off frequency is 2 kHz. One
implementation of such a cut-off filter is a combination of low-pass filter 408A filtering input to the LF NF 412A and high-pass filter 408B filtering input to the HF NF 412B.
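A minimal sketch of this band split, assuming SciPy is available and using the 2 kHz example cut-off; the Butterworth filter and its order are arbitrary illustrative choices.

```python
from scipy.signal import butter, sosfilt

def split_bands(x, fs, cutoff_hz=2000.0, order=4):
    """Split a signal into the LF and HF portions feeding the two nullformers."""
    sos_lp = butter(order, cutoff_hz, btype='lowpass', fs=fs, output='sos')
    sos_hp = butter(order, cutoff_hz, btype='highpass', fs=fs, output='sos')
    return sosfilt(sos_lp, x), sosfilt(sos_hp, x)
```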
The output of the nullformers 412A, 412B may be further processed in the spatio-temporal filter 410. For example, the spatio-temporal filter 410 may combine the second plurality of audio signals 404 with the first plurality of audio signals 402 before being output from the spatio-temporal filter 410 (e.g., before applying correlation-based panning 420 to the second plurality of audio signals 404). The combining of the two pluralities of audio signals may be performed according to weights, with a different weight assigned to the mixing of the low-frequency audio signals than to the high-frequency audio signals. For example, the low-frequency audio signal of the second plurality of audio signals 404 may be mixed at a ratio of α to 1-α with the corresponding signal of the first plurality of audio signals 402. Likewise, the high-frequency audio signal of the second plurality of audio signals 404 may be mixed at a ratio of β to 1-β with the corresponding signal of the first plurality of audio signals 402.
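The recombination described here reduces to two weighted sums; a minimal sketch, where alpha and beta are the configurable mixing weights named in the text:

```python
def mix_bands(y_lf, y_hf, x_lf, x_hf, alpha, beta):
    """Mix nullformer outputs (y) with the corresponding filtered inputs (x):
    the low band at alpha : (1 - alpha), the high band at beta : (1 - beta)."""
    return alpha * y_lf + (1 - alpha) * x_lf, beta * y_hf + (1 - beta) * x_hf
```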
In some embodiments, adaptive filters 422A, 422B may receive the output of the spatio-temporal filter 410 in parallel with the correlation-based panning algorithm 420. The adaptive filters 422A and 422B may apply configuration parameters to the panning algorithm 420 that are determined to reduce or minimize correlation between both nullformer signals to enhance the stereo separation achieved by the correlation-based panning algorithm 420. As shown in the embodiment of Figure 4, the adaptive filters 422A, 422B operate on different frequency bands corresponding to the low-frequency and high-frequency null audio signals output from nullformers 412A, 412B, respectively. However, in some embodiments a single adaptive filter may be used to configure the panning algorithm 420 based on, for example, one or the other of the low or high frequency signals or a combination of the low and high frequency signals.
Control over the spatio-temporal filter 410 may be provided through exposing parameters of the operation for configuration by other components. For example, the α and β for configuring the mixing of the first plurality of audio signals 402 and the second plurality of audio signals 404 may be adjustable. Additionally, a gain factor at an output of one or more of the nullformers may be adjustable. For example, a gain factor γ may be applied to the output of the high-frequency nullformer 412B. Each of the gain factors α, β, and γ may be controllable through user input and/or predefined settings in the multimedia application 204 and/or multimedia control 210.
In some embodiments, an audio/visual (AV) spatial locator (SL) 414 may be configured to receive information about the soundstage 432 and output control parameters to configure the nullformers 412A, 412B and/or other components of the system. For example, the AVSL 414 may control the gain factors α, β, and γ. The AVSL 414 may receive the first plurality of audio signals 402 and use information in the audio signals to determine positions of audio sources and configure the nullformers 412A, 412B to assign an audio source to one of the channels by applying a null in the direction of the audio source for one of the signals. The AVSL 414 may additionally or alternatively receive other information for determining a location of the audio sources and assigning audio sources to the channels. For example, the AVSL 414 may receive visual information recorded by a camera of the device 200, and that visual information may be used to adjust parameters of the nullformers 412A, 412B to direct a null of one or more of the microphones 130A, 130B towards an audio source. In some embodiments, the AVSL 414 may apply location thresholds to the locations of the sources in the soundstage 432. For example, a 15% threshold may be set around a center of the soundstage 432, in which sources within 15% of the center are moved further into the left or right channels to expand the soundstage.
The output of the spatio-temporal filter 410 may be input to the correlation-based panning algorithm 420. The correlation-based panning block 420 applies an algorithm to the output of the spatio-temporal filter 410 to determine a third plurality of audio signals 406 that have improved stereo separation in soundstage 434 as compared to the soundstage 432. For example, the acoustic source 434L, which corresponds to the acoustic source 432L, is moved farther to the left of the soundstage 434. Likewise, the acoustic source 434R, which corresponds to the acoustic source 432R, is moved farther to the right of the soundstage 434.
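The disclosure leaves the exact panning law open; a common constant-power law, shown here only as a plausible stand-in, maps a position in [-1, 1] to left/right channel gains:

```python
import numpy as np

def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1, 1]."""
    theta = (position + 1.0) * np.pi / 4.0  # -1 -> 0 rad (hard left), +1 -> pi/2 (hard right)
    return np.cos(theta), np.sin(theta)
```

Constant-power panning keeps the summed energy of the two gains equal to one at every position, which is one common way to move a source across a soundstage without loudness dips.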
Another embodiment illustrating the use of nullformers and a spatio-temporal filter for the separation of stereo sound is illustrated in Figure 5. Figure 5 is a block diagram illustrating stereo processing of audio signals using mu-law panning according to one or more aspects of the disclosure. The audio signals 402, comprising sounds from acoustic sources 432L, 432C, and 432R, are each input to both a low-pass filter 508A and a high-pass filter 508B. Each of the filters 508A, 508B reduces the amplitude of sounds in a certain frequency range such that filtered audio signals are input to the spatio-temporal filter. For example, the low-pass filter 508A reduces or eliminates sounds above a first cut-off frequency and the high-pass filter 508B reduces or eliminates sounds below a second cut-off frequency. In some embodiments, the first and second cut-off frequencies may be the same frequency.
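A minimal band-splitting sketch using SciPy Butterworth filters, assuming (as the text permits) a single shared cut-off frequency; the cutoff value and filter order are illustrative choices, not from the source:

```python
from scipy.signal import butter, sosfilt

def split_bands(x, fs, cutoff_hz=1000.0, order=4):
    """Split a 1-D signal into (low, high) bands at a shared cutoff frequency."""
    sos_low = butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    sos_high = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfilt(sos_low, x), sosfilt(sos_high, x)
```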
The spatio-temporal filter 510 applies nullforming to add directionality to the signals similar to that of the spatio-temporal filter 410. The filter 510 receives the low-pass filtered audio signals and the high-pass filtered audio signals, each of the sets of signals comprising two channels (for stereo audio) or more channels. The filter 510 includes nullformers 512A, 512B similar to nullformers 412A, 412B of Figure 4. The outputs of nullformers 512A, 512B include a second plurality of audio signals 504. The audio signals 504 may be recombined with the filtered audio signals input to the spatio-temporal filter 510 based on mixing parameters (e.g., α, β). The output of filter 510 may be input to the correlation-based mu-law panning algorithm 520.
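As a toy illustration of nullforming, a two-microphone delay-and-subtract beamformer cancels a source arriving with a known inter-microphone delay; the actual structure of nullformers 512A, 512B is not specified at this level of detail, so this is a sketch under that assumption:

```python
import numpy as np

def delay_and_subtract_null(mic_a, mic_b, delay_samples):
    """Cancel the source whose inter-microphone delay is `delay_samples`.

    mic_b is delayed and subtracted from mic_a; a source aligned at that
    delay nulls out, while sources from other directions pass through.
    """
    delayed = np.zeros_like(mic_b)
    delayed[delay_samples:] = mic_b[:len(mic_b) - delay_samples]
    return mic_a - delayed
```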
The mu-law panning algorithm is a companding algorithm that reduces the dynamic range of an audio signal through non-linear quantization. When a companding algorithm, such as the mu-law panning algorithm 520, is applied to the output of filter 510, the apparent separation of audio sources in the soundstage is increased. Although mu-law panning is described for the panning algorithm 520, other algorithms beyond or in addition to mu-law panning may be applied to the output of filter 510 to achieve a similar effect. For example, other companding algorithms such as an A-law panning algorithm may be used in place of mu-law panning algorithm 520. After the panning algorithm 520 is applied to the output of filter 510, an output signal with an expanded stereo soundstage 534 is obtained with audio sources 534L and 534R moved further apart in the soundstage 534. The resulting audio signal may have audio source 534R entirely located in one channel and audio source 534L entirely located in another channel. Although a center channel 534C is shown in the soundstage 534, the channel 534C may be removed in some embodiments.
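The standard mu-law compression curve is F(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ); because |F(x)| ≥ |x| on [-1, 1], applying it to a normalized pan position pushes near-center sources outward, consistent with the expanded soundstage described above. A sketch, with μ = 255 assumed from the telephony convention:

```python
import numpy as np

def mu_law(x, mu=255.0):
    """Mu-law curve for x in [-1, 1]; expands small magnitudes toward +/-1."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
```

For example, a source at position 0.2 maps to roughly 0.71, moving it well toward the edge of the soundstage; the A-law curve mentioned above would serve the same role with a slightly different shape.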
In one example application of techniques of this disclosure, the source separation may be applied to an environment comprising multiple individuals in separate telecommunications sessions. For example, an automobile (e.g., car or vehicle) may have multiple passengers engaged in different telephonic discussions or virtual meetings. The multimedia system of the vehicle provides an enclosed environment including multiple microphones in different locations that receive input from each user differently. The multimedia system of the vehicle may also include multiple speakers for reproducing audio from the telecommunications sessions (e.g., telephonic discussions or virtual meetings). Techniques for moving sources between channels may be applied to the vehicular multimedia system to allow the sounds for each passenger’s telecommunications session to be isolated and reproduced through the speakers such that the audio output through a speaker near each passenger is primarily comprised of that passenger’s telecommunications session.
For example, given a static distribution of microphone arrays inside the vehicle, the spatial expander of Figure 4 or Figure 5 may move (e.g., pan) the signals emitted from the zone of a spatially-invariant target source (e.g., a passenger), as captured with a microphone array (of two or more microphones), to one audio channel, while the signals from all other zones are omitted from that channel, or from all channels, through the application of nullformers that reduce the contributions of the other zones. This allows for the determination of an audio stream for each zone in which the target signal of the corresponding zone is enhanced and the signals emitted from sources outside of that zone are attenuated. A user interface to the vehicle multimedia system may be provided (e.g., on the car’s instrument panel, front panel, or dashboard) to select a zone during a telecommunications session for target-speech enhancement and noise and interference attenuation (e.g., to customize cross-talk attenuation between zones). Further, in some embodiments, each passenger may connect to the vehicle’s digital interface with a mobile device (e.g., a smart phone or computer) through, e.g., a personal area network (PAN) connection, select the passenger’s zone, and use the in-car system to enhance their speech during a telecommunications session.
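A sketch of how per-zone loudspeaker feeds might be assembled from per-zone enhanced (nullformed) signals; the zone names, dict layout, and attenuation figure are all assumptions for illustration:

```python
import numpy as np

def zone_output(zone_signals, target_zone, attenuation_db=30.0):
    """Mix zone signals so the target zone dominates its loudspeaker feed.

    `zone_signals` maps a zone name to that zone's enhanced float array;
    other zones are kept only at a strongly attenuated residual level.
    """
    gain = 10.0 ** (-attenuation_db / 20.0)  # residual gain for other zones
    out = np.zeros_like(zone_signals[target_zone])
    for zone, signal in zone_signals.items():
        out += signal if zone == target_zone else gain * signal
    return out
```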
In one or more aspects, techniques for supporting signal processing may include additional aspects, such as any single aspect or any combination of aspects described below or in connection with one or more other processes or devices described elsewhere herein. In a first aspect, supporting signal processing may include a method comprising: receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals. Included in the first aspect and other aspects described herein, the method may be performed by an apparatus, and the method may be embodied in a non-transitory computer-readable medium as instructions that cause performance of the operations of the method.
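Tying the pieces together, a minimal end-to-end sketch of the first-aspect method, reusing the hypothetical helpers sketched above (`delay_and_subtract_null`, `mu_law`, `constant_power_pan`); the delay and pan values are arbitrary placeholders, not from the source:

```python
def stereo_expand(left, right, delay_samples=2, pan_position=0.6):
    """Nullform each channel against the other, then widen via mu-law panning."""
    # Step 1: nullformers reduce the right-side source in the left channel
    # and the left-side source in the right channel.
    left_null = delay_and_subtract_null(left, right, delay_samples)
    right_null = delay_and_subtract_null(right, left, delay_samples)
    # Step 2: expand the pan position outward, then apply constant-power gains.
    pos = float(mu_law(pan_position))
    gl_l, gr_l = constant_power_pan(-pos)  # enhanced-left content moves left
    gl_r, gr_r = constant_power_pan(pos)   # enhanced-right content moves right
    out_left = gl_l * left_null + gl_r * right_null
    out_right = gr_l * left_null + gr_r * right_null
    return out_left, out_right
```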
Additionally, the method may include, or the apparatus may perform or operate according to, one or more aspects as described below. In some implementations, the apparatus includes a wireless device, such as a UE. In some implementations, the apparatus may include one or more processors, and a memory coupled to the one or more processors. The one or more processors may be configured to perform operations described herein with respect to the apparatus. In some other implementations, the apparatus may include a non-transitory computer-readable medium having program code recorded thereon and the program code may be executable by the apparatus for causing the one or more processors to perform operations described herein with reference to the apparatus. In some implementations, the apparatus may include one or more means configured to perform operations described herein.
In a second aspect, in combination with the first aspect, applying the at least one nullformer comprises: applying at least one first nullformer to each of the first plurality of audio signals; and applying at least one second nullformer to each of the first plurality of audio signals, wherein the at least one first nullformer processes a first frequency range and the at least one second nullformer processes a second frequency range different from the first frequency range.
In a third aspect, in combination with one or more of the first aspect or the second aspect, the first frequency range corresponds to first frequencies below a cutoff frequency and the second frequency range corresponds to second frequencies above the cutoff frequency.
In a fourth aspect, in combination with one or more of the first aspect through the third aspect, the method also includes, or the apparatus may perform, determining a location of a source of a sound in the first plurality of audio signals; and controlling a null direction of the at least one nullformer based on the location of the source.
In a fifth aspect, in combination with one or more of the first aspect through the fourth aspect, the location of the source may be determined adaptively or may be set as a predefined, static location.
In a sixth aspect, in combination with one or more of the first aspect through the fifth aspect, the method may include, or the apparatus may perform, combining the second plurality of audio signals with the first plurality of audio signals before applying the panning algorithm to the second plurality of audio signals.
In a seventh aspect, in combination with one or more of the first aspect through the sixth aspect, the method may include, or the apparatus may perform, adjusting a mixing ratio of the second plurality of audio signals and the first plurality of audio signals based on a
configuration parameter when combining the second plurality of audio signals with the first plurality of audio signals.
In an eighth aspect, in combination with one or more of the first aspect through the seventh aspect, applying the panning algorithm comprises applying a mu-law panning algorithm to the second plurality of audio signals.
In a ninth aspect, in combination with one or more of the first aspect through the eighth aspect, applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which any sound acoustically located on a first side of a soundstage is removed from a second side of the soundstage.
In a tenth aspect, in combination with one or more of the first aspect through the ninth aspect, applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which acoustic sources within a threshold region of a center of the soundstage are removed.
In an eleventh aspect, in combination with one or more of the first aspect through the tenth aspect, the method may include, or the apparatus may perform, outputting the third plurality of audio signals to a transducer.
In a twelfth aspect, in combination with one or more of the first aspect through the eleventh aspect, a multimedia device includes an array of microphones comprising at least a first microphone and a second microphone, with a memory and one or more processors, in which the one or more processors is coupled to the memory and the first microphone and the second microphone, and the one or more processors performs operations by executing processor-readable code to perform: receiving a first plurality of audio signals comprising at least a first channel corresponding to the first microphone and a second channel corresponding to the second microphone; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals. In additional aspects, the multimedia device of the twelfth aspect may perform any one or more of the operations described with regard to the first aspect through the eleventh aspect.
In a thirteenth aspect, a method may include obtaining an audio scene centered at zero degrees that includes a left audio signal and a right audio signal, processing the left audio signal to move it further away from the audio center by a first threshold to determine a moved left audio signal, and processing the right audio signal to move it further away from the audio center by a second threshold to determine a moved right audio signal.
In a fourteenth aspect, in combination with the thirteenth aspect, a nullformer is used to generate the moved left audio signal based on a null beam formation towards the left audio signal and application of panning to the left audio signal.
In a fifteenth aspect, in combination with one or more of the thirteenth through fourteenth aspects, a nullformer is used to generate the moved right audio signal based on a null beam formation towards the right audio signal and application of panning to the right audio signal.
In a sixteenth aspect, in combination with one or more of the thirteenth through fifteenth aspects, the obtaining is part of a recorder operation.
In a seventeenth aspect, in combination with one or more of the thirteenth through sixteenth aspects, the obtaining is part of a playback operation.
In the figures, a single block may be described as performing a function or functions. The function or functions performed by that block may be performed in a single component or across multiple components, and/or may be performed using hardware, software, or a combination of hardware and software. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps are described below generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure. Also, the example devices may include components other than those shown, including well-known components such as a processor, memory, and the like.
Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present application, discussions using terms such as “accessing,” “receiving,” “sending,” “using,” “selecting,” “determining,” “normalizing,” “multiplying,” “averaging,” “monitoring,” “comparing,” “applying,” “updating,” “measuring,” “deriving,” “settling,” “generating,” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer
system’s registers and memories into other data similarly represented as physical quantities within the computer system’s registers, memories, or other such information storage, transmission, or display devices. The use of different terms referring to actions or processes of a computer system does not necessarily indicate different operations. For example, “determining” data may refer to “generating” data. As another example, “determining” data may refer to “retrieving” data.
The terms “device” and “apparatus” are not limited to one or a specific number of physical objects (such as one smartphone, one camera controller, one processing system, and so on). As used herein, a device may be any electronic device with one or more parts that may implement at least some portions of the disclosure. While the description and examples herein use the term “device” to describe various aspects of the disclosure, the term “device” is not limited to a specific configuration, type, or number of objects. As used herein, an apparatus may include a device or a portion of the device for performing the described operations.
Certain components in a device or apparatus described as “means for accessing,” “means for receiving,” “means for sending,” “means for using,” “means for selecting,” “means for determining,” “means for normalizing,” “means for multiplying,” or other similarly-named terms referring to one or more operations on data, such as audio data, may refer to processing circuitry (e.g., application specific integrated circuits (ASICs), digital signal processors (DSPs), graphics processing units (GPUs), central processing units (CPUs), neural signal processors (NSPs)) configured to perform the recited function through hardware, software, or a combination of hardware configured by software.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The components, functional blocks, and modules described herein with respect to the Figures referenced above include processors, electronic devices, hardware devices, electronic components, logical circuits, memories, software codes, and firmware codes, among other examples, or any combination thereof. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, and/or functions, among other examples, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. In addition, features discussed herein may be implemented via specialized processor circuitry, via executable instructions, or combinations thereof.
Those of skill in the art would understand that one or more blocks (or operations) described with reference to Figure 3 may be combined with one or more blocks (or operations) described with reference to another of the figures. For example, one or more blocks (or operations) of Figure 3 may be combined with one or more blocks (or operations) of Figure 1 or Figure 2. As another example, one or more blocks associated with Figure 3 may be combined with one or more blocks (or operations) associated with Figure 4 or Figure 5. As a further example, one or more blocks associated with Figure 4 or Figure 5 may be combined with one or more blocks (or operations) associated with Figure 1 or Figure 2. Additionally, aspects of embodiments of Figure 4 may be combined with Figure 5 or vice versa.
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure. Skilled artisans will also readily recognize that the order or combination of components, methods, or interactions that are described herein are merely examples and that the components, methods, or interactions of the various aspects of the present disclosure may be combined or performed in ways other than those illustrated and described herein.
The various illustrative logics, logical blocks, modules, circuits and algorithm processes described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. The interchangeability of hardware and software has been described generally, in terms of functionality, and illustrated in the various illustrative components, blocks, modules, circuits, and processes described above. Whether such functionality is implemented in
hardware or software depends upon the particular application and design constraints imposed on the overall system.
In one or more aspects, the operations described may be implemented in hardware, digital electronic circuitry, computer software, or firmware, including the structures disclosed in this specification and their structural equivalents, or in any combination thereof. Implementations of the subject matter described in this specification also may be implemented as one or more computer programs, that is, one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.
The operations of a method or algorithm disclosed herein may be implemented in a processor-executable software module, which may reside on a computer-readable medium and be made commercially available as a computer program product. Computer-readable media includes both computer storage media and communication media, including any medium that may be enabled to transfer a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may include random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection may be properly termed a computer-readable medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically and discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Various modifications to the implementations described in this disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to some other implementations without departing from the spirit or scope of this disclosure. Thus, the claims are not intended to be limited to the implementations shown herein but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.
Additionally, a person having ordinary skill in the art will readily appreciate that opposing terms such as “upper” and “lower,” or “front” and “back,” or “top” and “bottom,” or “forward” and “backward,” or “left” and “right” are sometimes used for ease of describing the figures, and indicate relative positions corresponding to the orientation of the figure on a properly oriented page, and may not reflect the proper orientation of any device as implemented.
Certain features that are described in this specification in the context of separate implementations also may be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation also may be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown, or in sequential order, or that all illustrated operations be performed to achieve desirable results. Further, the drawings may schematically depict one or more example processes in the form of a flow diagram. However, other operations that are not depicted may be incorporated in the example processes that are schematically illustrated. For example, one or more additional operations may be performed before, after, simultaneously, or between any of the illustrated operations. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products. Additionally, some other implementations are within the scope of the following claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve desirable results.
As used herein, including in the claims, the term “or,” when used in a list of two or more items, means that any one of the listed items may be employed by itself, or any combination of two or more of the listed items may be employed. For example, if a composition is described as containing components A, B, or C, the composition may contain A alone; B alone; C alone; A and B in combination; A and C in combination; B and C in combination; or A, B, and C in combination. Also, as used herein, including in
the claims, “or” as used in a list of items prefaced by “at least one of” indicates a disjunctive list such that, for example, a list of “at least one of A, B, or C” means A or B or C or AB or AC or BC or ABC (that is A and B and C) or any of these in any combination thereof.
The term “substantially” is defined as largely, but not necessarily wholly, what is specified (and includes what is specified; for example, substantially 90 degrees includes 90 degrees and substantially parallel includes parallel), as understood by a person of ordinary skill in the art. In any disclosed implementations, the term “substantially” may be substituted with “within [a percentage] of” what is specified, where the percentage includes 0.1, 1, 5, or 10 percent.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (30)
- A method, comprising: receiving a first plurality of audio signals including at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.
- The method of claim 1, wherein applying the at least one nullformer comprises: applying at least one first nullformer to each of the first plurality of audio signals; and applying at least one second nullformer to each of the first plurality of audio signals, wherein the at least one first nullformer processes a first frequency range and the at least one second nullformer processes a second frequency range different from the first frequency range.
- The method of claim 2, wherein the first frequency range corresponds to first frequencies below a cutoff frequency and the second frequency range corresponds to second frequencies above the cutoff frequency.
- The method of claim 2, further comprising: determining a location of a source of a sound in the first plurality of audio signals; and controlling a null direction of the at least one nullformer based on the location of the source.
- The method of claim 1, further comprising combining the second plurality of audio signals with the first plurality of audio signals before applying the panning algorithm to the second plurality of audio signals.
- The method of claim 5, further comprising adjusting a mixing ratio of the second plurality of audio signals and the first plurality of audio signals based on a configuration parameter when combining the second plurality of audio signals with the first plurality of audio signals.
- The method of claim 1, wherein applying the panning algorithm comprises applying a mu-law panning algorithm to the second plurality of audio signals.
- The method of claim 1, wherein applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which any sound acoustically located on a first side of a soundstage is removed from a second side of the soundstage.
- The method of claim 8, wherein applying the panning algorithm to the second plurality of audio signals determines the third plurality of audio signals in which acoustic sources within a threshold region of a center of the soundstage are removed.
- The method of claim 1, further comprising outputting the third plurality of audio signals to a transducer.
- An apparatus, comprising: a memory configured to store a first plurality of audio signals; and one or more processors coupled to the memory, the one or more processors configured to: receive the first plurality of audio signals including at least a first channel and a second channel; apply at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and apply a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.
- The apparatus of claim 11, wherein the one or more processors is configured to apply the at least one nullformer by being configured to: apply at least one first nullformer to each of the first plurality of audio signals; and apply at least one second nullformer to each of the first plurality of audio signals, wherein the at least one first nullformer processes a first frequency range and the at least one second nullformer processes a second frequency range different from the first frequency range.
- The apparatus of claim 12, wherein the first frequency range corresponds to first frequencies below a cutoff frequency and the second frequency range corresponds to second frequencies above the cutoff frequency.
- The apparatus of claim 12, wherein the one or more processors is further configured to: determine a location of a source of a sound in the first plurality of audio signals; and control a null direction of the at least one nullformer based on the location of the source.
- The apparatus of claim 11, wherein the one or more processors is further configured to combine the second plurality of audio signals with the first plurality of audio signals before applying the panning algorithm to the second plurality of audio signals.
- The apparatus of claim 15, wherein the one or more processors is further configured to adjust a mixing ratio of the second plurality of audio signals and the first plurality of audio signals based on a configuration parameter when combining the second plurality of audio signals with the first plurality of audio signals.
- The apparatus of claim 11, wherein the one or more processors is configured to apply the panning algorithm by being configured to apply a mu-law panning algorithm to the second plurality of audio signals.
- The apparatus of claim 11, wherein the one or more processors is configured to determine the third plurality of audio signals in which any sound acoustically located on a first side of a soundstage is removed from a second side of the soundstage.
- The apparatus of claim 18, wherein the one or more processors is configured to determine the third plurality of audio signals in which acoustic sources within a threshold region of a center of the soundstage are removed.
- The apparatus of claim 11, wherein the one or more processors is further configured to output the third plurality of audio signals to a transducer.
- The apparatus of claim 11, further comprising a first microphone and a second microphone, each of the first microphone and the second microphone configured to store the first plurality of audio signals in the memory.
- The apparatus of claim 11, further comprising a camera, wherein the one or more processors is configured to: determine a location of a source of a sound in the first plurality of audio signals based on image data from the camera; and control a null direction of the at least one nullformer based on the location of the source determined from the image data.
- An apparatus, comprising: a first microphone and a second microphone; a memory configured to store a first plurality of audio signals from the first microphone and the second microphone; and one or more processors coupled to the memory and to the first microphone and the second microphone, the one or more processors configured to: receive the first plurality of audio signals comprising at least a first channel corresponding to the first microphone and a second channel corresponding to the second microphone; apply at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and apply a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.
- The apparatus of claim 23, wherein applying the at least one nullformer includes: applying at least one first nullformer to each of the first plurality of audio signals; and applying at least one second nullformer to each of the first plurality of audio signals, wherein the at least one first nullformer processes a first frequency range and the at least one second nullformer processes a second frequency range different from the first frequency range.
- The apparatus of claim 24, wherein the one or more processors is further configured to: determine a location of a source of a sound in the first plurality of audio signals; and control a null direction of the at least one nullformer based on the location of the source.
- The apparatus of claim 23, wherein the one or more processors is further configured to combine the second plurality of audio signals with the first plurality of audio signals before applying the panning algorithm to the second plurality of audio signals.
- The apparatus of claim 23, wherein applying the panning algorithm includes applying a mu-law panning algorithm to the second plurality of audio signals.
- The apparatus of claim 23, further comprising a camera, wherein the one or more processors is configured to: determine a location of a source of a sound in the first plurality of audio signals based on image data from the camera; and control a null direction of the at least one nullformer based on the location of the source determined from the image data.
- The apparatus of claim 23, further comprising: two or more speakers coupled to the one or more processors and configured to reproduce the third plurality of audio signals.
- A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving a first plurality of audio signals comprising at least a first channel and a second channel; applying at least one nullformer to each of the first plurality of audio signals to determine a second plurality of audio signals, wherein the at least one nullformer reduces contribution of a first source to the second channel and reduces contribution of a second source to the first channel; and applying a panning algorithm to the second plurality of audio signals to determine a third plurality of audio signals.