WO2020120754A1 - Audio processing device, audio processing method and computer program thereof - Google Patents


Info

Publication number
WO2020120754A1
Authority
WO
WIPO (PCT)
Prior art keywords
upmixing
remixing
electronic device
vehicle
adaptive
Application number
PCT/EP2019/085130
Other languages
French (fr)
Inventor
Stefan Uhlich
Michael Enenkl
Original Assignee
Sony Corporation
Sony Europe B.V.
Application filed by Sony Corporation, Sony Europe B.V. filed Critical Sony Corporation
Publication of WO2020120754A1 publication Critical patent/WO2020120754A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/361Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/46Volume control
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G5/00Tone control or bandwidth control in amplifiers
    • H03G5/005Tone control or bandwidth control in amplifiers of digital signals
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G5/00Tone control or bandwidth control in amplifiers
    • H03G5/16Automatic control
    • H03G5/165Equalizers; Volume or gain control in limited frequency bands
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/12Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/046Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/155Musical effects
    • G10H2210/265Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
    • G10H2210/295Spatial effects, musical uses of multiple audio channels, e.g. stereo
    • G10H2210/305Source positioning in a soundscape, e.g. instrument positioning on a virtual soundstage, stereo panning or related delay or reverberation changes; Changing the stereo width of a musical source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/001Adaptation of signal processing in PA systems in dependence of presence of noise
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01Aspects of volume control, not necessarily automatic, in sound systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10General applications
    • H04R2499/13Acoustic transducers and sound field adaptation in vehicles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/05Generation or adaptation of centre channel in multi-channel audio systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11Positioning of individual sound objects, e.g. moving airplane, within a sound field

Definitions

  • the present disclosure generally pertains to the field of audio processing, in particular to devices, methods and computer programs for audio source separation and adaptive upmixing/remixing.
  • a vehicle audio system, such as AM/FM and digital radio, CD and navigation, is equipment installed in a car or other vehicle. A lot of audio content is available through such a vehicle audio system to provide in-car entertainment and information for the vehicle occupants.
  • the disclosure provides an electronic device comprising circuitry configured to perform audio source separation based on a received audio signal to obtain separated sources; analyze a current situation to determine one or more parameters for adaptive remixing or upmixing; and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
  • the disclosure provides a method comprising performing audio source separation based on a received audio signal to obtain separated sources; analyzing a current situation to determine one or more parameters for adaptive remixing or upmixing; and performing remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
  • the disclosure provides a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation based on a received audio signal to obtain separated sources; analyze a current situation to determine one or more parameters for adaptive remixing or upmixing; and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
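  • As a purely illustrative aid (not part of the disclosure), the three-step chain of the above aspects can be sketched as follows; the function names, the source labels "vocals"/"accompaniment" and the parameter name vocals_gain_db are assumptions:

```python
import numpy as np

def adaptive_remix(audio, separate, analyze_situation):
    """Minimal sketch of the claimed chain: separate -> analyze -> remix.

    `separate` and `analyze_situation` stand in for the source separation
    and situation analysis described in the disclosure.
    """
    sources = separate(audio)            # e.g. {"vocals": ..., "accompaniment": ...}
    params = analyze_situation()         # e.g. {"vocals_gain_db": -12.0}
    gain = 10.0 ** (params.get("vocals_gain_db", 0.0) / 20.0)
    return gain * sources["vocals"] + sources["accompaniment"]
```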
  • Fig. 1 schematically shows a general approach of audio remixing/upmixing by means of audio source separation
  • Fig. 2 schematically shows a process of remixing/upmixing based on audio source separation performed within a vehicle
  • Fig. 3a shows a flow diagram visualizing a method for signal re-/upmixing in order to decrease the volume of vocals within a piece of music
  • Fig. 3b shows a flow diagram visualizing a method for signal re-/upmixing according to a further embodiment
  • Fig. 4 schematically depicts an embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation;
  • Fig. 5 shows a flow diagram visualizing a method for signal remixing/upmixing in which the remixed/upmixed signal is output only to front loudspeakers of a loudspeaker system
  • Fig. 6 schematically depicts a further embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation by generating passenger-individual sound sources based on the presence of a passenger on the rear seat;
  • Fig. 7 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment
  • Fig. 8 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment
  • Fig. 9 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment
  • Fig. 10 provides a schematic diagram of a system applying a digitalized monopole synthesis algorithm
  • Fig. 11 shows a block diagram depicting an example of a schematic configuration of a vehicle control system
  • Fig. 12 shows an example of installation positions of the imaging section and the outside-vehicle information detecting section
  • Fig. 13 shows a block diagram depicting an example of a schematic configuration of a voice-controlled system.
  • the embodiments disclose an electronic device comprising circuitry configured to perform audio source separation based on a received audio signal to obtain separated sources, configured to analyze a current situation to determine one or more parameters for adaptive remixing or upmixing, and configured to perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
  • the electronic device may for example be an electronic control unit (ECU) within the vehicle.
  • ECUs are typically used in vehicles e.g. as a Door Control Unit (DCU), an Engine Control Unit (ECU), an Electric Power Steering Control Unit (PSCU), a Human-Machine Interface (HMI), a Powertrain Control Module (PCM), a Seat Control Unit, a Speed Control Unit (SCU), a Telematic Control Unit (TCU), a Transmission Control Unit (TCU), a Brake Control Module (BCM; ABS or ESC), a Battery Management System (BMS), and/or a 3D audio rendering system.
  • the electronic device may be an ECU that is specifically used for the purpose of controlling a vehicle audio system.
  • an ECU that performs any of the functions described above may be used simultaneously for the purpose of controlling a vehicle audio system.
  • the electronic device may for example be a smart speaker capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news or the like.
  • the electronic device may also have the functions of a home automation system.
  • the circuitry of the electronic device may include a processor (which may for example be a CPU), a memory (RAM, ROM or the like), storage, interfaces, etc.
  • Circuitry may comprise or may be connected with input means (mouse, keyboard, camera, etc.), output means (display (e.g. liquid crystal, (organic) light emitting diode, etc.), loudspeakers, etc.), a (wireless) interface, etc., as it is generally known for electronic devices (computers, smartphones, etc.).
  • circuitry may comprise or may be connected with sensors for sensing still images or video image data (image sensor, camera sensor, video sensor, etc.), for sensing environmental parameters (e.g. radar, humidity, light, temperature), etc.
  • Audio source separation may be any process which decomposes a source audio signal comprising multiple channels and audio from multiple audio sources (e.g. instruments, voice, etc.) into "separations".
  • the separations produced by audio source separation from the source audio signal may for example comprise a vocals separation and an accompaniment separation, which may for example comprise a bass separation, a drums separation and another separation.
  • the audio source separation process may be implemented as described in more detail in published papers, namely Uhlich, Stefan, Franck Giron, and Yuki Mitsufuji, "Deep neural network based instrument extraction from music", ICASSP 2015; and Uhlich, Stefan, et al., "Improving music source separation based on deep neural networks through data augmentation and network blending", ICASSP 2017, IEEE.
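  • For illustration only, a generic spectrogram-masking separation in the spirit of such DNN-based approaches (not the exact method of the cited papers) might look as follows; `net` is an assumed callable returning a vocals magnitude estimate:

```python
import numpy as np

def mask_separate(mix_stft, net):
    """Apply a soft (ratio) mask predicted from the mixture magnitude."""
    mag = np.abs(mix_stft)
    vocals_mag = net(mag)                        # assumed DNN magnitude estimate
    mask = vocals_mag / np.maximum(mag, 1e-9)    # soft mask
    vocals = mask * mix_stft                     # reuse the mixture phase
    accompaniment = mix_stft - vocals            # remainder as accompaniment
    return vocals, accompaniment
```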
  • Analyzing the situation may comprise collecting and evaluating any information obtained e.g. from sensors (cameras, microphones, pressure sensors or the like) or entities inside e.g. a vehicle or a household (multimedia system, CD player, radio receiver, or the like) over a connection to a communication bus (e.g. CAN bus, LIN bus) within the vehicle, over interprocess communication, or via other communication means.
  • Analyzing the situation may comprise any techniques of context awareness.
  • the electronic device may be connected to a telephoning system located in the car via the communication bus and configured to exchange information with the telephoning system over this communication bus.
  • the telephoning system may produce an event (in computing, an event is an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software) that indicates that there is an incoming telephone call.
  • the situation analyzer may for example comprise a program ("listener") that receives such events ("listens"), and the situation analyzer can react by setting the parameters for remixing or upmixing accordingly. Also, if the telephoning system and the situation analyzer are implemented on the same computing device, the communication could be handled via interprocess communication.
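  • A minimal sketch of such a listener, assuming a hypothetical event API (the event names and the `set_params` method are illustrative, not a real vehicle or telephony interface):

```python
class SituationAnalyzer:
    """Reacts to telephony events by adjusting remixing parameters."""

    def __init__(self, remixer):
        self.remixer = remixer

    def on_event(self, event):
        if event == "incoming_call":
            # Duck the vocals so the caller remains clearly audible.
            self.remixer.set_params(vocals_gain_db=-12.0)
        elif event == "call_ended":
            # Restore the original mix.
            self.remixer.set_params(vocals_gain_db=0.0)
```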
  • the received signal may for example be an audio stream obtained, over a communication bus in the vehicle, from a multimedia system within the vehicle, from a digital radio receiver, from an MPEG player, a CD player, or the like.
  • the received signal could also be a digitalized signal obtained from an analog radio receiver or the like.
  • the adaptive remixing or upmixing may be configured to perform remixing or upmixing of the separated sources, here vocals and accompaniment, to produce an adapted remixed or upmixed signal.
  • the adaptive remixing or upmixing may automatically adjust the vocals volume, the vocals location, the vocals spread or the like.
  • the adaptive remixing or upmixing may send the adapted remixed or upmixed signal to the loudspeaker system of the vehicle, which may render the adapted remixed or upmixed signal so that the passengers inside the vehicle may listen to the adapted remixed or upmixed signal.
  • the current situation may be a current situation related to a vehicle and/or the current situation may be a current situation related to a speech recognition system.
  • the vehicle may be a mobile machine that transports people or cargo.
  • the vehicle may be a wagon, a car, a truck, a bus, a railed vehicle (train, tram), a watercraft (ship, boat), an aircraft or a spacecraft.
  • the speech recognition system may be a speech recognition system such as a smart speaker, a voice-controlled system, or the like.
  • analyzing the current situation may comprise using an input from at least one sensor and/or historic information.
  • the current situation may be analyzed using input from at least one sensor and/or using historic information, in some embodiments with the aim of detecting certain and/or predefined events.
  • the parameters for adaptive remixing or upmixing may comprise a predefined volume change parameter.
  • the parameters for adaptive remixing or upmixing may comprise a predefined volume decrease parameter related to the vocals.
  • the predefined volume decrease parameter may be a volume decrease parameter of -12 dB.
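  • For orientation, a volume change given in dB maps to a linear amplitude gain via g = 10^(dB/20); a minimal sketch (the signal array is a placeholder):

```python
import numpy as np

vocals = np.zeros(44100)              # placeholder separated vocals signal
gain = 10.0 ** (-12.0 / 20.0)         # -12 dB -> linear gain of about 0.251
ducked_vocals = gain * vocals         # vocals scaled to roughly 25% amplitude
```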
  • the parameters for adaptive remixing or upmixing may be determined by analyzing the current situation, which may comprise analyzing the current situation inside a vehicle, or analyzing a current driving/passenger situation.
  • the remixed or upmixed signal may be output to a loudspeaker system.
  • analyzing the current situation may comprise determining if there is an in coming call.
  • the situation analyzer may start the situation analysis process at the moment the phone starts ringing, or at the moment that the incoming call is answered, or at the moment that the driver or the co-passenger starts talking, or the like.
  • the parameters for adaptive remixing or upmixing may comprise an output channel selection parameter, for example, a first output channel selection parameter set to "front" and a second output channel selection parameter set to "rear".
  • the circuitry may determine to which of the front loudspeakers and/or rear loudspeakers the remixed or upmixed signal is to be output (510) based on the output channel selection parameter.
  • the first output channel selection parameter set to "front" may relate to the front loudspeakers and the second output channel selection parameter set to "rear" may relate to the rear loudspeakers of the loudspeaker system.
  • analyzing the current situation may comprise determining if there is a passenger conversation.
  • analyzing the current situation may further comprise determining the audio quality of the received audio signal.
  • analyzing the current situation may comprise determining if a keyword for initiating speech commands is detected.
  • the keyword for initiating speech commands may also be a voice control triggered by pushing a predetermined key.
  • the separated sources may comprise vocals and accompaniment, and the parameters for adaptive remixing or upmixing may comprise a predefined volume decrease parameter related to the vocals that depends on the detection of the keyword for initiating speech commands.
  • the parameters for adaptive remixing or upmixing may further comprise a predefined volume increase parameter related to the vocals that depends on the audio quality of the received audio signal.
  • the circuitry may further be configured to detect a surrounding noise floor within the obtained separated sources.
  • the circuitry may further be configured to perform dynamic equalization to the separated sources to obtain equalized separated sources, and to perform remixing/upmixing based on the equalized separated sources.
  • the equalized separated sources may have a changed spectrum of the separated sources, or the like.
  • the embodiments also disclose a method comprising performing audio source separation based on a received audio signal to obtain separated sources, analyzing a current situation to determine one or more parameters for adaptive remixing or upmixing, and performing remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
  • the embodiments also disclose a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation based on a received audio signal to obtain separated sources, analyze a current situation to determine one or more parameters for adaptive remixing or upmixing, and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
  • Fig. 1 schematically shows a general approach of audio remixing/ upmixing by means of audio source separation.
  • Source separation (also called "demixing") decomposes a source audio signal 1 comprising multiple channels 1a, 1b and audio from multiple audio sources Source 1, Source 2, ..., Source K (e.g. instruments, voice, etc.) into "separations". Here, K is an integer number and denotes the number of audio sources.
  • the source audio signal 1 is a stereo signal having two channels 1a, 1b.
  • the source audio signal 1 could also be a monaural signal, or a 5.1 surround signal, or the like.
  • the audio source separation 201 process may for example be implemented as described in more detail in published papers, namely Uhlich, Stefan, Franck Giron, and Yuki Mitsufuji, "Deep neural network based instrument extraction from music", ICASSP 2015; and Uhlich, Stefan, et al., "Improving music source separation based on deep neural networks through data augmentation and network blending", ICASSP 2017, IEEE.
  • a residual signal 3 (r(n)) is generated in addition to the separated audio source signals 2a, ..., 2d.
  • the residual signal may for example represent a difference between the input audio content and the sum of all separated audio source signals.
  • the audio signal emitted by each audio source is represented in the input audio content 1 by its respective recorded sound waves.
  • spatial information for the audio sources is typically included in or represented by the input audio content, e.g. by the proportion of the audio source signal included in the different audio channels.
  • the separation of the input audio content 1 into separated audio source signals 2a to 2d and a residual 3 is performed on the basis of blind source separation or other techniques which are able to separate audio sources.
  • a new loudspeaker signal 4 is generated, here a signal comprising five channels 4a to 4e, namely a 5.0 channel system.
  • an output audio content is generated by mixing the separated audio source signals and the residual signal on the basis of spatial information.
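  • As an illustration of such mixing on the basis of spatial information, the following sketch pans separated mono sources into a 5.0 signal via per-source gain vectors; the channel order and the gain values are assumptions:

```python
import numpy as np

def upmix_5_0(sources, pan_gains):
    """Mix separated mono sources into five channels (FL, FR, C, RL, RR)."""
    n = len(next(iter(sources.values())))
    out = np.zeros((5, n))
    for name, signal in sources.items():
        out += np.outer(pan_gains[name], signal)   # gain vector times signal
    return out

# Example: vocals to the centre, accompaniment mostly to the front pair.
# pan_gains = {"vocals": [0, 0, 1, 0, 0], "accompaniment": [0.7, 0.7, 0, 0.3, 0.3]}
```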
  • the output audio content is exemplarily illustrated and denoted with reference number 4 in Fig. 1.

Remixing/upmixing based on situation analysis
  • Fig. 2 schematically shows a process of remixing/upmixing based on audio source separation performed within a vehicle.
  • the process comprises an audio source separation 201, a situation analyzer 202, an adaptive remixing/ upmixing 203 and a dynamic equalizer 204.
  • An input signal containing multiple sources (see 1, 2, ... K in Fig. 1), e.g. a piece of music, is input to the source separation 201 and decomposed into separations (see separated sources 2a, ... 2d in Fig. 1) as it is described with regard to Fig. 1 above, here into vocals and accompaniment.
  • the separated sources, here vocals and accompaniment, are transmitted to the dynamic equalizer 204.
  • the dynamic equalizer 204 performs dynamic equalization on the vocals and accompaniment based on the surrounding noise detected in the vocals and accompaniment to produce equalized vocals and equalized accompaniment. For example, the dynamic equalizer 204 changes the spectrum of the separated sources (see Fig. 9 and corresponding description). The equalized vocals and equalized accompaniment are transmitted to the adaptive remixing/upmixing 203.
  • the situation analyzer 202 analyzes the current situation, for example inside the vehicle, e.g. the current driving/passenger situation inside the vehicle, and determines, based on this analysis, one or more parameters for adaptive remixing/upmixing such as a volume change parameter (e.g. the volume decrease parameter in Figs. 3a, 3b), a location change parameter (see Fig. 5), and/or a spread of sources parameter.
  • the current situation, when the above discussed system is located inside a vehicle, may for example be an incoming call (see 300 in Fig. 3a), a passenger conversation (see 700 in Fig. 7), and/or an audio quality detection (see 800 in Fig. 8), or the like.
  • the current situation may be analyzed using input from at least one sensor and/or historic information.
  • Based on the one or more parameters for adaptive remixing/upmixing obtained from the situation analyzer 202, the adaptive remixing/upmixing 203 performs remixing/upmixing of the equalized vocals and equalized accompaniment obtained from the dynamic equalizer 204, to produce an adapted remixed/upmixed signal (see channels 4a to 4e in Fig. 1).
  • the adaptive remixing/upmixing 203 automatically adjusts the vocals volume, the vocals location, the vocals spread or the like.
  • the adaptive remixing/upmixing 203 sends the adapted remixed/upmixed signal to the loudspeaker system of the vehicle, which renders the adapted remixed/upmixed signal so that the passengers inside the vehicle can listen to the adapted remixed/upmixed signal.
  • the loudspeaker system presented here has two loudspeakers outputting the remixed/upmixed signal, here a signal comprising two channels, a left channel and a right channel.
  • the loudspeaker signal of the left channel is transmitted from the left loudspeakers of the loudspeaker system of the vehicle and the loudspeaker signal of the right channel from the right loudspeakers; the remixed/upmixed signal may also have more than two channels, for example, five channels (see Fig. 1).
  • the dynamic equalizer 204 performing dynamic equalization on the vocals and accompaniment may be an optional supplementary process. Without the presence of the dynamic equalizer 204, the adaptive remixing/upmixing 203 receives as an input signal the separated sources, here vocals and accompaniment, output from the audio source separation 201.
  • Fig. 3a shows a flow diagram visualizing a method for signal re-/upmixing in order to decrease the volume of vocals within an audio signal, e.g. a piece of music (see Fig. 4).
  • the audio source separation 201 receives an audio signal.
  • At 301, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2).
  • At 302, the situation analyzer 202 recognizes the current driving/passenger situation to detect if there is an incoming call. If an incoming call is not detected at 302, the process continues at 304.
  • At 304, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 305. If an incoming call is detected, the method proceeds at 303.
  • At 303, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 305.
  • At 305, remixing/upmixing is performed based on the separated sources obtained at 301, and based on the vocals volume decrease parameter set at 303 or at 304, to obtain a remixed/upmixed signal.
  • the vocals volume decrease parameter may for example be a predefined volume change parameter related to the vocals, for example a decrease of 12 dB or the like.
  • the adaptive remixing/upmixing 203 may automatically adjust the vocals volume, by decreasing the vocals volume, while keeping the volume of the accompaniment as before.
  • the passengers of the vehicle may listen to the accompaniment while at the same time clearly hearing the caller of the incoming call.
  • the remixed/upmixed signal is output to the loudspeaker system of the vehicle.
  • the remixed/upmixed signal having a decrease of 0 dB may also be called the originally received audio signal.
  • Fig. 3b shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
  • the current situation is a current situation related to a speech recognition system, such as a smart speaker or the like.
  • the current situation may be a living room situation, vehicle related situation, or the like.
  • the audio source separation 201 receives an audio signal.
  • At 311, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2).
  • At 312, the situation analyzer 202 recognizes the current situation to detect if a keyword for initiating speech commands is detected, for example a keyword received through microphones, e.g. the microphone array 1310 in Fig. 13. If a keyword for initiating speech commands is not detected at 312, the process continues at 314.
  • At 314, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 315.
  • If a keyword for initiating speech commands is detected, the method proceeds at 313.
  • At 313, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 315.
  • At 315, remixing/upmixing is performed based on the separated sources obtained at 311, and based on the vocals volume decrease parameter set at 313 or at 314, to obtain a remixed/upmixed signal.
  • the vocals volume decrease parameter may for example be a predefined volume change parameter related to the vocals, for example a decrease of 12 dB or the like.
  • the adaptive remixing/upmixing 203 may automatically adjust the vocals volume, by decreasing the vocals volume, while keeping the volume of the accompaniment as before.
  • the listeners may listen to the accompaniment while at the same time interacting without any disturbance (clearly hearing and talking to the speech recognition system) with the voice-controlled system.
  • the remixed/upmixed signal is output to the loudspeaker system, for example the loudspeaker array 1311 in Fig. 13.
  • the keyword for initiating speech commands may also be a voice control triggered by pushing a key. Furthermore, if no keyword for initiating speech commands is detected, the audio remixing/upmixing is made with a vocals volume decrease parameter of 0 dB, that is, both the volume of the vocals separated source and the volume of the accompaniment separated source are left unchanged.
  • Fig. 4 schematically depicts an embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation.
  • a vehicle 401 is equipped with a vehicle audio system and four seats S1 to S4.
  • the front seats S1 and S2 are provided for a driver P1 and a co-driver P2 and the rear seats S3 and S4 are provided for passengers P3 and P4, respectively.
  • the vehicle audio system comprises two microphones M1 and M2 that constitute a microphone array, and loudspeakers SP1 to SP5 that constitute a loudspeaker array, e.g. the loudspeaker system of the vehicle.
  • the microphones M1 and M2 may obtain an audio signal from the driver and/or the co-passenger during, for example, the incoming call situation.
  • the loudspeaker system outputs the loudspeaker signal.
  • the loudspeaker signal of the left channel is transmitted from the left loudspeakers, e.g. loudspeakers SP1, SP3, of the loudspeaker system of the vehicle, and the loudspeaker signal of the right channel is transmitted from the right loudspeakers, e.g. loudspeakers SP2, SP4, of the loudspeaker system of the vehicle (see Fig. 2).
  • the vehicle audio system comprises a processor for implementing the functionality described in Fig. 2, in particular a situation analyzer 202 and an adaptive remixing/upmixing 203, which act on a received audio signal, e.g. an input piece of music.
  • the adaptive remixing/upmixing 203 (see Fig. 2) automatically decreases the vocals volume within the piece of music, based on the received audio signal (see 301 in Fig. 3a).
  • the situation analyzer 202 (see Fig. 2) analyzes the current situation inside the vehicle, e.g. detects an incoming call (see 302 in Fig. 3a).
  • the adaptive remixing/upmixing 203 performs remixing/upmixing based on the separated sources, here vocals and accompaniment, and based on the vocals volume decrease parameter, and outputs the remixed/upmixed signal (see Fig. 3a).
  • the passengers P1 to P4 of the vehicle may clearly hear the voice of the caller transmitted from the loudspeaker system of the vehicle without being disturbed by the vocals within the piece of music, while the volume of the accompaniment is kept as it was before the incoming call (see Fig. 3a).
  • Fig. 5 shows a flow diagram visualizing a method for signal remixing/upmixing in which the remixed/upmixed signal is output only to front loudspeakers of a loudspeaker system (see Fig. 6).
  • the audio source separation 201 receives an audio signal.
  • At 501, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2).
  • At 502, the situation analyzer 202 analyzes the current driving/passenger situation to detect if there is an incoming call. If an incoming call is not detected at 502, the process continues at 507.
  • At 507, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB.
  • an output channel selection parameter is set to "front" and to "rear", so that the remixed/upmixed signal, having a vocals volume decrease parameter of 0 dB, is output at the loudspeaker system of the vehicle (see Fig. 6), and the process continues to 509.
  • If an incoming call is detected, the situation analyzer 202 sets, at 503, a first vocals volume decrease parameter to -12 dB, and the process continues at 505.
  • At 504, the situation analyzer 202 sets a second vocals volume decrease parameter to 0 dB, and the process continues at 506.
  • At 505, a first output channel selection parameter is set to "front", and the process continues at 509.
  • At 506, a second output channel selection parameter is set to "rear", and the process continues at 509.
  • At 509, remixing/upmixing is performed based on the separated sources obtained at 501 and based on the parameters for adaptive remixing/upmixing obtained at 503, 504, 505, and 506 (the vocals volume decrease parameters and the output channel selection parameters) to obtain a remixed/upmixed signal. That is, the adaptive remixing/upmixing 203 automatically adjusts the vocals volume for the front loudspeakers, by decreasing the vocals volume (e.g. by -12 dB).
  • At 510, the remixed/upmixed signal is output to the loudspeaker system, in particular to the front loudspeakers and/or the rear loudspeakers of the loudspeaker system of the vehicle. That is, the part of the remixed/upmixed signal having a decrease of -12 dB in the vocals is output to the front loudspeakers (SP1, SP2 in Fig. 6), while the part having a decrease of 0 dB is output to the rear loudspeakers.
  • the passengers P1, P2 in the front seats S1, S2 may listen to the accompaniment, and at the same time clearly hear the caller of the incoming call without being disturbed by the vocals of the piece of music, while the passengers P3, P4 in the rear seats S3, S4 may continue listening to the original audio signal (see Fig. 4) without the vocals being reduced in volume.
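  • A sketch of this front/rear routing; the source layout and the return format are illustrative assumptions:

```python
def route_call_remix(sources, incoming_call):
    """Front channels get ducked vocals during a call; rear channels keep
    the original mix (cf. Fig. 5)."""
    duck = 10.0 ** (-12.0 / 20.0) if incoming_call else 1.0
    front = duck * sources["vocals"] + sources["accompaniment"]
    rear = sources["vocals"] + sources["accompaniment"]
    return {"front": front, "rear": rear}
```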
  • Fig. 6 schematically depicts a further embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation by generating passenger-individual sound sources based on the presence of a passenger on the rear seat.
  • a vehicle 601 is equipped with a vehicle audio system and four seats S1 to S4.
  • the front seats S1 and S2 are provided for a driver P1 and a co-driver P2, and the rear seats S3 and S4 are provided for passengers at the rear seats S3, S4.
  • the vehicle audio system comprises two microphones M1 and M2 that constitute a microphone array, and loudspeakers SP1 to SP8 that constitute a loudspeaker array, e.g. the loudspeaker system of the vehicle.
  • the vehicle audio system comprises a sensor array that comprises two sensors SE1 and SE2 that are each arranged at the respective seats S3 and S4 of the vehicle 601.
  • the sensors SE1 and SE2 may be any kind of sensor, such as a pressure sensor, capable of detecting the respective presence of passengers at the rear seats S3, S4.
  • the sensors SE1 and SE2 sense the presence of at least one passenger at the rear seats S3, S4 of the vehicle.
  • the loudspeaker system of the vehicle includes front loudspeakers SP1, SP2 and rear loudspeakers SP3 to SP8.
  • the driver at the front seat S1 may listen to the remixed/upmixed signal being output to the front loudspeakers SP1, SP2 of the loudspeaker system of the vehicle (see 510 in Fig. 5).
  • the driver P1 may listen to the accompaniment, and at the same time clearly hear the caller of the incoming call without being disturbed by the vocals of the piece of music, while the passenger P3 in the rear seat S3 may continue listening to the originally received audio signal (see 510 in Fig. 5).
  • the remixed/upmixed signal is output to the front loudspeakers SP1, SP2 of the loudspeaker system of the vehicle, and the vehicle audio system generates an individual virtual sound source 601 on the left side of the driver P1, and an individual virtual sound source 602 on the right side of the driver P1, in a predefined location, for example, close to the ears of the driver.
  • Based on the detected presence of a passenger, for example passenger P3 at the rear seat S3 of the vehicle, the vehicle audio system generates an individual virtual sound source 603 on the left side of the passenger P3, and an individual virtual sound source 604 on the right side of the passenger P3, in a predefined location, for example, close to the ears of each passenger.
  • the virtual sound sources 601, 603 represent the loudspeaker signal of the left channel.
  • the virtual sound sources 602, 604 represent the loudspeaker signal of the right channel.
  • If sensor SE1 detects that passenger P3 is sitting on seat S3, a virtual sound source pair is disposed near the ears of passenger P3, e.g. for experiencing a stereo sound signal. Otherwise, the remixed/upmixed signal may be output to the loudspeaker system of the vehicle (see 510 in Fig. 5).
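  • A sketch of how seat occupancy could drive the placement of such virtual source pairs; the seat coordinates and the 15 cm ear offset are illustrative assumptions:

```python
def place_virtual_sources(occupied_seats, seat_positions):
    """Return left/right virtual-source positions near each occupant's ears."""
    sources = []
    for seat in occupied_seats:
        x, y = seat_positions[seat]
        sources.append((x - 0.15, y))   # left-ear virtual source
        sources.append((x + 0.15, y))   # right-ear virtual source
    return sources

# Example: place_virtual_sources(["S1", "S3"], {"S1": (0.5, 0.0), "S3": (0.5, 1.0)})
```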
  • Fig. 7 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
  • the audio source separation 201 receives an audio signal.
  • At 701, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2).
  • At 702, the situation analyzer 202 recognizes the current driving/passenger situation to detect if there is a passenger conversation. If a passenger conversation is not detected at 702, the process continues at 704.
  • At 704, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 705. If a passenger conversation is detected, the method proceeds at 703.
  • At 703, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 705.
  • At 705, remixing/upmixing is performed based on the separated sources obtained at 701, and based on the vocals volume decrease parameter set at 703 or at 704, to obtain a remixed/upmixed signal.
  • the vocals volume decrease parameter is a predefined volume change parameter related to the vocals, for example a decrease of 12 dB, 0 dB or the like.
  • the adaptive remixing/upmixing 203 may automatically adjust the vocals volume, by decreasing the vocals volume (e.g. by -12 dB), while keeping the volume of the accompaniment as before.
  • Alternatively, the adaptive remixing/upmixing 203 may keep the volume of the vocals and the volume of the accompaniment as before.
  • the remixed/upmixed signal is output to the loudspeaker system of the vehicle.
  • the remixed/upmixed signal having a decrease of 0 dB may also be called the originally received audio signal.
  • Fig. 8 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
  • the audio source separation 201 receives an audio signal.
  • At 801, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2).
  • At 802, the situation analyzer 202 recognizes the current driving/passenger situation to detect if there is low-quality audio. If low-quality audio is not detected at 802, the process continues at 804.
  • At 804, the situation analyzer 202 sets a vocals volume increase parameter to 0 dB, and then the process continues to 805. If low-quality audio is detected, the method proceeds at 803.
  • At 803, the situation analyzer 202 sets a vocals volume increase parameter to +3 dB, for example, and then the process continues to 805.
  • At 805, remixing/upmixing is performed based on the separated sources obtained at 801, and based on the vocals volume increase parameter set at 803 or at 804, to obtain a remixed/upmixed signal.
  • the vocals volume increase parameter is a predefined volume change parameter related to the vocals, for example an increase of 3 dB, 0 dB or the like.
  • the adaptive remixing/upmixing 203 may automatically adjust the vocals volume, by increasing the vocals volume (e.g. by +3 dB), while keeping the volume of the accompaniment as before. In another example (e.g. when no low-quality audio is detected), the adaptive remixing/upmixing 203 may keep the volume of the vocals and the volume of the accompaniment as before.
  • the remixed/upmixed signal is output to the loudspeaker system of the vehicle.
  • the remixed/upmixed signal having an increase of 0 dB may also be called the originally received audio signal.
  • Fig. 9 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
  • the electronic system receives an audio signal.
  • audio source separation is performed, based on the received audio signal, to obtain the separated sources, here vocals and accompaniment.
  • a surrounding noise floor is detected within the obtained separated sources.
  • dynamic equalization is performed on the separated sources, based on the surrounding noise floor detected at 903, to obtain a changed spectrum of the separated sources.
  • the dynamic equalizer 204 (see Fig. 2) performs equalization on the separated sources, here vocals and accompaniment, which may include a surrounding noise floor, by changing the spectrum of the separated sources, namely the spectrum of the vocals and/or the spectrum of the accompaniment.
  • the dynamic equalizer 204 may be a special adaptive equalizer that keeps the clarity of the separated sources, for example the clarity of the voice, under the masking effect of the surrounding noise floor.
  • remixing/upmixing is performed based on the equalized separated sources obtained at 904, to obtain a remixed/upmixed signal.
  • the equalized separated sources, for example the vocals, keep their clarity.
  • the remixed/upmixed signal is output to the loudspeaker system of the vehicle.
  • the above method may be an optional supplementary method to the above described methods of Figs. 3a, 3b, 5, 7, 8.
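  • One possible (illustrative) realization of such noise-adaptive equalization boosts the frequency bands of a separated source that are masked by the detected noise floor; the masking rule and the 9 dB cap are assumptions, not the disclosed equalizer:

```python
import numpy as np

def dynamic_equalize(source_spec, noise_spec, max_boost_db=9.0):
    """Boost bands of a source spectrum where the noise floor masks it."""
    src_mag = np.maximum(np.abs(source_spec), 1e-9)
    noise_mag = np.maximum(np.abs(noise_spec), 1e-9)
    deficit_db = 20.0 * np.log10(noise_mag / src_mag)   # how far noise exceeds source
    boost_db = np.clip(deficit_db, 0.0, max_boost_db)   # boost only, capped
    return source_spec * 10.0 ** (boost_db / 20.0)
```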
  • Fig. 10 provides a schematic diagram of a system applying a digitalized monopole synthesis algorithm. The theoretical background of this system is described in more detail in patent application US 2016/0037282 A1, which is herewith incorporated by reference.
  • a target sound field is modelled as at least one target monopole placed at a defined target position.
  • the target sound field is modelled as one single target monopole.
  • the target sound field is modelled as multiple target monopoles placed at respective de fined target positions.
  • the position of a target monopole may be moving.
  • a target monopole may adapt to the movement of a noise source to be attenuated.
  • the methods of synthesizing the sound of a target monopole based on a set of defined synthesis monopoles, as described below, may be applied for each target monopole independently, and the contributions of the synthesis monopoles obtained for each target monopole may be summed to reconstruct the target sound field.
  • the resulting signals s_p(n) are power amplified and fed to loudspeaker S_p.
  • the synthesis is thus performed in the form of delayed and amplified components of the source signal x.
  • the modified amplification factor according to equation (118) of US 2016/0037282 A1 can be used.
  • a mapping factor as described with regard to Fig. 9 of US 2016/0037282 A1 can be used to modify the amplification.
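  • A simplified sketch of such delayed-and-amplified synthesis, using a plain 1/r amplitude factor rather than the modified factors of US 2016/0037282 A1; the speed of sound and sample rate are assumptions:

```python
import numpy as np

def monopole_synthesis(x, target_pos, speaker_positions, fs=48000, c=343.0):
    """Feed each loudspeaker a delayed, scaled copy of the source signal x."""
    outputs = []
    for p in speaker_positions:
        r = np.linalg.norm(np.asarray(p) - np.asarray(target_pos))
        delay = int(round(fs * r / c))      # propagation delay in samples
        gain = 1.0 / max(r, 1e-3)           # simple 1/r amplitude factor
        y = np.zeros(len(x) + delay)
        y[delay:] = gain * x
        outputs.append(y)
    return outputs
```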
  • the technology according to an embodiment of the present disclosure is applicable to various products.
  • the technology according to an embodiment of the present disclosure may be implemented as a device included in any kind of mobile body such as automobiles, electric vehicles, hybrid electric vehicles, motorcycles, bicycles, personal mobility vehicles, airplanes, drones, ships, robots, construction machinery, agricultural machinery (tractors), and the like.
  • Fig. 11 shows a block diagram depicting an example of a schematic configuration of a vehicle control system 7000 as an example of a mobile body control system to which the technology according to an embodiment of the present disclosure can be applied.
  • the vehicle control system 7000 includes a plurality of electronic control units connected to each other via a communication network 7010.
  • the vehicle control system 7000 includes a driving system control unit 7100, a body system control unit 7200, a battery control unit 7300, an outside-vehicle information detecting unit 7400, an in-vehicle information detecting unit 7500, and an integrated control unit 7600.
  • the communication network 7010 connecting the plurality of control units to each other may, for example, be a vehicle-mounted communication network compliant with an arbitrary standard such as controller area network (CAN), local interconnect network (LIN), local area network (LAN), FlexRay (registered trademark), or the like.
  • Each of the control units includes: a microcomputer that performs arithmetic processing according to various kinds of programs; a storage section that stores the programs executed by the microcomputer, parameters used for various kinds of operations, or the like; and a driving circuit that drives various kinds of control target devices.
  • Each of the control units further includes: a network interface (I/F) for performing communication with other control units via the communication network 7010; and a communication I/F for performing communication with a device, a sensor, or the like within and without the vehicle by wire communication or radio communication.
  • the integrated control unit 7600 depicted in Fig. 11 includes a microcomputer 7610, a general-purpose communication I/F 7620, a dedicated communication I/F 7630, a positioning section 7640, a beacon receiving section 7650, an in-vehicle device I/F 7660, a sound/image output section 7670, a vehicle-mounted network I/F 7680, and a storage section 7690.
  • the other control units similarly include a microcomputer, a communication I/F, a storage section, and the like.
  • the driving system control unit 7100 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs.
  • the driving system control unit 7100 may have a function as a control device of an antilock brake system (ABS), electronic stability control (ESC), or the like.
  • the driving system control unit 7100 is connected with a vehicle state detecting section 7110.
  • the driving system control unit 7100 performs arithmetic processing using a signal input from the vehicle state detecting section 7110, and controls the internal combustion engine, the driving motor, an electric power steering device, the brake device, and the like.
  • the body system control unit 7200 controls the operation of various kinds of devices provided to the vehicle body in accordance with various kinds of programs.
  • the body system control unit 7200 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like.
  • the battery control unit 7300 controls a secondary battery 7310, which is a power supply source for the driving motor, in accordance with various kinds of programs.
  • the outside-vehicle information detecting unit 7400 detects information about the outside of the vehicle including the vehicle control system 7000.
  • the outside-vehicle information detecting unit 7400 is connected with at least one of an imaging section 7410 and an outside-vehicle information detecting section 7420.
  • the imaging section 7410 includes at least one of a time-of-flight (ToF) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras.
  • the outside-vehicle information detecting section 7420 includes at least one of an environmental sensor for detecting current atmospheric conditions or weather conditions and a peripheral information detecting sensor for detecting another vehicle, an obstacle, a pedestrian, or the like on the periphery of the vehicle including the vehicle control system 7000.
  • the in-vehicle information detecting unit 7500 detects information about the inside of the vehicle.
  • the in-vehicle information detecting unit 7500 may collect any information related to a situation related to the vehicle.
  • the in-vehicle information detecting unit 7500 is, for example, connected with a driver and/or passenger state detecting section 7510 that detects the state of a driver and/or passengers.
  • the driver state detecting section 7510 may include a camera that images the driver, a biosensor that detects biological information of the driver, a microphone that collects sound within the interior of the vehicle, or the like.
  • the biosensor is, for example, disposed in a seat surface, the steering wheel, or the like, and detects biological information of an occupant sitting in a seat or the driver holding the steering wheel (see Fig. 6).
  • the in-vehicle information detecting unit 7500 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing.
  • the in-vehicle information detecting unit 7500 may subject an audio signal obtained by the collection of the sound to processing such as noise canceling processing or the like (see Fig. 2).
  • the integrated control unit 7600 controls general operation within the vehicle control system 7000 in accordance with various kinds of programs.
  • the integrated control unit 7600 is connected with an input section 7800.
  • the input section 7800 is implemented by a device capable of input operation by an occupant, such, for example, as a touch panel, a button, a microphone, a switch, a lever, or the like (see Figs. 4 and 6).
  • the integrated control unit 7600 may be supplied with data obtained by voice recognition of voice input through the microphone.
  • the input section 7800 may, for example, be a remote control device using infrared rays or other radio waves, or an external connecting device such as a mobile telephone.
  • the input section 7800 may be, for example, a camera. In that case, an occupant can input information by gesture. Alternatively, data may be input which is obtained by detecting the movement of a wearable device that an occupant wears.
  • the input section 7800 may, for example, include an input control circuit or the like that generates an input signal on the basis of information input by an occupant or the like using the above-described input section 7800, and which outputs the generated input signal to the integrated control unit 7600.
  • An occupant or the like inputs various kinds of data or gives an instruction for processing operation to the vehicle control system 7000 by operating the input section 7800.
  • the storage section 7690 may include a read only memory (ROM) that stores various kinds of programs executed by the microcomputer and a random access memory (RAM) that stores various kinds of parameters, operation results, sensor values, or the like.
  • the storage section 7690 may be implemented by a magnetic storage device such as a hard disc drive (HDD) or the like, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
• the general-purpose communication I/F 7620 is a widely used communication I/F that mediates communication with various apparatuses present in an external environment 7750.
• the general-purpose communication I/F 7620 may implement a cellular communication protocol such as global system for mobile communications (GSM (registered trademark)), worldwide interoperability for microwave access (WiMAX (registered trademark)), long term evolution (LTE (registered trademark)), LTE-advanced (LTE-A), or the like, or another wireless communication protocol such as wireless LAN (also referred to as wireless fidelity (Wi-Fi (registered trademark))), Bluetooth (registered trademark), or the like.
  • the general-purpose communication I/F 7620 may, for example, connect to an apparatus (for example, an application server or a control server) present on an external network (for example, the Internet, a cloud network, or a company-specific network) via a base station or an access point.
  • the general-purpose communication I/F 7620 may connect to a terminal present in the vicinity of the vehicle (which terminal is, for example, a terminal of the driver, a pedestrian, or a store, or a machine type communication (MTC) terminal) using a peer to peer (P2P) technology, for example.
  • the dedicated communication I/F 7630 is a communication I/F that supports a communication protocol developed for use in vehicles.
• the dedicated communication I/F 7630 may implement a standard protocol such, for example, as wireless access in vehicle environment (WAVE), which is a combination of institute of electrical and electronic engineers (IEEE) 802.11p as a lower layer and IEEE 1609 as a higher layer, dedicated short range communications (DSRC), or a cellular communication protocol.
• the dedicated communication I/F 7630 typically carries out V2X communication as a concept including one or more of communication between a vehicle and a vehicle (Vehicle to Vehicle), communication between a road and a vehicle (Vehicle to Infrastructure), communication between a vehicle and a home (Vehicle to Home), and communication between a pedestrian and a vehicle (Vehicle to Pedestrian).
• the positioning section 7640 performs positioning by receiving a global navigation satellite system (GNSS) signal from a GNSS satellite (for example, a GPS signal from a global positioning system (GPS) satellite), and generates positional information including the latitude, longitude, and altitude of the vehicle.
  • the positioning section 7640 may identify a current position by exchanging signals with a wireless access point, or may obtain the positional information from a terminal such as a mobile telephone, a personal handyphone system (PHS), or a smart phone that has a positioning function.
  • the beacon receiving section 7650 receives a radio wave or an electromagnetic wave transmitted from a radio station installed on a road or the like, and thereby obtains information about the current position, congestion, a closed road, a necessary time, or the like.
  • the function of the beacon receiving section 7650 may be included in the dedicated communication I/F 7630 described above.
  • the in-vehicle device I/F 7660 is a communication interface that mediates connection between the microcomputer 7610 and various in-vehicle devices 7760 present within the vehicle.
• the in-vehicle device I/F 7660 may establish wireless connection using a wireless communication protocol such as wireless LAN, Bluetooth (registered trademark), near field communication (NFC), or wireless universal serial bus (WUSB).
• the in-vehicle device I/F 7660 may establish wired connection by universal serial bus (USB), high-definition multimedia interface (HDMI (registered trademark)), mobile high-definition link (MHL), or the like via a connection terminal (and a cable if necessary) not depicted in the figures.
  • the in-vehicle devices 7760 may, for example, include at least one of a mobile device and a wearable device possessed by an occupant and an information device carried into or attached to the vehicle.
  • the in-vehicle devices 7760 may also include a navigation device that searches for a path to an arbitrary destination.
  • the in-vehicle device I/F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
  • the vehicle-mounted network I/F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010.
• the vehicle-mounted network I/F 7680 transmits and receives signals or the like in conformity with a predetermined protocol supported by the communication network 7010.
  • the microcomputer 7610 of the integrated control unit 7600 controls the vehicle control system 7000 in accordance with various kinds of programs on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680.
• the microcomputer 7610 may implement the functionality described in Figs. 1 and 2 and, in particular, the processes described in Figs. 3, 5, 7, 8, 9, and 10.
• the microcomputer 7610 may calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the obtained information about the inside and outside of the vehicle, and output a control command to the driving system control unit 7100.
• the microcomputer 7610 may perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS), which functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like.
• the microcomputer 7610 may perform cooperative control intended for automatic driving, which makes the vehicle travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the obtained information about the surroundings of the vehicle.
• the microcomputer 7610 may generate three-dimensional distance information between the vehicle and an object such as a surrounding structure, a person, or the like, and generate local map information including information about the surroundings of the current position of the vehicle, on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680.
• the microcomputer 7610 may predict danger such as collision of the vehicle, approaching of a pedestrian or the like, or an entry to a closed road, on the basis of the obtained information, and generate a warning signal.
  • the warning signal may, for example, be a signal for producing a warning sound or lighting a warning lamp.
  • the sound/image output section 7670 transmits an output signal (e.g. remixed/upmixed signal) of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle.
• an audio speaker 7710, a display section 7720, and an instrument panel 7730 are illustrated as the output device.
  • the display section 7720 may, for example, include at least one of an on-board display and a head-up display.
• the display section 7720 may have an augmented reality (AR) display function.
• the output device may be other than these devices, and may be another device such as headphones, a wearable device such as an eyeglass type display worn by an occupant or the like, a projector, a lamp, or the like.
• in the case where the output device is a display device, the display device visually displays results obtained by various kinds of processing performed by the microcomputer 7610 or information received from another control unit in various forms such as text, an image, a table, a graph, or the like.
• in the case where the output device is an audio output device (e.g. a loudspeaker system, see Fig. 2), it auditorily outputs the signal, for example the remixed/upmixed signal described above.
• each individual control unit may include a plurality of control units.
  • the vehicle control system 7000 may include another control unit not depicted in the figures.
• part or the whole of the functions performed by one of the control units in the above description may be assigned to another control unit. That is, predetermined arithmetic processing may be performed by any of the control units as long as information is transmitted and received via the communication network 7010.
• a sensor or a device connected to one of the control units may be connected to another control unit, and a plurality of control units may mutually transmit and receive detection information via the communication network 7010.
• a computer program for realizing the functions of the electronic device according to the present embodiment described with reference to Fig. 2 can be implemented in one of the control units or the like.
  • a computer readable recording medium storing such a computer program can also be provided.
  • the recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, a flash memory, or the like.
  • the above-described computer program may be distributed via a network, for example, without the recording medium being used.
• Fig. 12 shows an example of installation positions of the imaging section 7410 and the outside-vehicle information detecting section 7420.
• Imaging sections 7910, 7912, 7914, 7916, and 7918 are, for example, disposed at at least one of positions on a front nose, side-view mirrors, a rear bumper, and a back door of the vehicle 7900 and a position on an upper portion of a windshield within the interior of the vehicle.
  • the imaging section 7910 provided to the front nose and the imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 7900.
• the imaging sections 7912 and 7914 provided to the side-view mirrors obtain mainly an image of the sides of the vehicle 7900.
• the imaging section 7916 provided to the rear bumper or the back door obtains mainly an image of the rear of the vehicle 7900.
• the imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle is used mainly to detect a preceding vehicle, a pedestrian, an obstacle, a signal, a traffic sign, a lane, or the like.
  • Fig. 12 depicts an example of photographing ranges of the respective imaging sections 7910, 7912, 7914, and 7916.
  • An imaging range a represents the imaging range of the imaging section 7910 provided to the front nose.
  • Imaging ranges b and c respectively represent the imaging ranges of the imaging sections 7912 and 7914 provided to the side-view mirrors.
• An imaging range d represents the imaging range of the imaging section 7916 provided to the rear bumper or the back door.
• Outside-vehicle information detecting sections 7920, 7922, 7924, 7926, 7928, and 7930 provided to the front, rear, sides, and corners of the vehicle 7900 and the upper portion of the windshield within the interior of the vehicle may each be, for example, an ultrasonic sensor or a radar device.
• the outside-vehicle information detecting sections 7920, 7926, and 7930 provided to the front nose of the vehicle 7900, the rear bumper, the back door of the vehicle 7900, and the upper portion of the windshield within the interior of the vehicle may each be a LIDAR device, for example.
• These outside-vehicle information detecting sections 7920 to 7930 are used mainly to detect a preceding vehicle, a pedestrian, an obstacle, or the like.
  • Fig. 13 shows a block diagram depicting an example of schematic configuration of a voice controlled system, for example, a voice controlled system related to a vehicle or a household.
• the electronic system 1300 comprises a CPU 1301 as a processor.
  • the electronic device 1300 further comprises a microphone array 1310 and a loudspeaker array 1311 that are connected to the processor 1301.
• Processor 1301 may, for example, implement an audio source separation, an adaptive remixing/upmixing, a situation analyzer, and/or a dynamic equalizer that realize the processes described with regard to Fig. 2 in more detail.
• the microphone array 1310 may be configured to receive speech (voice) commands via automatic speech recognition (see Figs. 3a, 3b, 5, 7).
  • Loudspeaker array 1311 consists of one or more loudspeakers that are distributed over a predefined space and is configured to render 3D audio as described in the embodiments above.
  • the electronic device 1300 further comprises a user interface 1312 that is connected to the processor 1301.
• This user interface 1312 acts as a man-machine interface and enables a dialogue between an administrator and the electronic system. For example, an administrator may configure the system using this user interface 1312.
• the electronic system 1300 further comprises an Ethernet interface 1321, a Bluetooth interface 1304, and a WLAN interface 1305. These units 1321, 1304, and 1305 act as I/O interfaces for data communication with external devices.
• additional loudspeakers, microphones, and video cameras with Ethernet, WLAN, or Bluetooth connection may be coupled to the processor 1301 via these interfaces 1321, 1304, and 1305.
  • the electronic device 1300 further comprises a data storage 1302 and a data memory 1303 (here a RAM).
• the data memory 1303 is arranged to temporarily store or cache data or computer instructions for processing by the processor 1301.
• the data storage 1302 is arranged as a long-term storage, e.g., for recording sensor data obtained from the microphone array 1310.
  • the data storage 1302 may also store audio data that represents audio messages, which the public announcement system may transport to people moving in the predefined space.
  • the voice controlled system of Fig. 13 may for example be used in a smart speaker, or the like.
• the voice controlled system of Fig. 13 may be connected to a telephone system to receive incoming calls. That is, the voice controlled system can implement the process of Figs. 3a, 3b, 5.
• An electronic device comprising circuitry configured to: perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources; analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing/upmixing; and perform remixing/upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing/upmixing to obtain a remixed/upmixed signal.
  • circuitry (7610, 7670) is configured to output (306; 706; 806; 906) the remixed or upmixed signal to a loudspeaker system (7710; SP1 to SP8).
• circuitry (7610, 7670) is configured to determine to which of front loudspeakers and/or rear loudspeakers the remixed or upmixed signal is to be output (510) based on the output channel selection parameter.
• circuitry (7610, 7670) is further configured to perform dynamic equalization (904) to the separated sources to obtain equalized separated sources, and to perform remixing or upmixing (905) based on the equalized separated sources.
  • a method comprising: performing audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
• analyzing (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing; and performing remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
• a computer program comprising instructions, the instructions when executed on a processor causing the processor to: perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources; analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing; and perform remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.

Abstract

An electronic device comprising circuitry (7610, 7670) configured to perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources, analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing, and perform remixing/upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing/upmixing to obtain a remixed/upmixed signal.

Description

AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD AND COMPUTER
PROGRAM THEREOF
TECHNICAL FIELD
The present disclosure generally pertains to the field of audio processing, in particular to devices, methods and computer programs for audio source separation and adaptive upmixing/remixing.
TECHNICAL BACKGROUND
A vehicle audio system, such as AM/FM and digital radio, CD, and navigation, is equipment installed in a car or other vehicle. A large amount of audio content is available to provide in-car entertainment and information for the vehicle occupants via such a vehicle audio system.
In an automotive environment, different driving/passenger situations may occur in which the playback of audio content is disturbing, e.g. during a phone call or a conversation between passengers. However, there exist ways to reduce such a disturbance, for example by changing the volume of the audio content, and in particular by reducing the volume of music played back by the vehicle audio system.
SUMMARY
According to a first aspect, the disclosure provides an electronic device comprising circuitry configured to perform audio source separation based on a received audio signal to obtain separated sources; analyze a current situation to determine one or more parameters for adaptive remixing or upmixing; and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
According to a further aspect, the disclosure provides a method comprising performing audio source separation based on a received audio signal to obtain separated sources; analyzing a current situation to determine one or more parameters for adaptive remixing or upmixing; and performing remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
According to a further aspect, the disclosure provides a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation based on a received audio signal to obtain separated sources; analyze a current situation to determine one or more parameters for adaptive remixing or upmixing; and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.

BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
Fig. 1 schematically shows a general approach of audio remixing/upmixing by means of audio source separation;
Fig. 2 schematically shows a process of remixing/upmixing based on audio source separation performed within a vehicle;
Fig. 3a shows a flow diagram visualizing a method for signal re-/upmixing in order to decrease the volume of vocals within a piece of music;
Fig. 3b shows a flow diagram visualizing a method for signal re-/upmixing according to a further embodiment;
Fig. 4 schematically depicts an embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation;
Fig. 5 shows a flow diagram visualizing a method for signal remixing/upmixing in which the remixed/upmixed signal is output only to front loudspeakers of a loudspeaker system;
Fig. 6 schematically depicts a further embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation by generating passenger-individual sound sources based on the presence of a passenger on the rear seat;
Fig. 7 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment;
Fig. 8 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment;
Fig. 9 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment;
Fig. 10 provides a schematic diagram of a system applying a digitalized monopole synthesis algorithm;
Fig. 11 shows a block diagram depicting an example of schematic configuration of a vehicle control system;
Fig. 12 shows an example of installation positions of the imaging section and the outside-vehicle information detecting section; and
Fig. 13 shows a block diagram depicting an example of schematic configuration of a voice controlled system.

DETAILED DESCRIPTION OF EMBODIMENTS
The embodiments disclose an electronic device comprising circuitry configured to perform audio source separation based on a received audio signal to obtain separated sources, configured to analyze a current situation to determine one or more parameters for adaptive remixing or upmixing, and configured to perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
The electronic device may for example be an electronic control unit (ECU) within the vehicle. ECUs are typically used in vehicles e.g. as a Door Control Unit (DCU), an Engine Control Unit (ECU), an Electric Power Steering Control Unit (PSCU), a Human-Machine Interface (HMI), a Powertrain Control Module (PCM), a Seat Control Unit, a Speed Control Unit (SCU), a Telematic Control Unit (TCU), a Transmission Control Unit (TCU), a Brake Control Module (BCM; ABS or ESC), a Battery Management System (BMS), and/or a 3D audio rendering system. The electronic device may be an ECU that is specifically used for the purpose of controlling a vehicle audio system. Alternatively, an ECU that performs any of the functions described above may be used simultaneously for the purpose of controlling a vehicle audio system. Moreover, the electronic device may for example be a smart speaker capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news or the like. The electronic device may also have the functions of a home automation system.
The circuitry of the electronic device may include a processor, which may for example be a CPU, a memory (RAM, ROM, or the like) and/or storage, interfaces, etc. Circuitry may comprise or may be connected with input means (mouse, keyboard, camera, etc.), output means (display (e.g. liquid crystal, (organic) light emitting diode, etc.), loudspeakers, etc.), a (wireless) interface, etc., as it is generally known for electronic devices (computers, smartphones, etc.). Moreover, circuitry may comprise or may be connected with sensors for sensing still images or video image data (image sensor, camera sensor, video sensor, etc.), for sensing environmental parameters (e.g. radar, humidity, light, temperature), etc.
Audio source separation may be any process which decomposes a source audio signal comprising multiple channels and audio from multiple audio sources (e.g. instruments, voice, etc.) into “separations”. The separations produced by audio source separation from the source audio signal may for example comprise a vocals separation and an accompaniment separation, which may for example comprise a bass separation, a drums separation, and an “other” separation. The vocals separation might include all sounds belonging to human voices, for example human voice in a piece of music, or human speech, for example during the news broadcast/advertisements of a radio station; the bass separation might include all sounds below a predefined threshold frequency; the drums separation might include all sounds belonging to the drums in a song/piece of music; and the “other” separation might include all remaining sounds.
The audio source separation process may be implemented as described in more detail in the published papers Uhlich, Stefan, Franck Giron, and Yuki Mitsufuji, “Deep neural network based instrument extraction from music,” ICASSP, 2015, and Uhlich, Stefan, et al., “Improving music source separation based on deep neural networks through data augmentation and network blending,” Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, IEEE, 2017.
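The following sketch illustrates, under strong simplifying assumptions, the interface such a separation stage may expose. The function name `separate`, the stem names, the 250 Hz threshold, and the crude low-pass split that stands in for a trained network are illustrative assumptions only; a practical system would use the DNN-based approaches of the papers cited above.

```python
import numpy as np

def separate(audio: np.ndarray, sample_rate: int, bass_cutoff_hz: float = 250.0) -> dict:
    """Toy stand-in for audio source separation (illustration only).

    A real separator would be a trained DNN; here the "bass" separation is
    simply all content below a predefined threshold frequency, and everything
    else goes into an "other" separation.
    """
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    bass = np.fft.irfft(spectrum * (freqs < bass_cutoff_hz), n=len(audio))
    return {"bass": bass, "other": audio - bass}
```

Only the dictionary-of-stems interface matters for the further sketches below; the separation itself would be replaced by the DNN-based method.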
Analyzing the situation may comprise collecting and evaluating any information obtained e.g. from sensors (cameras, microphones, pressure sensors, or the like) or from entities inside e.g. a vehicle or a household (multimedia system, CD player, radio receiver, or the like) over a connection to a communication bus (e.g. CAN bus, LIN bus) within the vehicle, over interprocess communication, or via other communication means. Analyzing the situation may comprise any techniques of context awareness. For example, the electronic device may be connected to a telephoning system located in the car via the communication bus and configured to exchange information with the telephoning system over this communication bus. For example, the telephoning system may produce an event (in computing, an event is an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software) that indicates that there is an incoming telephone call. The situation analyzer may for example comprise a program (“listener”) that receives such events (“listens”), and the situation analyzer can react by setting the parameters for remixing or upmixing accordingly. Also, if the telephoning system and the situation analyzer are implemented on the same computing device, the communication could be handled via interprocess communication.
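As a minimal sketch of such a listener, assuming hypothetical event names ("incoming_call", "call_ended") and a simple parameter dictionary rather than any particular bus or IPC API:

```python
class SituationAnalyzer:
    """Sketch of a situation analyzer reacting to externally produced events."""

    def __init__(self):
        # Parameters for adaptive remixing/upmixing; 0 dB means "unchanged".
        self.remix_params = {"vocals_gain_db": 0.0}

    def on_event(self, event: str) -> None:
        # Hypothetical event names; a real system would receive these from
        # the telephoning system via a communication bus or IPC.
        if event == "incoming_call":
            self.remix_params["vocals_gain_db"] = -12.0
        elif event == "call_ended":
            self.remix_params["vocals_gain_db"] = 0.0

analyzer = SituationAnalyzer()
analyzer.on_event("incoming_call")  # vocals will be attenuated by 12 dB
```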
The received signal may for example be an audio stream obtained, over a communication bus in the vehicle, from a multimedia system within the vehicle, from a digital radio receiver, from an MPEG player, a CD player, or the like. The received signal could also be a digitalized signal obtained from an analog radio receiver or the like.
The adaptive remixing or upmixing may be configured to perform remixing or upmixing of the separated sources, here vocals and accompaniment, to produce an adapted remixed or upmixed signal. For example, the adaptive remixing or upmixing may automatically adjust the vocals volume, the vocals location, the vocals spread, or the like. The adaptive remixing or upmixing may send the adapted remixed or upmixed signal to the loudspeaker system of the vehicle, which may render the adapted remixed or upmixed signal so that the passengers inside the vehicle may listen to it.
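A minimal remixing sketch, assuming the separated sources are given as equal-length arrays and that the volume adjustment is a per-source linear gain (location and spread adjustments are not modelled here):

```python
import numpy as np

def remix(separations: dict, gains: dict) -> np.ndarray:
    """Apply a per-source linear gain and sum the stems into one signal.

    `separations` maps source names (e.g. "vocals", "accompaniment") to
    signal arrays; sources without an entry in `gains` are left unchanged.
    """
    out = np.zeros_like(next(iter(separations.values())))
    for name, signal in separations.items():
        out = out + gains.get(name, 1.0) * signal
    return out
```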
In some embodiments, the current situation may be a current situation related to a vehicle and/or a current situation related to a speech recognition system.
The vehicle may be a mobile machine that transports people or cargo. The vehicle may be a wagon, a car, a truck, a bus, a railed vehicle (train, tram), a watercraft (ship, boat), an aircraft or a spacecraft.
The speech recognition system may, for example, be a smart speaker, a voice controlled system, or the like.
In some embodiments, analyzing the current situation may comprise using an input from at least one sensor and/or historic information, in some embodiments with the aim of detecting certain and/or predefined events.
In some embodiments, the parameters for adaptive remixing or upmixing may comprise a predefined volume change parameter. The parameters for adaptive remixing or upmixing may comprise a predefined volume decrease parameter related to the vocals. For example, the predefined volume decrease parameter may be a volume decrease parameter of -12 dB.
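For illustration, a volume change parameter given in dB translates into a linear amplitude factor as follows; a -12 dB decrease corresponds to roughly one quarter of the original amplitude:

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain expressed in dB into a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

assert abs(db_to_linear(-12.0) - 0.251) < 1e-3  # -12 dB ~ 0.25x amplitude
```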
In some embodiments, the parameters for adaptive remixing or upmixing may be determined by analyzing the current situation, which may comprise analyzing the current situation inside a vehicle, or analyzing a current driving/passenger situation.
In some embodiments, the remixed or upmixed signal may be output to a loudspeaker system.
In some embodiments, analyzing the current situation may comprise determining if there is an incoming call. The situation analyzer may start the situation analysis process at the moment the phone starts ringing, at the moment that the incoming call is answered, at the moment that the driver or the co-passenger starts talking, or the like.
In some embodiments, the parameters for adaptive remixing or upmixing may comprise an output channel selection parameter, for example a first output channel selection parameter set to “front” and a second output channel selection parameter set to “rear”.
In some embodiments, the circuitry may determine to which of front loudspeakers and/or rear loudspeakers the remixed or upmixed signal is to be output (510) based on the output channel selection parameter. The first output channel selection parameter set to “front” may relate to the front loudspeakers and the second output channel selection parameter set to “rear” may relate to the rear loudspeakers of the loudspeaker system.

In some embodiments, analyzing the current situation may comprise determining if there is a passenger conversation.
In some embodiments, analyzing the current situation may further comprise determining the audio quality of the received audio signal.
In some embodiments, analyzing the current situation may comprise determining if a keyword for initiating speech commands is detected. The keyword for initiating speech commands may also be a voice control triggered by pushing a predetermined key.
In some embodiments, the separated sources may comprise vocals and accompaniment, and the parameters for adaptive remixing or upmixing may comprise a predefined volume decrease parameter related to the vocals that depends on the detection of the keyword for initiating speech commands.
In some embodiments, the parameters for adaptive remixing or upmixing may further comprise a predefined volume increase parameter related to the vocals that depends on the audio quality of the received audio signal.
In some embodiments, the circuitry may further be configured to detect a surrounding noise floor within the obtained separated sources.
In some embodiments, the circuitry may further be configured to perform dynamic equalization to the separated sources to obtain equalized separated sources, and to perform remixing/upmixing based on the equalized separated sources. For example, the equalized separated sources may have a changed spectrum compared to the separated sources, or the like.
The embodiments also disclose a method comprising performing audio source separation based on a received audio signal to obtain separated sources, analyzing a current situation to determine one or more parameters for adaptive remixing or upmixing, and performing remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
The embodiments also disclose a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation based on a received audio signal to obtain separated sources, analyze a current situation to determine one or more parameters for adaptive remixing or upmixing, and perform remixing or upmixing based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.

Before a detailed description of the embodiments is given with reference to Figs. 1 to 13, some general explanations are made.
Audio remixing/upmixing by means of audio source separation
Fig. 1 schematically shows a general approach of audio remixing/upmixing by means of audio source separation.
First, source separation (also called “demixing”) is performed, which decomposes a source audio signal 1 comprising multiple channels 1a, 1b and audio from multiple audio sources Source 1, Source 2, ..., Source K (e.g. instruments, voice, etc.) into “separations”, here into source estimates 2a-2d, wherein K is an integer number that denotes the number of audio sources. In the embodiment here, the source audio signal 1 is a stereo signal having two channels 1a, 1b. In other examples, the source audio signal 1 could also be a monaural signal, or a 5.1 surround signal, or the like. The audio source separation 201 process may, for example, be implemented as described in more detail in the published papers by Uhlich et al. cited above (ICASSP 2015 and ICASSP 2017).
As the separation of the audio source signal may be imperfect, for example due to the mixing of the audio sources, a residual signal 3 (r(n)) is generated in addition to the separated audio source signals 2a, ..., 2d. The residual signal may for example represent a difference between the input audio content and the sum of all separated audio source signals. The audio signal emitted by each audio source is represented in the input audio content 1 by its respective recorded sound waves. For input audio content having more than one audio channel, such as stereo or surround sound input audio content, spatial information for the audio sources is typically also included or represented by the input audio content, e.g. by the proportion of the audio source signal included in the different audio channels. The separation of the input audio content 1 into separated audio source signals 2a to 2d and a residual 3 is performed on the basis of blind source separation or other techniques which are able to separate audio sources.
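Expressed as code (a sketch, assuming all signals are given as equal-length arrays), the residual is simply the difference between the input audio content and the sum of all separated audio source signals:

```python
import numpy as np

def residual(input_mix: np.ndarray, separations: list) -> np.ndarray:
    """r(n) = input mix minus the sum of all separated source signals."""
    return input_mix - np.sum(separations, axis=0)
```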
In a second step, the separations 2a to 2d and the possible residual 3 are remixed and rendered to a new loudspeaker signal 4, here a signal comprising five channels 4a to 4e, namely a 5.0 channel system. On the basis of the separated audio source signals and the residual signal, an output audio content is generated by mixing the separated audio source signals and the residual signal on the basis of spatial information. The output audio content is exemplarily illustrated and denoted with reference number 4 in Fig. 1.

Remixing/upmixing based on situation analysis
Fig. 2 schematically shows a process of remixing/upmixing based on audio source separation performed within a vehicle. The process comprises an audio source separation 201, a situation analyzer 202, an adaptive remixing/upmixing 203, and a dynamic equalizer 204. An input signal containing multiple sources (see 1, 2, ..., K in Fig. 1), e.g. a piece of music, is input to the source separation 201 and decomposed into separations (see separated sources 2a, ..., 2d in Fig. 1) as described with regard to Fig. 1 above, here into vocals and accompaniment. The separated sources, here vocals and accompaniment, are transmitted to the dynamic equalizer 204. The dynamic equalizer 204 performs dynamic equalization on the vocals and accompaniment based on the surrounding noise detected in the vocals and accompaniment to produce equalized vocals and equalized accompaniment. For example, the dynamic equalizer 204 changes the spectrum of the separated sources (see Fig. 9 and corresponding description). The equalized vocals and equalized accompaniment are transmitted to the adaptive remixing/upmixing 203.
The situation analyzer 202 analyzes the current situation, for example inside the vehicle, e.g. the current driving/passenger situation inside the vehicle, and determines, based on this analysis, one or more parameters for adaptive remixing/upmixing such as a volume change parameter (e.g. the volume decrease parameter in Figs. 3a, 3b), a location change parameter (see Fig. 5), and/or a spread of sources parameter. The current situation, when the above discussed system is located inside a vehicle, may for example be an incoming call (see 300 in Fig. 3a), a passenger conversation (see 700 in Fig. 7), and/or an audio quality detection (see 800 in Fig. 8), or the like. Moreover, the current situation may be analyzed using input from at least one sensor and/or using historic information, in some embodiments with the aim of detecting certain and/or predefined events, or the like.
Based on the one or more parameters for adaptive remixing/upmixing obtained from the situation analyzer 202, the adaptive remixing/upmixing 203 performs remixing/upmixing of the equalized vocals and equalized accompaniment obtained from the dynamic equalizer 204, to produce an adapted remixed/upmixed signal (see channels 4a to 4e in Fig. 1). For example, the adaptive remixing/upmixing 203 automatically adjusts the vocals volume, the vocals location, the vocals spread, or the like. The adaptive remixing/upmixing 203 sends the adapted remixed/upmixed signal to the loudspeaker system of the vehicle, which renders the adapted remixed/upmixed signal so that the passengers inside the vehicle can listen to it. For simplicity reasons, the loudspeaker system presented here has two loudspeakers outputting the remixed/upmixed signal, here a signal comprising two channels, a left channel and a right channel. The loudspeaker signal of the left channel is transmitted from the left loudspeakers of the loudspeaker system of the vehicle, and the loudspeaker signal of the right channel is transmitted from the right loudspeakers of the loudspeaker system of the vehicle (see Figs. 4 and 6). The loudspeaker signal may have more than two channels, for example five channels (see Fig. 1).
The dynamic equalizer 204 performing dynamic equalization on the vocals and accompaniment may be an optional supplementary process. Without the presence of the dynamic equalizer 204, the adaptive remixing/upmixing 203 receives as an input signal the separated sources, here vocals and accompaniment, output from the audio source separation 201.
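Putting the stages together, a sketch of the Fig. 2 chain could look as follows. It reuses the illustrative `separate`, `remix`, and `db_to_linear` helpers sketched elsewhere in this description, with the optional dynamic equalization stage indicated as a comment; it is an assumption-laden sketch, not the embodiment itself.

```python
def process(audio, sample_rate, analyzer):
    """Sketch of Fig. 2: separation (201), situation analysis (202),
    optional dynamic equalization (204), adaptive remixing/upmixing (203)."""
    separations = separate(audio, sample_rate)                    # 201
    # Optional: equalize each separated source against the noise floor (204),
    # e.g. separations = {k: dynamic_eq(v, noise_floor) for k, v in separations.items()}
    gain_db = analyzer.remix_params["vocals_gain_db"]             # 202
    return remix(separations, {"vocals": db_to_linear(gain_db)})  # 203
```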
Fig. 3a shows a flow diagram visualizing a method for signal re-/upmixing in order to decrease the volume of vocals within an audio signal, e.g. a piece of music (see Fig. 4).
At 300, the audio source separation 201 (see Fig. 2) receives an audio signal. At 301, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2). At 302, the situation analyzer 202 (see Fig. 2) recognizes the current driving/passenger situation to detect if there is an incoming call. If an incoming call is not detected at 302, the process continues at 304. At 304, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 305. If an incoming call is detected, the method proceeds at 303. At 303, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 305. At 305, remixing/upmixing is performed based on the separated sources obtained at 301 and based on the vocals volume decrease parameter set at 303 or at 304, to obtain a remixed/upmixed signal. The vocals volume decrease parameter may for example be a predefined volume change parameter related to the vocals, for example a decrease of 12 dB or the like. For example, the adaptive remixing/upmixing 203 may automatically adjust the vocals volume by decreasing the vocals volume, while keeping the volume of the accompaniment as before. Thus, the passengers of the vehicle may listen to the accompaniment while at the same time clearly hearing the caller of the incoming call. At 306, the remixed/upmixed signal is output to the loudspeaker system of the vehicle. The remixed/upmixed signal having a decrease of 0 dB may also be called the originally received audio signal.
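The branching logic of Fig. 3a, expressed as an illustrative sketch that reuses the helpers above and assumes the separation yields "vocals" and "accompaniment" stems:

```python
def handle_frame(audio, sample_rate, incoming_call: bool):
    """Fig. 3a as code: separate (301), choose the vocals volume decrease
    parameter (302-304), remix (305); the caller then outputs the result
    to the loudspeaker system (306)."""
    separations = separate(audio, sample_rate)        # 301
    vocals_gain_db = -12.0 if incoming_call else 0.0  # 302-304
    gains = {"vocals": db_to_linear(vocals_gain_db)}  # accompaniment unchanged
    return remix(separations, gains)                  # 305
```

In the 0 dB branch, an implementation may instead output the originally received signal unchanged, as discussed in the following paragraph.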
In the embodiment of Fig. 3a above, if there is no incoming call, the audio remixing/upmixing is made with a vocals volume decrease parameter of 0 dB, that is, both the volume of the vocals separated source and the volume of the accompaniment separated source are not changed. In an alternative embodiment, the remixing/upmixing (203 in Fig. 2) may obtain the originally received audio signal and output this to the loudspeaker system in order to avoid artefacts arising from source separation. This may apply in any situation where the one or more parameters for remixing/upmixing indicate that there is no change with respect to the originally received audio signal.

Fig. 3b shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment. This embodiment is an alternative to that of Fig. 3a, wherein the current situation is a current situation related to a speech recognition system, such as a smart speaker or the like. The current situation may be a living room situation, a vehicle related situation, or the like.
At 310, the audio source separation 201 (see Fig. 2) receives an audio signal. At 311, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2). At 312, the situation analyzer 202 (see Fig. 2) recognizes the current situation to detect if a keyword for initiating speech commands is detected, for example a keyword received through microphones, e.g. the microphone array 1310 in Fig. 13. If a keyword for initiating speech commands is not detected at 312, the process continues at 314. At 314, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 315. If a keyword for initiating speech commands is detected, the method proceeds at 313. At 313, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 315. At 315, remixing/upmixing is performed based on the separated sources obtained at 311 and based on the vocals volume decrease parameter set at 313 or at 314, to obtain a remixed/upmixed signal. The vocals volume decrease parameter may for example be a predefined volume change parameter related to the vocals, for example a decrease of 12 dB or the like. For example, the adaptive remixing/upmixing 203 may automatically adjust the vocals volume by decreasing the vocals volume, while keeping the volume of the accompaniment as before. Thus, the listeners may listen to the accompaniment while at the same time interacting without any disturbance (clearly hearing and talking to the speech recognition system) with the voice controlled system. At 316, the remixed/upmixed signal is output to the loudspeaker system, for example the loudspeaker array 1311 in Fig. 13.
In the embodiment of Fig. 3b above, the keyword for initiating speech commands may also be a voice control triggered by pushing a key. Furthermore, if no keyword for initiating speech commands is detected, the audio remixing/upmixing is made with a vocals volume decrease parameter of 0 dB, that is, both the volume of the vocals separated source and the volume of the accompaniment separated source are not changed.
Fig. 4 schematically depicts an embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation. A vehicle 401 is equipped with a vehicle audio system and four seats S1 to S4. The front seats S1 and S2 are provided for a driver P1 and a co-driver P2, and the rear seats S3 and S4 are provided for passengers P3 and P4, respectively.
The vehicle audio system comprises two microphones M1 and M2 that constitute a microphone array, and loudspeakers SP1 to SP5 that constitute a loudspeaker array, e.g. the loudspeaker system of the vehicle. The microphones M1 and M2 may obtain an audio signal from the driver and/or the co-passenger during, for example, the incoming call situation. The loudspeaker system outputs the loudspeaker signal. In particular, the loudspeaker signal of the left channel is transmitted from the left loudspeakers, e.g. loudspeakers SP1, SP3, of the loudspeaker system of the vehicle, and the loudspeaker signal of the right channel is transmitted from the right loudspeakers, e.g. loudspeakers SP2, SP4, of the loudspeaker system of the vehicle (see Fig. 2).
The vehicle audio system comprises a processor for implementing the functionality described in Fig. 2, in particular a situation analyzer 202 and an adaptive remixing/upmixing 203, which act on a received audio signal, e.g. an input piece of music. The adaptive remixing/upmixing 203 (see Fig. 2) automatically decreases the vocals volume within the piece of music based on the received audio signal (see 301 in Fig. 3a). The situation analyzer 202 (see Fig. 2) analyzes the current situation inside the vehicle, e.g. detects an incoming call (see 302 in Fig. 3a). The adaptive remixing/upmixing 203 performs remixing/upmixing based on the separated sources, here vocals and accompaniment, and based on the vocals volume decrease parameter, and outputs the remixed/upmixed signal (see Fig. 3a). Hence, the passengers P1 to P4 of the vehicle may hear clearly the voice of the caller transmitted from the loudspeaker system of the vehicle without being disturbed by the vocals within the piece of music, while the volume of the accompaniment is kept as it was before the incoming call (see Fig. 3a).
Fig. 5 shows a flow diagram visualizing a method for signal remixing/upmixing in which the remixed/upmixed signal is output only to front loudspeakers of a loudspeaker system (see Fig. 6).
At 500, the audio source separation 201 (see Fig. 2) receives an audio signal. At 501, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2). At 502, the situation analyzer 202 (see Fig. 2) analyzes the current driving/passenger situation to detect if there is an incoming call. If an incoming call is not detected at 502, the process continues at 507. At 507, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB. At 508, the output channel selection parameter is set to “front” and to “rear”, so that the remixed/upmixed signal, having a vocals volume decrease parameter of 0 dB, is output at the loudspeaker system of the vehicle (see Fig. 6), and the process continues to 509. If an incoming call is detected, the method proceeds simultaneously at 503 and at 504. At 503, the situation analyzer 202 sets a first vocals volume decrease parameter to -12 dB, and the process continues at 505. Simultaneously with 503, at 504, the situation analyzer 202 sets a second vocals volume decrease parameter to 0 dB, and the process continues at 506. At 505, a first output channel selection parameter is set to “front”, and the process continues at 509. Simultaneously with 505, at 506, a second output channel selection parameter is set to “rear”, and the process continues at 509. At 509, remixing/upmixing is performed based on the separated sources obtained at 501 and based on the parameters for adaptive remixing/upmixing obtained at 503, 504, 505, and 506 (the vocals volume decrease parameters and the output channel selection parameters) to obtain a remixed/upmixed signal. That is, the adaptive remixing/upmixing 203 automatically adjusts the vocals volume for the front loudspeakers, by decreasing the vocals volume (e.g. by -12 dB) while keeping the volume of the accompaniment as before, and, for the rear loudspeakers, keeps the volume of the vocals (0 dB) and the volume of the accompaniment as before. At 510, the remixed/upmixed signal is output to the loudspeaker system, in particular to the front loudspeakers and/or the rear loudspeakers of the loudspeaker system of the vehicle. That is, the part of the remixed/upmixed signal having a decrease of -12 dB in the vocals is output to the front loudspeakers (SP1, SP2 in Fig. 4) of the loudspeaker system of the vehicle, and the part of the remixed/upmixed signal having no decrease (decrease of 0 dB) in the vocals is output to the rear loudspeakers (SP3, SP4, SP5 in Fig. 4) of the loudspeaker system. Thus, the passengers P1, P2 in the front seats S1, S2 may listen to the accompaniment and at the same time clearly hear the caller of the incoming call without being disturbed by the vocals of the piece of music, while the passengers P3, P4 in the rear seats S3, S4 may continue listening to the original audio signal (see Fig. 4) without the vocals being reduced in volume.
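A sketch of the front/rear split of Fig. 5, under the same assumptions as the previous examples (per-stem gains and the `remix`/`db_to_linear` helpers sketched above):

```python
def route_remix(separations, incoming_call: bool) -> dict:
    """Fig. 5 as code: during a call the front channels receive the mix with
    vocals at -12 dB, the rear channels the mix with vocals unchanged."""
    if not incoming_call:
        unchanged = remix(separations, {})                             # 507, 508
        return {"front": unchanged, "rear": unchanged}
    return {
        "front": remix(separations, {"vocals": db_to_linear(-12.0)}),  # 503, 505
        "rear": remix(separations, {"vocals": db_to_linear(0.0)}),     # 504, 506
    }                                                                  # 509, 510
```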
Passenger-individual sound sources
Fig. 6 schematically depicts a further embodiment of a vehicle audio system, with an electronic system that improves the current driving/passenger situation by generating passenger-individual sound sources based on the presence of a passenger on the rear seat. A vehicle 601 is equipped with a vehicle audio system and four seats S1 to S4. The front seats S1 and S2 are provided for a driver P1 and a co-driver P2, and the rear seats S3 and S4 are provided for passengers at the rear seats S3, S4.
The vehicle audio system comprises two microphones M1 and M2 that constitute a microphone array, and loudspeakers SP1 to SP8 that constitute a loudspeaker array, e.g. the loudspeaker system of the vehicle. The vehicle audio system comprises a sensor array that comprises two sensors SE1 and SE2 that are each arranged at the respective seats S3 and S4 of the vehicle 601. The sensors SE1 and SE2 may be any kind of sensors, such as pressure sensors, capable of detecting a respective presence of passengers at the rear seats S3, S4. The sensors SE1 and SE2 sense the presence of at least one passenger at the rear seats S3, S4 of the vehicle. The loudspeaker system of the vehicle includes front loudspeakers SP1, SP2 and rear loudspeakers SP3 to SP8. In the case that an incoming call is detected (see 502 in Fig. 5), the driver at the front seat S1 may listen to the remixed/upmixed signal being output to the front loudspeakers SP1, SP2 of the loudspeaker system of the vehicle (see 510 in Fig. 5). In particular, the driver P1 may listen to the accompaniment and at the same time clearly hear the caller of the incoming call without being disturbed by the vocals of the piece of music, while the passenger P3 in the rear seat S3 may continue listening to the originally received audio signal (see 510 in Fig. 5). The remixed/upmixed signal is output to the front loudspeakers SP1, SP2 of the loudspeaker system of the vehicle, and the vehicle audio system generates an individual virtual sound source 601 on the left side of the driver P1 and an individual virtual sound source 602 on the right side of the driver P1, in a predefined location, for example close to the ears of the driver. Moreover, based on the detected presence of a passenger, for example passenger P3 at the rear seat S3 of the vehicle, the vehicle audio system generates an individual virtual sound source 603 on the left side of the passenger P3 and an individual virtual sound source 604 on the right side of the passenger P3, in a predefined location, for example close to the ears of each passenger. The virtual sound sources 601, 603 represent the loudspeaker signal of the left channel. The virtual sound sources 602, 604 represent the loudspeaker signal of the right channel. For example, as sensor SE1 detects that passenger P3 is sitting on seat S3, a virtual sound source 602 is disposed near the ears of passenger P3, e.g. for experiencing a stereo sound signal. Otherwise, the remixed/upmixed signal may be output to the loudspeaker system of the vehicle (see 510 in Fig. 5).
Fig. 7 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
At 700, the audio source separation 201 (see Fig. 2) receives an audio signal. At 701, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2). At 702, the situation analyzer 202 (see Fig. 2) recognizes the current driving/passenger situation to detect if there is a passenger conversation. If a passenger conversation is not detected at 702, the process continues at 704. At 704, the situation analyzer 202 sets a vocals volume decrease parameter to 0 dB, and then the process continues to 705. If a passenger conversation is detected, the method proceeds at 703. At 703, the situation analyzer 202 sets a vocals volume decrease parameter to -12 dB, for example, and then the process continues to 705. At 705, remixing/upmixing is performed based on the separated sources obtained at 701 and based on the vocals volume decrease parameter set at 703 or at 704, to obtain a remixed/upmixed signal. The vocals volume decrease parameter is a predefined volume change parameter related to the vocals, for example a decrease of 12 dB, 0 dB, or the like. For example, the adaptive remixing/upmixing 203 may automatically adjust the vocals volume by decreasing the vocals volume (e.g. by -12 dB) while keeping the volume of the accompaniment as before. In another example (e.g. for 0 dB), the adaptive remixing/upmixing 203 may keep the volume of the vocals and the volume of the accompaniment as before. At 706, the remixed/upmixed signal is output to the loudspeaker system of the vehicle. The remixed/upmixed signal having a decrease of 0 dB may also be called the originally received audio signal.

Fig. 8 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.

At 800, the audio source separation 201 (see Fig. 2) receives an audio signal. At 801, audio source separation is performed, based on the received audio signal, to obtain separated sources, namely vocals and accompaniment (see Fig. 2). At 802, the situation analyzer 202 (see Fig. 2) recognizes the current driving/passenger situation to detect if there is low quality audio. If low quality audio is not detected at 802, the process continues at 804. At 804, the situation analyzer 202 sets a vocals volume increase parameter to 0 dB, and then the process continues to 805. If low quality audio is detected, the method proceeds at 803. At 803, the situation analyzer 202 sets a vocals volume increase parameter to +3 dB, for example, and then the process continues to 805. At 805, remixing/upmixing is performed based on the separated sources obtained at 801 and based on the vocals volume increase parameter set at 803 or at 804, to obtain a remixed/upmixed signal. The vocals volume increase parameter is a predefined volume change parameter related to the vocals, for example an increase of 3 dB, 0 dB, or the like. For example, the adaptive remixing/upmixing 203 may automatically adjust the vocals volume by increasing the vocals volume (e.g. by +3 dB) while keeping the volume of the accompaniment as before. In another example (e.g. for 0 dB), the adaptive remixing/upmixing 203 may keep the volume of the vocals and the volume of the accompaniment as before. At 806, the remixed/upmixed signal is output to the loudspeaker system of the vehicle. The remixed/upmixed signal having an increase of 0 dB may also be called the originally received audio signal.
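The flows of Figs. 3a, 7, and 8 differ only in the analyzed situation and the sign of the vocals volume change; an implementation could therefore keep a single dispatch table. The situation names below are assumptions for illustration; the dB values are the examples given above.

```python
# Vocals volume change parameter per detected situation (values from the
# examples above); unknown situations default to 0 dB, i.e. no change.
VOCALS_GAIN_DB = {
    "incoming_call": -12.0,           # Fig. 3a
    "passenger_conversation": -12.0,  # Fig. 7
    "low_quality_audio": +3.0,        # Fig. 8
}

def vocals_gain_db(situation: str) -> float:
    return VOCALS_GAIN_DB.get(situation, 0.0)
```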
Fig. 9 shows a flow diagram visualizing a method for signal remixing/upmixing according to a further embodiment.
At 901, the electronic system receives an audio signal. At 902, audio source separation is performed, based on the received audio signal, to obtain the separated sources, here vocals and accompaniment. At 903, a surrounding noise floor is detected within the obtained separated sources. At 904, dynamic equalization is applied to the separated sources obtained at 902, based on the surrounding noise floor, to obtain a changed spectrum of the separated sources. The dynamic equalizer 204 (see Fig. 2) equalizes the separated sources, here vocals and accompaniment, which may include the surrounding noise floor, by changing the spectrum of the separated sources, namely the spectrum of the vocals and/or the spectrum of the accompaniment. The dynamic equalizer 204 may be a special adaptive equalizer that preserves the clarity of the separated sources, for example the clarity of the voice, under the masking effect of the surrounding noise floor. At 905, remixing/upmixing is performed based on the equalized separated sources obtained at 904, to obtain a remixed/upmixed signal. Hence, in the obtained remixed/upmixed signal, which is output at 906, the equalized separated sources, for example the vocals, keep their clarity. At 906, the remixed/upmixed signal is output to the loudspeaker system of the vehicle.
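The patent leaves the exact equalization law open. Purely as an assumed sketch, a dynamic equalizer of this kind could compute per-band boosts that keep a separated source just above the estimated noise floor; the function below, its name and its parameters are illustrative, not taken from the patent.

```python
import numpy as np

def dynamic_eq(source_spectrum, noise_floor, headroom_db=6.0, max_boost_db=9.0):
    """Per-band gains (dB) that keep a source audible over a noise floor.

    source_spectrum, noise_floor: per-band magnitudes in dB (same shape).
    A band is boosted only as far as needed to sit headroom_db above
    the noise floor, and never by more than max_boost_db.
    """
    deficit = (noise_floor + headroom_db) - source_spectrum  # dB shortfall per band
    boost = np.clip(deficit, 0.0, max_boost_db)              # boost only, capped
    return boost
```

Adding the returned boost to the source's band magnitudes before the remix at 905 would counteract the masking effect while leaving unmasked bands untouched.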
The method of Fig. 9 may serve as an optional supplement to the methods described above with reference to Figs. 3a, 3b, 5, 7 and 8.
System for digitalized monopole synthesis
Fig. 10 provides a schematic diagram of a system applying a digitalized monopole synthesis algorithm. The theoretical background of this system is described in more detail in patent application US 2016/0037282 A1, which is herewith incorporated by reference.
The technique implemented in the embodiments of US 2016/0037282 A1 is conceptually similar to Wavefield synthesis, which uses a restricted number of acoustic enclosures to generate a defined sound field. The fundamental basis of the generation principle of the embodiments is, however, specific, since the synthesis does not try to model the sound field exactly but is based on a least-squares approach.
A target sound field is modelled as at least one target monopole placed at a defined target position. In one embodiment, the target sound field is modelled as one single target monopole. In other embodiments, the target sound field is modelled as multiple target monopoles placed at respective defined target positions. The position of a target monopole may be moving. For example, a target monopole may adapt to the movement of a noise source to be attenuated. If multiple target monopoles are used to represent a target sound field, then the methods of synthesizing the sound of a target monopole based on a set of defined synthesis monopoles, as described below, may be applied for each target monopole independently, and the contributions of the synthesis monopoles obtained for each target monopole may be summed to reconstruct the target sound field.
A source signal x(n) is fed to delay units labelled z^(-n_p) and to amplification units a_p, where p = 1, ..., N is the index of the respective synthesis monopole used for synthesizing the target monopole signal. The delay and amplification units according to this embodiment may apply equation (117) of US 2016/0037282 A1 to compute the resulting signals y_p(n) = s_p(n), which are used to synthesize the target monopole signal. The resulting signals s_p(n) are power amplified and fed to loudspeaker S_p.
In this embodiment, the synthesis is thus performed in the form of delayed and amplified components of the source signal x. According to this embodiment, the delay n_p for a synthesis monopole indexed p corresponds to the propagation time of sound for the Euclidean distance r = R_p0 = |r_p - r_0| between the target monopole r_0 and the generator r_p.
Further, according to this embodiment, the amplification factor a_p = ρc / R_p0 is inversely proportional to the distance r = R_p0.
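As a hedged numerical sketch of these relations, the per-speaker delays and gains can be computed directly from the geometry; the sampling rate, speed of sound and air density below are assumed values, and the function names are illustrative.

```python
import numpy as np

def monopole_delays_gains(r_target, r_speakers, fs=48000.0, c=343.0, rho=1.2):
    """Per-speaker delay (samples) and gain for synthesizing one target monopole.

    r_target: (3,) position of the target monopole r_0.
    r_speakers: (N, 3) positions r_p of the synthesis monopoles.
    """
    R_p0 = np.linalg.norm(r_speakers - r_target, axis=1)  # Euclidean distances
    n_p = np.round(R_p0 / c * fs).astype(int)  # delay = propagation time in samples
    a_p = rho * c / R_p0                       # gain inversely proportional to R_p0
    return n_p, a_p

def synthesize(x, n_p, a_p):
    """y_p(n) = a_p * x(n - n_p): delayed, amplified copies of the source."""
    return [a * np.concatenate([np.zeros(n), x]) for n, a in zip(n_p, a_p)]
```

Feeding these delayed, amplified copies to the loudspeakers then approximates the target monopole in the least-squares sense described above.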
In alternative embodiments of the system, the modified amplification factor according to equation (118) of US 2016/0037282 A1 can be used.
In yet further alternative embodiments of the system, a mapping factor as described with regard to Fig. 9 of US 2016/0037282 A1 can be used to modify the amplification.
Automotive implementation
The technology according to an embodiment of the present disclosure is applicable to various products. For example, the technology according to an embodiment of the present disclosure may be implemented as a device included in any kind of mobile body such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility vehicle, an airplane, a drone, a ship, a robot, construction machinery, agricultural machinery (a tractor), or the like.
Fig. 11 shows a block diagram depicting an example of a schematic configuration of a vehicle control system 7000 as an example of a mobile body control system to which the technology according to an embodiment of the present disclosure can be applied. The vehicle control system 7000 includes a plurality of electronic control units connected to each other via a communication network 7010. In the example depicted in Fig. 11, the vehicle control system 7000 includes a driving system control unit 7100, a body system control unit 7200, a battery control unit 7300, an outside-vehicle information detecting unit 7400, an in-vehicle information detecting unit 7500, and an integrated control unit 7600. The communication network 7010 connecting the plurality of control units to each other may, for example, be a vehicle-mounted communication network compliant with an arbitrary standard such as controller area network (CAN), local interconnect network (LIN), local area network (LAN), FlexRay (registered trademark), or the like.
Each of the control units includes: a microcomputer that performs arithmetic processing according to various kinds of programs; a storage section that stores the programs executed by the microcomputer, parameters used for various kinds of operations, or the like; and a driving circuit that drives various kinds of control target devices. Each of the control units further includes: a network interface (I/F) for performing communication with other control units via the communication network 7010; and a communication I/F for performing communication with a device, a sensor, or the like within and without the vehicle by wire communication or radio communication. A functional configuration of the integrated control unit 7600 illustrated in Fig. 11 includes a microcomputer 7610, a general-purpose communication I/F 7620, a dedicated communication I/F 7630, a positioning section 7640, a beacon receiving section 7650, an in-vehicle device I/F 7660, a sound/image output section 7670, a vehicle-mounted network I/F 7680, and a storage section 7690. The other control units similarly include a microcomputer, a communication I/F, a storage section, and the like.
The driving system control unit 7100 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. The driving system control unit 7100 may have a function as a control device of an antilock brake system (ABS), electronic stability control (ESC), or the like.
The driving system control unit 7100 is connected with a vehicle state detecting section 7110. The driving system control unit 7100 performs arithmetic processing using a signal input from the vehicle state detecting section 7110, and controls the internal combustion engine, the driving motor, an electric power steering device, the brake device, and the like.
The body system control unit 7200 controls the operation of various kinds of devices provided to the vehicle body in accordance with various kinds of programs. For example, the body system control unit 7200 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like.
The battery control unit 7300 controls a secondary battery 7310, which is a power supply source for the driving motor, in accordance with various kinds of programs.
The outside-vehicle information detecting unit 7400 detects information about the outside of the vehicle including the vehicle control system 7000. For example, the outside-vehicle information detecting unit 7400 is connected with at least one of an imaging section 7410 and an outside-vehicle information detecting section 7420. The imaging section 7410 includes at least one of a time-of-flight (ToF) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras. The outside-vehicle information detecting section 7420, for example, includes at least one of an environmental sensor for detecting current atmospheric conditions or weather conditions and a peripheral information detecting sensor for detecting another vehicle, an obstacle, a pedestrian, or the like on the periphery of the vehicle including the vehicle control system 7000.
The in-vehicle information detecting unit 7500 detects information about the inside of the vehicle. The in-vehicle information detecting unit 7500 may collect any information related to a situation related to the vehicle. The in-vehicle information detecting unit 7500 is, for example, connected with a driver and/or passenger state detecting section 7510 that detects the state of a driver and/or passengers. The driver state detecting section 7510 may include a camera that images the driver, a biosensor that detects biological information of the driver, a microphone that collects sound within the interior of the vehicle, or the like. The biosensor is, for example, disposed in a seat surface, the steering wheel, or the like, and detects biological information of an occupant sitting in a seat or the driver holding the steering wheel (see Fig. 6). On the basis of detection information input from the driver state detecting section 7510, the in-vehicle information detecting unit 7500 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing. The in-vehicle information detecting unit 7500 may subject an audio signal obtained by the collection of the sound to processing such as noise canceling processing or the like (see Fig. 2).
The integrated control unit 7600 controls general operation within the vehicle control system 7000 in accordance with various kinds of programs. The integrated control unit 7600 is connected with an input section 7800. The input section 7800 is implemented by a device capable of input operation by an occupant, such, for example, as a touch panel, a button, a microphone, a switch, a lever, or the like (see Figs. 4 and 6). The integrated control unit 7600 may be supplied with data obtained by voice recognition of voice input through the microphone. The input section 7800 may, for example, be a remote control device using infrared rays or other radio waves, or an external connecting device such as a mobile telephone (see Figs. 3 and 5), a personal digital assistant (PDA), or the like that supports operation of the vehicle control system 7000. The input section 7800 may be, for example, a camera. In that case, an occupant can input information by gesture. Alternatively, data may be input which is obtained by detecting the movement of a wearable device that an occupant wears. Further, the input section 7800 may, for example, include an input control circuit or the like that generates an input signal on the basis of information input by an occupant or the like using the above-described input section 7800, and which outputs the generated input signal to the integrated control unit 7600. An occupant or the like inputs various kinds of data or gives an instruction for processing operation to the vehicle control system 7000 by operating the input section 7800.
The storage section 7690 may include a read only memory (ROM) that stores various kinds of programs executed by the microcomputer and a random access memory (RAM) that stores various kinds of parameters, operation results, sensor values, or the like. In addition, the storage section 7690 may be implemented by a magnetic storage device such as a hard disc drive (HDD) or the like, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
The general-purpose communication I/F 7620 is a widely used communication I/F, which mediates communication with various apparatuses present in an external environment 7750. The general-purpose communication I/F 7620 may implement a cellular communication protocol such as global system for mobile communications (GSM (registered trademark)), worldwide interoperability for microwave access (WiMAX (registered trademark)), long term evolution (LTE (registered trademark)), LTE-advanced (LTE-A), or the like, or another wireless communication protocol such as wireless LAN (referred to also as wireless fidelity (Wi-Fi (registered trademark))), Bluetooth (registered trademark), or the like. The general-purpose communication I/F 7620 may, for example, connect to an apparatus (for example, an application server or a control server) present on an external network (for example, the Internet, a cloud network, or a company-specific network) via a base station or an access point. In addition, the general-purpose communication I/F 7620 may connect to a terminal present in the vicinity of the vehicle (which terminal is, for example, a terminal of the driver, a pedestrian, or a store, or a machine type communication (MTC) terminal) using a peer to peer (P2P) technology, for example.
The dedicated communication I/F 7630 is a communication I/F that supports a communication protocol developed for use in vehicles. The dedicated communication I/F 7630 may implement a standard protocol such, for example, as wireless access in vehicle environment (WAVE), which is a combination of institute of electrical and electronic engineers (IEEE) 802.11p as a lower layer and IEEE 1609 as a higher layer, dedicated short range communications (DSRC), or a cellular communication protocol. The dedicated communication I/F 7630 typically carries out V2X communication as a concept including one or more of communication between a vehicle and a vehicle (Vehicle to Vehicle), communication between a road and a vehicle (Vehicle to Infrastructure), communication between a vehicle and a home (Vehicle to Home), and communication between a pedestrian and a vehicle (Vehicle to Pedestrian).
The positioning section 7640, for example, performs positioning by receiving a global navigation satellite system (GNSS) signal from a GNSS satellite (for example, a GPS signal from a global positioning system (GPS) satellite), and generates positional information including the latitude, longitude, and altitude of the vehicle. Incidentally, the positioning section 7640 may identify a current position by exchanging signals with a wireless access point, or may obtain the positional information from a terminal such as a mobile telephone, a personal handyphone system (PHS), or a smartphone that has a positioning function.
The beacon receiving section 7650, for example, receives a radio wave or an electromagnetic wave transmitted from a radio station installed on a road or the like, and thereby obtains information about the current position, congestion, a closed road, a necessary time, or the like. Incidentally, the function of the beacon receiving section 7650 may be included in the dedicated communication I/F 7630 described above.
The in-vehicle device I/F 7660 is a communication interface that mediates connection between the microcomputer 7610 and various in-vehicle devices 7760 present within the vehicle. The in-vehicle device I/F 7660 may establish wireless connection using a wireless communication protocol such as wireless LAN, Bluetooth (registered trademark), near field communication (NFC), or wireless universal serial bus (WUSB). In addition, the in-vehicle device I/F 7660 may establish wired connection by universal serial bus (USB), high-definition multimedia interface (HDMI (registered trademark)), mobile high-definition link (MHL), or the like via a connection terminal (and a cable if necessary) not depicted in the figures. The in-vehicle devices 7760 may, for example, include at least one of a mobile device and a wearable device possessed by an occupant and an information device carried into or attached to the vehicle. The in-vehicle devices 7760 may also include a navigation device that searches for a path to an arbitrary destination. The in-vehicle device I/F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
The vehicle-mounted network I/F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010. The vehicle-mounted network I/F 7680 transmits and receives signals or the like in conformity with a predetermined protocol supported by the communication network 7010.
The microcomputer 7610 of the integrated control unit 7600 controls the vehicle control system 7000 in accordance with various kinds of programs on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. The microcomputer 7610 may implement the functionality described in Fig. 1 and Fig. 2, and in particular the processes described in Figs. 3, 5, 7, 8, 9 and Fig. 10. For example, the microcomputer 7610 may calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the obtained information about the inside and outside of the vehicle, and output a control command to the driving system control unit 7100. For example, the microcomputer 7610 may perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS), which functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like. In addition, the microcomputer 7610 may perform cooperative control intended for automatic driving, which makes the vehicle travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the obtained information about the surroundings of the vehicle.
The microcomputer 7610 may generate three-dimensional distance information between the vehicle and an object such as a surrounding structure, a person, or the like, and generate local map information including information about the surroundings of the current position of the vehicle, on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. In addition, the microcomputer 7610 may predict danger such as collision of the vehicle, approaching of a pedestrian or the like, an entry to a closed road, or the like on the basis of the obtained information, and generate a warning signal. The warning signal may, for example, be a signal for producing a warning sound or lighting a warning lamp.
The sound/image output section 7670 transmits an output signal (e.g. the remixed/upmixed signal) of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of Fig. 11, an audio speaker 7710, a display section 7720, and an instrument panel 7730 are illustrated as the output device. The display section 7720 may, for example, include at least one of an on-board display and a head-up display. The display section 7720 may have an augmented reality (AR) display function. The output device may be other than these devices, and may be another device such as headphones, a wearable device such as an eyeglass type display worn by an occupant or the like, a projector, a lamp, or the like. In a case where the output device is a display device, the display device visually displays results obtained by various kinds of processing performed by the microcomputer 7610 or information received from another control unit in various forms such as text, an image, a table, a graph, or the like. In a case where the output device is an audio output device (e.g. a loudspeaker system, see Fig. 2), it converts an audio signal into an analog signal and outputs it auditorily.
Incidentally, at least two control units connected to each other via the communication network 7010 in the example depicted in Fig. 11 may be integrated into one control unit. Alternatively, each individual control unit may include a plurality of control units. Further, the vehicle control system 7000 may include another control unit not depicted in the figures. In addition, part or the whole of the functions performed by one of the control units in the above description may be assigned to another control unit. That is, predetermined arithmetic processing may be performed by any of the control units as long as information is transmitted and received via the communication network 7010. Similarly, a sensor or a device connected to one of the control units may be connected to another control unit, and a plurality of control units may mutually transmit and receive detection information via the communication network 7010.
Incidentally, a computer program for realizing the functions of the electronic device according to the present embodiment described with reference to Fig. 2 can be implemented in one of the control units or the like. In addition, a computer readable recording medium storing such a computer program can also be provided. The recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, a flash memory, or the like. In addition, the above-described computer program may be distributed via a network, for example, without the recording medium being used.
Fig. 12 shows an example of installation positions of the imaging section 7410 and the outside-vehicle information detecting section 7420. Imaging sections 7910, 7912, 7914, 7916, and 7918 are, for example, disposed at at least one of positions on a front nose, side-view mirrors, a rear bumper, and a back door of the vehicle 7900 and a position on an upper portion of a windshield within the interior of the vehicle. The imaging section 7910 provided to the front nose and the imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 7900. The imaging sections 7912 and 7914 provided to the side-view mirrors obtain mainly an image of the sides of the vehicle 7900. The imaging section 7916 provided to the rear bumper or the back door obtains mainly an image of the rear of the vehicle 7900. The imaging section 7918 provided to the upper portion of the windshield within the interior of the vehicle is used mainly to detect a preceding vehicle, a pedestrian, an obstacle, a signal, a traffic sign, a lane, or the like.
Incidentally, Fig. 12 depicts an example of photographing ranges of the respective imaging sections 7910, 7912, 7914, and 7916. An imaging range a represents the imaging range of the imaging section 7910 provided to the front nose. Imaging ranges b and c respectively represent the imaging ranges of the imaging sections 7912 and 7914 provided to the side-view mirrors. An imaging range d represents the imaging range of the imaging section 7916 provided to the rear bumper or the back door.
Outside-vehicle information detecting sections 7920, 7922, 7924, 7926, 7928, and 7930 provided to the front, rear, sides, and corners of the vehicle 7900 and the upper portion of the windshield within the interior of the vehicle may be, for example, an ultrasonic sensor or a radar device. The outside-vehicle information detecting sections 7920, 7926, and 7930 provided to the front nose of the vehicle 7900, the rear bumper, the back door of the vehicle 7900, and the upper portion of the windshield within the interior of the vehicle may be a LIDAR device, for example. These outside-vehicle information detecting sections 7920 to 7930 are used mainly to detect a preceding vehicle, a pedestrian, an obstacle, or the like.
Implementation in intelligent personal assistants
Fig. 13 shows a block diagram depicting an example of a schematic configuration of a voice controlled system, for example, a voice controlled system related to a vehicle or a household. The electronic system 1300 comprises a CPU 1301 as processor. The electronic device 1300 further comprises a microphone array 1310 and a loudspeaker array 1311 that are connected to the processor 1301. Processor 1301 may for example implement an audio source separation, an adaptive remixing/upmixing, a situation analyzer, and/or a dynamic equalizer that realize the processes described with regard to Fig. 2 in more detail. The microphone array 1310 may be configured to receive speech (voice) commands via automatic speech recognition (see Figs. 3a, 3b, 5, 7). Loudspeaker array 1311 consists of one or more loudspeakers that are distributed over a predefined space and is configured to render 3D audio as described in the embodiments above. The electronic device 1300 further comprises a user interface 1312 that is connected to the processor 1301. This user interface 1312 acts as a man-machine interface and enables a dialogue between an administrator and the electronic system. For example, an administrator may make configurations to the system using this user interface 1312. The electronic system 1300 further comprises an Ethernet interface 1321, a Bluetooth interface 1304, and a WLAN interface 1305. These units 1321, 1304, and 1305 act as I/O interfaces for data communication with external devices. For example, additional loudspeakers, microphones, and video cameras with Ethernet, WLAN or Bluetooth connection may be coupled to the processor 1301 via these interfaces 1321, 1304, and 1305.
The electronic device 1300 further comprises a data storage 1302 and a data memory 1303 (here a RAM). The data memory 1303 is arranged to temporarily store or cache data or computer instructions for processing by the processor 1301. The data storage 1302 is arranged as a long-term storage, e.g., for recording sensor data obtained from the microphone array 1310. The data storage 1302 may also store audio data that represents audio messages, which the public announcement system may transport to people moving in the predefined space.
The voice controlled system of Fig. 13 may for example be used in a smart speaker, or the like.
Via the Ethernet interface 1321 or the WLAN interface 1305, the voice controlled system of Fig. 13 may be connected to a telephone system to receive incoming calls. That is, the voice controlled system can implement the process of Figs. 3a, 3b, 5.
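A rough sketch of how the processor 1301 might tie these components together for an incoming call; all object interfaces here (player, separator, telephone, loudspeakers) are hypothetical placeholders, not an API defined by this disclosure.

```python
def processing_loop(player, separator, telephone, loudspeakers):
    """Hypothetical main loop: duck music vocals while a phone call is active."""
    while True:
        frame = player.read_frame()                 # received audio signal
        vocals, accomp = separator(frame)           # audio source separation
        duck = telephone.call_active()              # situation analyzer
        gain = 10 ** (-12.0 / 20) if duck else 1.0  # predefined volume change
        loudspeakers.play(gain * vocals + accomp)   # adaptive remix and output
```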
It should be noted that the description above is only an example configuration. Alternative configurations may be implemented with additional or other sensors, storage devices, interfaces, or the like.
It should be recognized that the embodiments describe methods with an exemplary ordering of method steps. The specific ordering of method steps is however given for illustrative purposes only and should not be construed as binding. It should also be recognized that the division of the electronic system of Fig. 11 into units is only made for illustration purposes and that the present disclosure is not limited to any specific division of functions in specific units. For instance, at least parts of the circuitry could be implemented by a respective programmed processor, field programmable gate array (FPGA), dedicated circuits, and the like.
All units and entities described in this specification and claimed in the appended claims can, if not stated otherwise, be implemented as integrated circuit logic, for example on a chip, and functionality provided by such units and entities can, if not stated otherwise, be implemented by software.
In so far as the embodiments of the disclosure described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a transmission, storage or other medium by which such a computer program is provided are envisaged as aspects of the present disclosure.
Note that the present technology can also be configured as described below.
(1) An electronic device comprising circuitry configured to:
perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing/upmixing, and
perform remixing/upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing/upmixing to obtain a remixed/upmixed signal.
(2) The electronic device of (1), wherein the current situation is a current situation related to a vehicle.
(3) The electronic device of any one of (1) to (2), wherein the current situation is a current situation related to a speech recognition system.
(4) The electronic device of any one of (1) to (3), wherein analyzing the current situation comprises using input from at least one sensor and/or historic information.
(5) The electronic device of any one of (1) to (4), wherein the parameters for adaptive remixing or upmixing comprise a predefined volume change parameter.
(6) The electronic device of any one of (1) to (5), wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume decrease parameter related to the vocals.
(7) The electronic device of any one of (1) to (6), wherein analyzing the current situation comprises analyzing the current situation inside a vehicle, or analyzing a current driving/passenger situation.
(8) The electronic device of any one of (1) to (7), wherein the circuitry (7610, 7670) is configured to output (306; 706; 806; 906) the remixed or upmixed signal to a loudspeaker system (7710; SP1 to SP8).
(9) The electronic device of any one of (1) to (8), wherein analyzing the current situation comprises determining (302; 502) if there is an incoming call.
(10) The electronic device of any one of (1) to (9), wherein the parameters for adaptive remixing/upmixing comprise an output channel selection parameter.
(11) The electronic device of (10), wherein the circuitry (7610, 7670) is configured to determine to which of front loudspeakers and/or rear loudspeakers the remixed or upmixed signal is to be output (510) based on the output channel selection parameter.
(12) The electronic device of any one of (1) to (11), wherein analyzing the current situation comprises determining (702) if there is a passenger conversation.
(13) The electronic device of any one of (1) to (11), wherein analyzing the current situation comprises determining (802) the audio quality of the received audio signal.
(14) The electronic device of (13), wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume increase parameter related to the vocals that depends on the audio quality of the received audio signal.
(15) The electronic device of any one of (1) to (14), wherein analyzing the current situation comprises determining (312) if a keyword for initiating speech commands is detected.
(16) The electronic device of (15), wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume decrease parameter related to the vocals that depends on the detection of the keyword for initiating speech commands.
(17) The electronic device of any one of (1) to (16), wherein the circuitry (7610, 7670, 7500) is further configured to detect (903) a surrounding noise floor within the obtained separated sources.
(18) The electronic device of (17), wherein the circuitry (7610, 7670) is further configured to perform dynamic equalization (904) to the separated sources to obtain equalized separated sources, and to perform remixing or upmixing (905) based on the equalized separated sources.
(19) A method comprising:
performing audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyzing (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing; and
performing remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
(20) A computer program comprising instructions, the instructions when executed on a processor causing the processor to:
perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing, and
perform remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.

Claims

1. An electronic device comprising circuitry (7610, 7670) configured to:
perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing, and
perform remixing/upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing/upmixing to obtain a remixed/upmixed signal.
2. The electronic device of claim 1, wherein the current situation is a current situation related to a vehicle.
3. The electronic device of claim 1, wherein the current situation is a current situation related to a speech recognition system.
4. The electronic device of claim 1, wherein analyzing the current situation comprises using input from at least one sensor and/or historic information.
5. The electronic device of claim 1, wherein the parameters for adaptive remixing or upmixing comprise a predefined volume change parameter.
6. The electronic device of claim 1, wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume decrease parameter related to the vocals.
7. The electronic device of claim 1, wherein analyzing the current situation comprises analyzing the current situation inside a vehicle, or analyzing a current driving/passenger situation.
8. The electronic device of claim 1, wherein the circuitry (7610, 7670) is configured to output (306; 706; 806; 906) the remixed or upmixed signal to a loudspeaker system (7710; SP1 to SP8).
9. The electronic device of claim 1, wherein analyzing the current situation comprises determining (302; 502) if there is an incoming call.
10. The electronic device of claim 1, wherein the parameters for adaptive remixing or upmixing comprise an output channel selection parameter.
11. The electronic device of claim 10, wherein the circuitry (7610, 7670) is configured to determine to which of front loudspeakers and/or rear loudspeakers the remixed or upmixed signal is to be output (510) based on the output channel selection parameter.
12. The electronic device of claim 1, wherein analyzing the current situation comprises determining (702) if there is a passenger conversation.
13. The electronic device of claim 1, wherein analyzing the current situation comprises determining (802) the audio quality of the received audio signal.
14. The electronic device of claim 13, wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume increase parameter related to the vocals that depends on the audio quality of the received audio signal.
15. The electronic device of claim 1, wherein analyzing the current situation comprises determining (312) if a keyword for initiating speech commands is detected.
16. The electronic device of claim 15, wherein the separated sources comprise vocals and accompaniment and wherein the parameters for adaptive remixing or upmixing comprise a predefined volume decrease parameter related to the vocals that depends on the detection of the keyword for initiating speech commands.
17. The electronic device of claim 1, wherein the circuitry (7610, 7670, 7500) is further configured to detect (903) a surrounding noise floor within the obtained separated sources.
18. The electronic device of claim 17, wherein the circuitry (7610, 7670) is further configured to perform dynamic equalization (904) to the separated sources to obtain equalized separated sources, and to perform remixing or upmixing (905) based on the equalized separated sources.
19. A method comprising:
performing audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyzing (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing; and
performing remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
20. A computer program comprising instructions, the instructions when executed on a processor causing the processor to:
perform audio source separation (301; 501; 701; 801) based on a received audio signal to obtain separated sources;
analyze (302; 502; 702; 802) a current situation to determine one or more parameters for adaptive remixing or upmixing, and
perform remixing or upmixing (305; 509; 705; 805) based on the separated sources and based on the parameters for adaptive remixing or upmixing to obtain a remixed or upmixed signal.
PCT/EP2019/085130 2018-12-14 2019-12-13 Audio processing device, audio processing method and computer program thereof WO2020120754A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP18212762.1 2018-12-14
EP18212762 2018-12-14

Publications (1)

Publication Number Publication Date
WO2020120754A1 (en)

Family

ID=64665714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/085130 WO2020120754A1 (en) 2018-12-14 2019-12-13 Audio processing device, audio processing method and computer program thereof

Country Status (1)

Country Link
WO (1) WO2020120754A1 (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120114130A1 (en) * 2010-11-09 2012-05-10 Microsoft Corporation Cognitive load reduction
US20180220250A1 (en) * 2012-04-19 2018-08-02 Nokia Technologies Oy Audio scene apparatus
WO2014142201A1 (en) * 2013-03-15 2014-09-18 Yamaha Corporation Device and program for processing separating data
US20150016614A1 (en) * 2013-07-12 2015-01-15 Wim Buyens Pre-Processing of a Channelized Music Signal
US20160037282A1 (en) 2014-07-30 2016-02-04 Sony Corporation Method, device and system
US20170309289A1 (en) * 2016-04-26 2017-10-26 Nokia Technologies Oy Methods, apparatuses and computer programs relating to modification of a characteristic associated with a separated audio signal
WO2018017878A1 (en) * 2016-07-22 2018-01-25 Dolby Laboratories Licensing Corporation Network-based processing and distribution of multimedia content of a live musical performance
WO2018179989A1 (en) * 2017-03-29 2018-10-04 Panasonic Intellectual Property Management Co., Ltd. Acoustic processing device, acoustic processing method, and program
US20200015029A1 (en) * 2017-03-29 2020-01-09 Panasonic Intellectual Property Management Co., Ltd. Acoustic processing device, acoustic processing method, and recording medium
US10111000B1 (en) * 2017-10-16 2018-10-23 Tp Lab, Inc. In-vehicle passenger phone stand


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022175552A1 (en) * 2021-02-22 2022-08-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signal-adaptive remixing of separated audio sources
WO2023047620A1 (en) * 2021-09-24 2023-03-30 Sony Group Corporation Information processing device, information processing method, and program
CN113873421A (en) * 2021-12-01 2021-12-31 杭州当贝网络科技有限公司 Method and system for realizing sky sound effect based on screen projection equipment
CN113873421B (en) * 2021-12-01 2022-03-22 杭州当贝网络科技有限公司 Method and system for realizing sky sound effect based on screen projection equipment
CN116594586A (en) * 2023-07-18 2023-08-15 苏州清听声学科技有限公司 Vehicle-mounted self-adaptive adjusting audio playing system and method
CN116594586B (en) * 2023-07-18 2023-09-26 苏州清听声学科技有限公司 Vehicle-mounted self-adaptive adjusting audio playing system and method


Legal Events

121: Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 19817339; Country of ref document: EP; Kind code of ref document: A1.

NENP: Non-entry into the national phase. Ref country code: DE.

122: Ep: pct application non-entry in european phase. Ref document number: 19817339; Country of ref document: EP; Kind code of ref document: A1.