KR20230113853A

KR20230113853A - Psychoacoustic enhancement based on audio source directivity

Info

Publication number: KR20230113853A
Application number: KR1020237025350A
Authority: KR
Inventors: Isaac Garcia Munoz
Original assignee: Qualcomm Incorporated
Priority date: 2021-01-29
Filing date: 2021-12-17
Publication date: 2023-08-01
Also published as: CN116803106B; BR112023014480A2; US11646046B2; CN116803106A; TW202304226A; KR102650763B1; US20220246160A1; EP4285611A1; JP7459391B2; WO2022164590A1; JP2023554694A

Abstract

A device includes a memory configured to store directivity data of one or more audio sources corresponding to one or more input audio signals. The device also includes one or more processors configured to determine one or more equalizer settings based at least in part on the directivity data. The one or more processors are also configured to generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals.

Description

Psychoacoustic enhancement based on audio source directivity

I. Cross-Reference to Related Applications

The present application claims the benefit of priority from commonly owned U.S. Non-Provisional Patent Application No. 17/162,241, filed January 29, 2021, the contents of which are expressly incorporated herein by reference in their entirety.

II. Field

The present disclosure is generally related to psychoacoustic enhancement based on audio source directivity.

III. Description of Related Art

Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets, and laptop computers, that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality, such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.

Such computing devices often incorporate functionality to receive an audio signal from one or more microphones. For example, the audio signal may represent user speech captured by the microphones, ambient sounds captured by the microphones, or a combination thereof. The user speech may be difficult to hear in the audio signal because of the distance between the user and the microphones that capture the speech. For example, microphones that are farther from the user may capture more ambient sounds, such as traffic or the speech of other users. As another example, the user speech sounds softer when captured by microphones that are farther away. The ability to focus on particular sounds in an audio signal is useful for various applications, such as communicating user speech more clearly in a communication application or a voice-controlled assistant system.

IV. Summary

According to one implementation of the present disclosure, a device includes a memory configured to store directivity data of one or more audio sources corresponding to one or more input audio signals. The device also includes one or more processors configured to determine one or more equalizer settings based at least in part on the directivity data. The one or more processors are also configured to generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals.

According to another implementation of the present disclosure, a method includes obtaining, at a device, directivity data of one or more audio sources corresponding to one or more input audio signals. The method also includes determining, at the device, one or more equalizer settings based at least in part on the directivity data. The method further includes generating, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals.

According to another implementation of the present disclosure, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to obtain directivity data of one or more audio sources corresponding to one or more input audio signals. The instructions, when executed by the one or more processors, also cause the one or more processors to determine one or more equalizer settings based at least in part on the directivity data. The instructions, when executed by the one or more processors, further cause the one or more processors to generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals.

According to another implementation of the present disclosure, an apparatus includes means for obtaining directivity data of one or more audio sources corresponding to one or more input audio signals. The apparatus also includes means for determining one or more equalizer settings based at least in part on the directivity data. The apparatus further includes means for generating, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals.

Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.

V. Brief Description of the Drawings

FIG. 1 is a block diagram of a particular illustrative aspect of a system operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 2A is a diagram of an illustrative example of a graphical user interface (GUI) generated by the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 2B is a diagram of another illustrative example of a GUI generated by the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 3 is a diagram of an illustrative aspect of components of the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 4 is a diagram of an illustrative aspect of components of the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 5 is a diagram of an illustrative aspect of components of the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 6 is a diagram of an illustrative aspect of components of the system of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 7 illustrates an example of an integrated circuit operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 8 is a diagram of a mobile device operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 9 is a diagram of a headset operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 10 is a diagram of a wearable electronic device operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 11 is a diagram of a voice-controlled speaker system operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 12 is a diagram of a camera operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 13 is a diagram of a headset, such as a virtual reality or augmented reality headset, operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 14 is a diagram of a first example of a vehicle operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 15 is a diagram of a second example of a vehicle operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.
FIG. 16 is a diagram of a particular implementation of a method of psychoacoustic enhancement based on audio source directivity that may be performed by the device of FIG. 1, in accordance with some examples of the present disclosure.
FIG. 17 is a block diagram of a particular illustrative example of a device operable to perform psychoacoustic enhancement based on audio source directivity, in accordance with some examples of the present disclosure.

VI. Detailed Description

Microphones generate audio signals representing captured sound, such as user speech, ambient sounds, or a combination thereof. Various sounds may be difficult to hear in an audio signal because of the distance of the microphones from the audio source. The ability to focus on particular sounds in an audio signal is useful for various applications, such as user speech in a communication application or bird sounds in a bird tracking application.

Systems and methods of psychoacoustic enhancement based on audio source directivity are disclosed. Different types of audio sources can have different sound directivity characteristics. For example, human speech is directed more toward the front of the human head than toward the back, and may exhibit a frequency response that varies with the distance and angular offset from the direction the human talker is facing, whereas a dodecahedral sound source approximates omnidirectional directivity.

An audio enhancer performs psychoacoustic enhancement based on the directivity of an audio source to approximate the sound that would be captured by moving a microphone relative to the audio source (e.g., closer to or farther from the audio source). For example, the audio enhancer includes a directivity analyzer and an equalizer. The directivity analyzer generates equalizer settings based on directivity data of the audio source and a zoom target. For example, the directivity analyzer generates the equalizer settings such that applying them adjusts the loudness of particular audio frequencies to emulate moving the microphone to the zoom target. The equalizer applies the equalizer settings to input audio signals to generate output audio signals that correspond to a psychoacoustic enhanced version of the input audio signals. For example, the input audio signals are based on microphone outputs of the microphones, and the output audio signals approximate the frequency response of the audio source at the zoom target. The output audio signals thus approximate the sounds that would be captured by the microphones at the zoom target.

Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to limit the implementations. For example, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, some features described herein are singular in some implementations and plural in other implementations. To illustrate, FIG. 1 depicts a device 102 including one or more processors ("processor(s)" 190 of FIG. 1), which indicates that in some implementations the device 102 includes a single processor 190 and in other implementations the device 102 includes multiple processors 190.

As used herein, the terms "comprise," "comprises," and "comprising" may be used interchangeably with "include," "includes," or "including." Additionally, the term "wherein" may be used interchangeably with "where." As used herein, "exemplary" indicates an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., "first," "second," "third," etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having the same name (but for use of the ordinal term). As used herein, the term "set" refers to one or more of a particular element, and the term "plurality" refers to multiple (e.g., two or more) of a particular element.

As used herein, "coupled" may include "communicatively coupled," "electrically coupled," or "physically coupled," and may also (or alternatively) include any combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive signals (e.g., digital signals or analog signals) directly or indirectly, via one or more wires, buses, networks, etc. As used herein, "directly coupled" may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.

In the present disclosure, terms such as "determining," "calculating," "estimating," "shifting," "adjusting," etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and that other techniques may be utilized to perform similar operations. Additionally, as referred to herein, "generating," "calculating," "estimating," "using," "selecting," "accessing," and "determining" may be used interchangeably. For example, "generating," "calculating," "estimating," or "determining" a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal), or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.

Referring to FIG. 1, a particular illustrative aspect of a system configured to perform psychoacoustic enhancement based on audio source directivity is disclosed and generally designated 100. The system 100 includes a device 102 that is coupled to one or more microphones 120, a camera 140, one or more speakers 160, a display device 162, an input device 130, or a combination thereof. In some implementations, the display device 162 includes the input device 130 (e.g., a touchscreen).

The device 102 includes one or more processors 190 coupled to a memory 132. The memory 132 is configured to store equalizer (Eq) setting data 149, directivity data 141, other data used or generated by an audio enhancer 192, or a combination thereof. In a particular aspect, the one or more processors 190 are coupled to the one or more microphones 120 via one or more input interfaces 124. For example, the one or more input interfaces 124 are configured to receive one or more microphone outputs 122 from the one or more microphones 120 and to provide the one or more microphone outputs 122 to the audio enhancer 192 as one or more input audio signals (SIG) 126.

In a particular aspect, the one or more processors 190 are coupled to the camera 140 via an input interface 144. For example, the input interface 144 is configured to receive a camera output 142 from the camera 140 and to provide the camera output 142 to the audio enhancer 192 as image data 145. In a particular aspect, the input interface 144 is configured to provide the image data 145 to the audio enhancer 192 concurrently with the one or more input interfaces 124 providing the one or more input audio signals 126 to the audio enhancer 192.

The device 102 is configured to perform psychoacoustic enhancement based on audio source directivity using the audio enhancer 192 included in the one or more processors 190. The audio enhancer 192 includes a directivity analyzer 152 coupled to an equalizer 148 and a pre-equalization signal processor 146 coupled to the equalizer 148. According to some implementations, a directivity data updater 150 is included in the audio enhancer 192.

The input device 130 is configured to provide a user input 131 indicating a zoom target 133 to the device 102. The directivity analyzer 152 is configured to generate equalizer settings 153 based on the directivity data 141, the zoom target 133, the equalizer setting data 149, or a combination thereof. For example, the directivity analyzer 152 is configured to generate the equalizer settings 153 such that applying the equalizer settings 153 adjusts the loudness of particular audio frequencies to emulate moving the one or more microphones 120 closer to the zoom target 133.
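
One way to picture the role of the directivity analyzer 152 is as a per-band gain computation. The following is a minimal sketch, not the disclosed implementation: it assumes the directivity data has already been reduced to two hypothetical tables of frequency response (in dB per band), one at the actual microphone position and one at the zoom target, and that each equalizer setting is simply the difference between them. The function name, the dictionary layout, and the band center frequencies are all illustrative assumptions.

```python
from typing import Dict


def equalizer_settings(
    response_at_mic_db: Dict[int, float],
    response_at_target_db: Dict[int, float],
) -> Dict[int, float]:
    """Return a gain in dB for each frequency band (keyed by center frequency in Hz).

    Assumption: the gain that emulates moving the microphone to the zoom target
    is the source's response at the target minus its response at the microphone.
    Bands missing from either table are skipped.
    """
    return {
        band: response_at_target_db[band] - response_at_mic_db[band]
        for band in response_at_mic_db
        if band in response_at_target_db
    }


# Hypothetical example: high frequencies fall off faster with distance,
# so emulating a closer microphone boosts them more than low frequencies.
at_mic = {250: -6.0, 4000: -12.0}
at_target = {250: -3.0, 4000: -4.0}
settings = equalizer_settings(at_mic, at_target)
```

Under these assumed numbers the 4000 Hz band receives a larger boost than the 250 Hz band, which matches the idea of adjusting the loudness of particular frequencies rather than applying a uniform volume change.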

The equalizer 148 is configured to apply the equalizer settings 153 to one or more equalizer input audio signals 147 to generate one or more output audio signals 138. In a particular aspect, the one or more equalizer input audio signals 147 include the one or more input audio signals 126. In an alternative implementation, the pre-equalization signal processor 146 is configured to process the one or more input audio signals 126 to generate the one or more equalizer input audio signals 147, as further described with reference to FIG. 3.
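
Applying equalizer settings to a signal can be sketched as scaling frequency bins by per-band gains. The example below is illustrative only and rests on several assumptions not found in the disclosure: settings are expressed as (low Hz, high Hz, gain dB) tuples, the whole signal is processed as one block, and a naive discrete Fourier transform from the standard library stands in for whatever filter bank or FFT a real equalizer would use.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Band = Tuple[float, float, float]  # (low_hz, high_hz, gain_db), a hypothetical layout


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for short demo blocks)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def apply_equalizer(
    samples: Sequence[float], sample_rate: float, bands: Sequence[Band]
) -> List[float]:
    """Scale each frequency bin by the linear gain of the first band containing it."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so use the folded index
        # to scale both halves identically and keep the output real.
        freq = min(k, n - k) * sample_rate / n
        for low, high, gain_db in bands:
            if low <= freq < high:
                spectrum[k] *= 10 ** (gain_db / 20)
                break
    return idft(spectrum)


# A 1 kHz tone sampled at 8 kHz, boosted by about 6 dB (a factor of 2).
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
boosted = apply_equalizer(tone, 8000, [(500.0, 2000.0, 20 * math.log10(2))])
```

Because the tone falls entirely within the boosted band, the output is the input scaled by roughly two; content outside every listed band passes through unchanged.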

In some implementations, the device 102 corresponds to or is included in one or more of various types of devices. In an illustrative example, the processor 190 is integrated in a headset device that includes the one or more speakers 160, as further described with reference to FIG. 9. In other examples, the processor 190 is integrated in at least one of a mobile phone or a tablet computer device, as described with reference to FIG. 8, a wearable electronic device, as described with reference to FIG. 10, a voice-controlled speaker system, as described with reference to FIG. 11, a camera device, as described with reference to FIG. 12, or a virtual reality headset or an augmented reality headset, as described with reference to FIG. 13. In another illustrative example, the processor 190 is integrated into a vehicle that also includes the one or more speakers 160, as further described with reference to FIG. 14 and FIG. 15.

During operation, the one or more microphones 120 capture sound 186 from one or more audio sources, including an audio source 184 (e.g., a person), and generate the one or more microphone outputs 122 representing the sound 186. In a particular aspect, the one or more audio sources include a person, an animal, a bird, a vehicle, a musical instrument, another type of audio source, or a combination thereof. The one or more input interfaces 124 provide the one or more microphone outputs 122 to the audio enhancer 192 as the one or more input audio signals 126.

In a particular implementation, the camera 140 captures images (e.g., video, still images, or both) of one or more audio sources, such as the audio source 184, and generates the camera output 142 representing the images. In this implementation, the input interface 144 provides the camera output 142 to the audio enhancer 192 as the image data 145. In a particular aspect, the camera 140 provides the camera output 142 to the device 102 concurrently with the one or more microphones 120 providing the one or more microphone outputs 122 to the device 102.

In a particular implementation, the image data 145, the one or more input audio signals 126, or a combination thereof, correspond to stored data, such as video game data or previously recorded data, instead of corresponding to data captured via external sensors (e.g., the microphones 120 and the camera 140). For example, the audio enhancer 192 retrieves the image data 145, the one or more input audio signals 126, or a combination thereof, from the memory 132.

The one or more processors 190 generate the one or more output audio signals 138 based on the one or more input audio signals 126 and output the one or more output audio signals 138 via the one or more speakers 160. In a particular implementation, the one or more processors 190 generate a graphical user interface 161 based on the image data 145 and provide the graphical user interface 161 to the display device 162 to display the images captured by the camera 140 to a user 101 concurrently with outputting the one or more output audio signals 138 via the one or more speakers 160.

The device 102 is responsive to the user 101 to initiate an audio zoom operation. For example, the user 101 uses the input device 130 to provide the user input 131 indicating the zoom target 133 to the audio enhancer 192. In a particular implementation, the user 101 uses the input device 130 (e.g., a mouse, a keyboard, a button, a slider input, or a combination thereof) to move a zoom selector displayed in the graphical user interface 161 to select the zoom target 133, as further described with reference to FIGS. 2A and 2B. In another implementation, the user 101 initiates the audio zoom operation independently of the graphical user interface 161. For example, the one or more processors 190 provide the one or more output audio signals 138 to the one or more speakers 160 independently of providing any GUI to the display device 162. The user 101 uses the input device 130 (e.g., arrow keys on a keyboard, buttons on a headset, etc.) to provide the user input 131 indicating the zoom target 133 to the audio enhancer 192. To illustrate, the user 101 uses the input device 130 to zoom to different regions of a sound field corresponding to the audio output of the one or more speakers 160, as further described with reference to FIG. 9.

The zoom target 133 includes information indicating how the audio zoom is to be performed. In various implementations, the zoom target 133 can include or indicate a user's selection of at least one audio source (e.g., the audio source 184), a user's selection to adjust the audio in a manner that simulates moving the one or more microphones 120, or a combination thereof, as further described with reference to FIGS. 4-6. For example, the zoom target 133 can include a user's selection of the audio source 184 and a zoom distance 135 indicating how much closer to the audio source 184 the one or more microphones 120 should be perceived as being located (e.g., 2 feet closer). In another example, the zoom target 133 can include a user's selection of the zoom distance 135 and a zoom orientation 137 indicating how far and in which direction the one or more microphones 120 should be perceived as having moved from a location 134 (e.g., a physical location). In a particular illustrative example, a first value (e.g., 0 degrees), a second value (e.g., 90 degrees), a third value (e.g., 180 degrees), or a fourth value (e.g., 270 degrees) of the zoom orientation 137 corresponds to a forward movement, a rightward movement, a backward movement, or a leftward movement, respectively, of the one or more microphones 120 relative to the location 134. In a particular example, such as when the user 101 selects the zoom distance 135 and the zoom orientation 137, the audio enhancer 192 determines a zoom position 136 by applying the zoom orientation 137 and the zoom distance 135 to the location 134 (of the one or more microphones 120). In another example, when the zoom target 133 includes a user's selection of the zoom position 136, the audio enhancer 192 determines the zoom orientation 137 and the zoom distance 135 based on a comparison of the location 134 and the zoom position 136. In a particular example, when the zoom target 133 includes a user's selection of the audio source 184, the audio enhancer 192 estimates a location of the audio source 184 and determines the zoom distance 135, the zoom position 136, the zoom orientation 137, or a combination thereof, based on the estimated location of the audio source 184. In a particular aspect, the audio enhancer 192 estimates the location of the audio source 184 using image analysis techniques, audio analysis techniques, position information of the audio source 184, or a combination thereof. In a particular aspect, the location 134 corresponds to a representative location (e.g., an average location) of the locations of a plurality of microphones 120, and the zoom position 136 corresponds to a representative location (e.g., an average location) of the locations to which the plurality of microphones 120 are emulated as having moved.
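The relationship between the location 134, the zoom orientation 137, the zoom distance 135, and the zoom position 136 can be sketched in a few lines. The following is only an illustration under stated assumptions: the two-dimensional coordinate frame (forward along +y, right along +x) and the function names `zoom_position` and `zoom_orientation_and_distance` are hypothetical and are not taken from the disclosure.

```python
import math

def zoom_position(location, zoom_orientation_deg, zoom_distance):
    """Apply a zoom orientation and zoom distance to a microphone location.

    Assumed convention, following the illustrative values in the text:
    0 = forward (+y), 90 = right (+x), 180 = backward (-y), 270 = left (-x).
    """
    x, y = location
    theta = math.radians(zoom_orientation_deg)
    return (x + zoom_distance * math.sin(theta),
            y + zoom_distance * math.cos(theta))

def zoom_orientation_and_distance(location, zoom_pos):
    """Inverse case: compare the location with a selected zoom position."""
    dx = zoom_pos[0] - location[0]
    dy = zoom_pos[1] - location[1]
    distance = math.hypot(dx, dy)
    # atan2(dx, dy) measures the angle clockwise from the forward (+y) axis.
    orientation = math.degrees(math.atan2(dx, dy)) % 360.0
    return orientation, distance
```

With this convention, a zoom orientation of 90 degrees and a zoom distance of 2 moves the emulated microphone position 2 units to the right of the location, and comparing the two positions recovers the same orientation and distance.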

The directivity analyzer 152 obtains directivity data 141 of one or more audio sources (e.g., the audio source 184) corresponding to the one or more input audio signals 126. For example, the directivity analyzer 152 identifies the audio source 184 (e.g., based on a type of source, such as from analyzing the input audio signals 126, analyzing the image data 145, or a combination thereof) and retrieves, from the memory 132, the directivity data 141 that most closely corresponds to the audio source 184. In another example, the directivity analyzer 152 requests (e.g., downloads) the directivity data 141 from another device or a network.

The directivity data 141 of a particular audio source indicates orientation and distance frequency response characteristics of the particular audio source. In a particular aspect, the directivity data 141 is associated with a generic audio source. For example, the directivity data 141 indicates orientation and frequency response characteristics of the generic audio source. To illustrate, the directivity data 141 indicates that a frequency response corresponding to mid-frequencies changes (e.g., decreases or increases) by a first amount in response to a change from a first distance relative to the generic audio source to a second distance relative to the generic audio source. In an alternative aspect, the directivity data 141 indicates directivity data associated with particular types of audio sources. For example, the directivity data 141 indicates frequency response changes of various frequencies in response to changes in distance, orientation, or both, from a particular audio source type (e.g., a human talker, a bird, a musical instrument, etc.) of the audio source 184, as further described with reference to FIG. 4.

The directivity analyzer 152 determines equalizer settings 153 based at least in part on the directivity data 141, the zoom target 133, and equalizer setting data 149, as further described with reference to FIG. 4. For example, the directivity analyzer 152 generates the equalizer settings 153 such that applying the equalizer settings 153 adjusts the loudness of particular audio frequencies to emulate moving the one or more microphones 120 to (or closer to) the zoom position 136. In a particular implementation, the directivity analyzer 152, in response to determining that directivity data for the audio source type of the audio source 184 is not available, selects the equalizer settings 153 based on default directivity data. To illustrate, the directivity analyzer 152 selects the equalizer settings 153 to adjust (e.g., increase or decrease) a frequency response corresponding to mid-frequencies (e.g., independently of the audio source type of the audio source 184). For example, the directivity analyzer 152 selects the equalizer settings 153 to increase loudness corresponding to mid-frequencies in response to determining that the distance between the zoom position 136 and the audio source 184 is less than the distance between the location 134 and the audio source 184. As another example, the directivity analyzer 152 selects the equalizer settings 153 to decrease loudness corresponding to mid-frequencies in response to determining that the distance between the zoom position 136 and the audio source 184 is greater than the distance between the location 134 and the audio source 184. In an alternative implementation, the directivity analyzer 152 selects the equalizer settings 153 based on the directivity (e.g., frequency response) of the audio source type (e.g., a human talker or a bird) of the audio source 184 indicated by the directivity data 141, as further described with reference to FIG. 4. The directivity analyzer 152 provides the equalizer settings 153 to the equalizer 148.
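The selection logic described above (type-specific directivity data when available, otherwise a default mid-frequency adjustment whose sign depends on whether the zoom position is closer to or farther from the source) might be sketched as follows. This is a minimal sketch, not the disclosed method: the band names, the `DEFAULT_DIRECTIVITY` table, and the "decibels per halving of distance" scaling are all assumptions introduced for illustration.

```python
import math

# Hypothetical default directivity data: mid band changes by 3 dB
# for every halving of the source-to-microphone distance.
DEFAULT_DIRECTIVITY = {"mid": 3.0}

def select_equalizer_settings(directivity_data, source_type,
                              dist_at_location, dist_at_zoom):
    """Return per-band gains in dB that emulate the zoomed microphone position.

    directivity_data maps an audio source type to {band: dB per halving}.
    If the type is missing, the default (mid-frequency only) data is used.
    """
    if dist_at_location <= 0 or dist_at_zoom <= 0:
        raise ValueError("distances must be positive")
    per_band = directivity_data.get(source_type, DEFAULT_DIRECTIVITY)
    # Positive when the zoom position is closer to the source, negative when farther.
    halvings = math.log2(dist_at_location / dist_at_zoom)
    return {band: slope * halvings for band, slope in per_band.items()}
```

For an unknown source type, moving from 4 units to 2 units away yields a mid-band boost, while moving from 2 units to 4 units yields a mid-band cut of the same size.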

The equalizer 148 generates the one or more output audio signals 138 by applying the equalizer settings 153 to one or more equalizer input audio signals 147. In a particular implementation, the one or more equalizer input audio signals 147 include the one or more input audio signals 126. In another implementation, the pre-equalization signal processor 146 generates the one or more equalizer input audio signals 147 by applying pre-equalization processing to the one or more input audio signals 126, as further described with reference to FIG. 3. The equalizer 148 provides the one or more output audio signals 138 to the one or more speakers 160.

The one or more output audio signals 138 correspond to a psychoacoustic enhanced version of the one or more input audio signals 126. The psychoacoustic enhanced version (e.g., the one or more output audio signals 138) approximates a frequency response of the audio source 184 at the zoom position 136 (e.g., the zoom orientation 137 and the zoom distance 135) associated with the audio zoom operation. Thus, the sound generated by the one or more speakers 160 (corresponding to the one or more output audio signals 138) emulates having moved the one or more microphones 120 to (or closer to) the zoom position 136.

In a particular implementation, the directivity data updater 150 generates or updates the directivity data 141. The directivity data updater 150 is configured to sample and analyze audio captured at various distances and orientations from an audio source and to generate or update the directivity data associated with that audio source. In a particular example, the directivity data updater 150 generates, at a first time, a first sound spectrum of an input audio signal of the one or more input audio signals 126 corresponding to the audio source 184. The first sound spectrum represents sound captured by the one or more microphones 120 at a first distance from the audio source 184 when the audio source 184 has a first orientation relative to the one or more microphones 120. The directivity data updater 150 generates, at a second time, a second sound spectrum of an input audio signal of the one or more input audio signals 126 corresponding to the audio source 184. The second sound spectrum represents sound captured by the one or more microphones 120 at a second distance from the audio source 184 when the audio source 184 has a second orientation relative to the one or more microphones 120. The directivity data updater 150 updates the directivity data 141 to indicate that the difference between the first distance and first orientation and the second distance and second orientation corresponds to the difference between the first sound spectrum and the second sound spectrum.
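The update step described above can be illustrated by recording the per-band level difference between two captures against the pair of (distance, orientation) conditions under which they were made. This is a sketch under assumptions: the dictionary layout, the use of band levels in dB, and the names `spectrum_difference_db` and `update_directivity` are hypothetical and are not specified by the disclosure.

```python
def spectrum_difference_db(first_spectrum_db, second_spectrum_db):
    """Per-band level change (second minus first) for bands present in both spectra."""
    return {band: second_spectrum_db[band] - first_spectrum_db[band]
            for band in first_spectrum_db if band in second_spectrum_db}

def update_directivity(directivity_data, first_capture, second_capture):
    """Record how the spectrum changed between two capture conditions.

    Each capture is a dict with 'distance', 'orientation', and 'spectrum_db'
    (band name -> level in dB). The entry is keyed by both conditions.
    """
    key = (first_capture["distance"], first_capture["orientation"],
           second_capture["distance"], second_capture["orientation"])
    directivity_data[key] = spectrum_difference_db(
        first_capture["spectrum_db"], second_capture["spectrum_db"])
    return directivity_data
```

A later lookup with the same pair of conditions then returns the band-level changes that an equalizer would need to apply to emulate moving from the first condition to the second.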

The system 100 enables an audio zoom operation for the zoom target 133 that approximates moving the one or more microphones 120 to the zoom position 136. Generating the one or more output audio signals 138 by adjusting loudness for particular frequencies based on the directivity of the audio source 184 results in a more natural sounding audio zoom than adjusting only the gains of the one or more input audio signals 126.

Although the one or more microphones 120, the camera 140, the one or more speakers 160, the display device 162, and the input device 130 are illustrated as being coupled to the device 102, in other implementations the one or more microphones 120, the camera 140, the one or more speakers 160, the display device 162, the input device 130, or a combination thereof may be integrated in the device 102. Various implementations of the system 100 may include fewer, additional, or different components. For example, in some implementations, the directivity data updater 150, the camera 140, or both can be omitted.

Referring to FIG. 2A, an example of the graphical user interface 161 is shown. In a particular aspect, the graphical user interface 161 is generated by the audio enhancer 192, the one or more processors 190, the device 102, the system 100 of FIG. 1, or a combination thereof.

The graphical user interface 161 includes a video display 204 configured to display images corresponding to the image data 145 of FIG. 1. For example, the video display 204 displays images of the audio source 184. The graphical user interface 161 includes a zoom selector 202 that can be used to initiate an audio zoom operation. For example, the user 101 of FIG. 1 can move the zoom selector 202 up to zoom in on the audio source 184 or down to zoom out from the audio source 184. In a particular aspect, moving the zoom selector 202 up corresponds to selecting a first value (e.g., 0 degrees, forward, or zoom in) for the zoom orientation 137, and moving the zoom selector 202 down corresponds to selecting a second value (e.g., 180 degrees, backward, or zoom out) for the zoom orientation 137. The amount of movement of the zoom selector 202 indicates the zoom distance 135. The zoom target 133 includes the zoom distance 135, the zoom orientation 137, or both.
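As a rough illustration of how a selector movement could be turned into a zoom target, the sketch below maps a signed displacement to a zoom orientation and a zoom distance. The function name, the dictionary keys, and the `distance_per_unit` scale factor are assumptions made for this example only.

```python
def zoom_target_from_selector(displacement, distance_per_unit=1.0):
    """Map a zoom selector movement to a zoom orientation and zoom distance.

    displacement > 0 means the selector moved up (zoom in, 0 degrees);
    displacement < 0 means it moved down (zoom out, 180 degrees).
    distance_per_unit is a hypothetical scale between selector travel and distance.
    """
    orientation_deg = 0.0 if displacement >= 0 else 180.0
    return {
        "zoom_orientation_deg": orientation_deg,
        "zoom_distance": abs(displacement) * distance_per_unit,
    }
```

Moving the selector up by two units with the default scale gives a forward zoom of distance 2, matching the "2 feet closer" style of example used in the description if one unit is taken to be one foot.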

Referring to FIG. 2B, an example of the graphical user interface 161 is shown. In a particular aspect, the graphical user interface 161 is generated by the audio enhancer 192, the one or more processors 190, the device 102, the system 100 of FIG. 1, or a combination thereof.

The graphical user interface 161 indicates that the user 101 has moved the zoom selector 202 to initiate an audio zoom operation. For example, the user 101 uses the input device 130 to move the zoom selector 202 up, corresponding to a selection of the zoom orientation 137 (e.g., forward, 0 degrees, or zoom in) and the zoom distance 135 (e.g., 2 feet), and the input device 130 provides the user input 131 indicating the zoom target 133 to the audio enhancer 192. The zoom target 133 indicates the zoom orientation 137 (e.g., 0 degrees, forward, or zoom in) and the zoom distance 135 (e.g., based on the amount of movement of the zoom selector 202). The directivity analyzer 152 generates the equalizer settings 153 based at least in part on the zoom target 133, as further described with reference to FIG. 4. The equalizer 148 generates (e.g., updates) the one or more output audio signals 138 by applying the equalizer settings 153 to the one or more equalizer input audio signals 147, as described with reference to FIG. 1. The equalizer 148 provides the one or more output audio signals 138 to the one or more speakers 160.

In a particular aspect, the one or more processors 190, in response to the user input 131, perform an image zoom operation on the image data 145 and update the video display 204 to display a zoomed version of the image data 145 concurrently with the equalizer 148 providing the one or more output audio signals 138 to the one or more speakers 160. As illustrated, the audio source 184 is enlarged in the video display 204 of FIG. 2B as compared to FIG. 2A, indicating that the audio zoom operation has zoomed in on the audio source 184.

The zoom selector 202 is provided as an illustrative example of selecting the zoom target 133. In other implementations, the user 101 may use other ways of specifying the zoom target 133. In a particular example, the graphical user interface 161 is displayed on a touchscreen (e.g., the input device 130), and the user 101 interacts with the touchscreen (e.g., using a tap or a pinch-zoom gesture) to specify the zoom target 133. For example, the user 101 can tap the touchscreen to select a location on the video display 204 that corresponds to a selection of the zoom position 136, the audio source 184, or both, as the zoom target 133. As another example, the user 101 can use a first pinch-zoom gesture (e.g., widening) to indicate a first value (e.g., forward, 0 degrees, or zoom in) of the zoom orientation 137, or use a second pinch-zoom gesture (e.g., narrowing) to indicate a second value (e.g., backward, 180 degrees, or zoom out) of the zoom orientation 137. The distance of the pinch-zoom gesture indicates the zoom distance 135. The zoom target 133 includes the zoom distance 135, the zoom orientation 137, or both.

In a particular example, the user 101 provides a user input (e.g., a voice command, an option selection, or both) indicating an identifier (e.g., a name) of the zoom position 136, the audio source 184, or both, as the zoom target 133. The audio enhancer 192 performs image recognition on the image data 145, audio analysis of the input audio signals 126, or both, to identify the audio source 184, the zoom position 136, or both. For example, the user 101 provides a user input (e.g., "zoom to Sarah Lee") with an identifier (e.g., a contact name) of the audio source 184 (e.g., "Sarah Lee"). The audio enhancer 192 performs image recognition (e.g., person recognition or object recognition) on the image data 145 to identify portions of the image data 145 that correspond to the audio source 184 (e.g., "Sarah Lee"), performs speech recognition on the input audio signals 126 to identify portions of the input audio signals 126 that correspond to the audio source 184 (e.g., "Sarah Lee"), or both. The zoom target 133 includes the audio source 184.

Referring to FIG. 3, a diagram 300 of components of the system 100 of FIG. 1 is shown in accordance with a particular implementation. The pre-equalization signal processor 146 includes a spatial analyzer 340, an activity detector 342, a gain adjuster 344, a noise suppressor 346, a context detector 350, or a combination thereof. The context detector 350 includes a source detector 362, a source position detector 364, or both. One or more of the components shown with dotted lines in FIG. 3 can be omitted in some implementations.

The spatial analyzer 340 is configured to apply beamforming to the one or more input audio signals 126 to generate one or more beamformed audio signals 341. In a particular aspect, the spatial analyzer 340 applies the beamforming based on the zoom target 133. For example, the spatial analyzer 340 applies the beamforming based on the zoom orientation 137 of FIG. 1 so that the one or more beamformed audio signals 341 represent sound captured around the zoom orientation 137. The spatial analyzer 340 provides the one or more beamformed audio signals 341 to one or more components of the pre-equalization signal processor 146 or to the equalizer 148. For example, the spatial analyzer 340 provides the one or more beamformed audio signals 341 to the activity detector 342 as one or more activity input audio signals 361, to the gain adjuster 344 as one or more gain adjuster input audio signals 363, to the context detector 350 as one or more context detector input audio signals 369, to the noise suppressor 346 as one or more noise suppression input audio signals 365, to the equalizer 148 as the one or more equalizer input audio signals 147, or a combination thereof.

The activity detector 342 is configured to detect activity in the one or more activity input audio signals 361. In a particular implementation, the one or more activity input audio signals 361 include the one or more input audio signals 126. In an alternative implementation, the one or more activity input audio signals 361 include the one or more beamformed audio signals 341.

The activity detector 342 is configured to generate one or more activity audio signals 343 based on the activity detected in the one or more activity input audio signals 361. In a particular example, the activity detector 342 (e.g., a speech activity detector) is configured to detect speech in a first activity input audio signal of the one or more activity input audio signals 361 and to generate a first activity audio signal of the one or more activity audio signals 343 that includes the speech and a second activity audio signal that includes the remaining sounds of the first activity input audio signal. To illustrate, the first activity audio signal includes reduced or no remaining sounds, and the second activity audio signal includes reduced or no speech.

In a particular implementation, the activity detector 342 is configured to detect sounds corresponding to various types of audio sources, various audio sources of the same type, or both. In an illustrative example, the activity detector 342 is configured to detect, in a first activity input audio signal of the one or more activity input audio signals 361, first speech associated with a first talker, second speech associated with a second talker, music sounds associated with a musical instrument, bird sounds associated with a bird, or a combination thereof. The activity detector 342 is configured to generate a first activity audio signal that includes the first speech (e.g., with no or reduced remaining sounds), a second activity audio signal that includes the second speech (e.g., with no or reduced remaining sounds), a third activity audio signal that includes the music sounds (e.g., with no or reduced remaining sounds), a fourth activity audio signal that includes the bird sounds (e.g., with no or reduced remaining sounds), a fifth activity audio signal that includes the remaining sounds of the first activity input audio signal, or a combination thereof. The one or more activity audio signals 343 include the first activity audio signal, the second activity audio signal, the third activity audio signal, the fourth activity audio signal, the fifth activity audio signal, or a combination thereof.

The activity detector 342 provides the one or more activity audio signals 343 to one or more components of the pre-equalization signal processor 146, to the equalizer 148, or a combination thereof. For example, the activity detector 342 provides the one or more activity audio signals 343 to the gain adjuster 344 as the one or more gain adjuster input audio signals 363, to the context detector 350 as the one or more context detector input audio signals 369, to the noise suppressor 346 as the one or more noise suppression input audio signals 365, to the equalizer 148 as the one or more equalizer input audio signals 147, or a combination thereof.

The gain adjuster 344 applies one or more gains to the one or more gain adjuster input audio signals 363. The one or more gain adjuster input audio signals 363 include the one or more input audio signals 126, the one or more beamformed audio signals 341, or the one or more activity audio signals 343. The gain adjuster 344 applies the one or more gains based on the zoom target 133. For example, when the audio zoom operation corresponds to zooming in on the zoom target 133, the gain adjuster 344 increases gains of first input audio signals of the one or more gain adjuster input audio signals 363 that correspond to sounds from the zoom orientation 137, decreases gains of second input audio signals of the one or more gain adjuster input audio signals 363 that correspond to sounds from the remaining directions, or both. In another example, when the audio zoom operation corresponds to zooming away from the zoom target 133, the gain adjuster 344 decreases the gains of the first input audio signals that correspond to sounds from the zoom orientation 137, increases the gains of the second input audio signals that correspond to sounds from the remaining directions, or both. In a particular aspect, the amount of gain adjustment is based on the zoom distance 135.
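The zoom-dependent gain behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the linear mapping from zoom distance to a gain step, and the `db_per_unit_distance` value are all hypothetical, and the sketch applies both adjustments (raise one group, lower the other) although the description also allows applying only one of them.

```python
def zoom_gain_changes_db(zoom_in, zoom_distance, db_per_unit_distance=1.5):
    """Return (gain change for signals from the zoom orientation,
    gain change for signals from the remaining directions), in dB.

    Assumption: the step grows linearly with the zoom distance.
    """
    step_db = db_per_unit_distance * zoom_distance
    if zoom_in:
        # Zooming in: emphasize the zoom orientation, attenuate the rest.
        return step_db, -step_db
    # Zooming away: attenuate the zoom orientation, emphasize the rest.
    return -step_db, step_db


def apply_gain_db(samples, gain_db):
    """Scale a list of samples by a gain expressed in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [sample * scale for sample in samples]


# Example: zoom in by 2 distance units on signals from the zoom orientation.
first_gain_db, second_gain_db = zoom_gain_changes_db(True, 2.0)
first_signals = apply_gain_db([0.1, -0.2, 0.3], first_gain_db)
second_signals = apply_gain_db([0.05, 0.05, -0.05], second_gain_db)
```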

The gain adjuster 344 provides the one or more gain-adjusted audio signals 345 to one or more components of the pre-equalization signal processor 146, to the equalizer 148, or a combination thereof. For example, the gain adjuster 344 provides the one or more gain-adjusted audio signals 345 to the context detector 350 as the one or more context detector input audio signals 369, to the noise suppressor 346 as the one or more noise suppression input audio signals 365, to the equalizer 148 as the one or more equalizer input audio signals 147, or a combination thereof.

The context detector 350 processes the one or more context detector input audio signals 369, the image data 145, or a combination thereof, to generate the context data 351. In a particular aspect, the one or more context detector input audio signals 369 include the one or more input audio signals 126, the one or more beamformed audio signals 341, the one or more activity audio signals 343, or the one or more gain-adjusted audio signals 345.

The source detector 362 performs audio source recognition on the one or more context detector input audio signals 369, the image data 145, or a combination thereof, to identify an audio source type of one or more audio sources, such as the audio source 184. For example, the source detector 362 performs image analysis (e.g., object recognition and distance analysis) on the image data 145 to determine that the image data 145 indicates an audio source type (e.g., a human talker) at a first location relative to the camera 140. In a particular aspect, the source detector 362 performs sound analysis (e.g., audio source recognition and distance analysis) on the one or more context detector input audio signals 369 to determine that the one or more context detector input audio signals 369 include sounds that match the audio source type from a second location relative to the one or more microphones 120. In a particular aspect, the source detector 362 determines that the first location relative to the camera 140 corresponds to the same physical location as the second location relative to the one or more microphones 120. The source detector 362 provides source detection data indicating the audio source type, the first location relative to the camera 140, the second location relative to the one or more microphones 120, the physical location, or a combination thereof, to the source position detector 364.

The source position detector 364 performs image analysis to detect an orientation of the audio source 184 in the image data 145 relative to the camera 140. To illustrate, if the audio source 184 corresponds to a human talker, the source position detector 364 estimates an orientation of the head of the human talker (e.g., looking toward the camera 140 or looking away from the camera 140) by performing image recognition on the image data 145.

In an illustrative example, the source position detector 364 determines the orientation of the audio source 184 relative to the one or more microphones 120 based on the orientation of the audio source 184 relative to the camera 140 and a difference between the locations of the camera 140 and the one or more microphones 120. The context detector 350 determines that the second location relative to the one or more microphones 120 indicates a distance of the audio source 184 from the one or more microphones 120. The context detector 350 generates the context data 351 indicating the distance of the audio source 184 from the one or more microphones 120, the orientation of the audio source 184 relative to the one or more microphones 120, the audio source type of the audio source 184, or a combination thereof. The context detector 350 provides the context data 351 to the directivity analyzer 152.

The noise suppressor 346 performs noise suppression on the one or more noise suppression input audio signals 365 to generate one or more noise suppressed audio signals 347. In a particular aspect, the one or more noise suppression input audio signals 365 include the one or more input audio signals 126, the one or more beamformed audio signals 341, the one or more activity audio signals 343, or the one or more gain-adjusted audio signals 345. The noise suppressor 346 provides the one or more noise suppressed audio signals 347 to the equalizer 148 as the one or more equalizer input audio signals 147.

The particular order of operations of the components of the pre-equalization signal processor 146 is provided as an illustrative example. In other examples, the order of operations of the components of the pre-equalization signal processor 146 can be different. In a particular example, the zoom target 133 indicates a selection of the audio source 184. The context detector 350 provides, based on the zoom target 133, the audio source type (e.g., a human talker or a bird) of the audio source 184 to the activity detector 342. The activity detector 342 generates first activity signals of the one or more activity audio signals 343 that correspond to sounds of the audio source type (e.g., with the remaining sounds reduced or absent), second activity signals that correspond to the remaining sounds (e.g., with sounds of the audio source type reduced or absent), or a combination thereof. The activity detector 342 provides the one or more activity audio signals 343 to the gain adjuster 344. The gain adjuster 344, in response to determining that the audio zoom operation includes zooming toward the zoom target 133, increases the gain of the first activity signals, decreases the gain of the second activity signals, or both. Alternatively, the gain adjuster 344, in response to determining that the audio zoom operation includes zooming away from the zoom target 133, decreases the gain of the first activity signals, increases the gain of the second activity signals, or both.

In a particular aspect, the directivity analyzer 152 obtains the directivity data 141 based on the audio source type of the audio source 184, as further described with reference to FIG. 4. The directivity analyzer 152 generates the equalizer settings 153 based on the directivity data 141, as further described with reference to FIG. 4. The directivity analyzer 152 provides the equalizer settings 153 to the equalizer 148.

The equalizer 148 applies the equalizer settings 153 to the one or more equalizer input audio signals 147 to generate the one or more output audio signals 138. In a particular aspect, the one or more equalizer input audio signals 147 include the one or more input audio signals 126, the one or more activity audio signals 343, the one or more gain-adjusted audio signals 345, or the one or more noise suppressed audio signals 347.

Accordingly, the pre-equalization signal processor 146 performs pre-equalization signal processing to improve the performance of the audio enhancer 192 by beamforming, gain adjustment, noise reduction, or a combination thereof, prior to performing equalization. In a particular aspect, the pre-equalization signal processor 146 determines the context data 351 to enable the directivity analyzer 152 to determine the equalizer settings 153 based on the directivity of the audio source types of one or more audio sources.

In some implementations, the pre-equalization signal processor 146 can be omitted. As an example, the directivity analyzer 152 generates the equalizer settings 153 based on default directivity data, and the equalizer 148 applies the equalizer settings 153 to the one or more input audio signals 126 (e.g., adjusts mid frequencies thereof) to generate the one or more output audio signals 138.

In some implementations, one or more components of the pre-equalization signal processor 146 can be omitted. In one example, the spatial analyzer 340 and the activity detector 342 are omitted, and the one or more input audio signals 126 are provided to the gain adjuster 344 as the one or more gain adjuster input audio signals 363. In some implementations, the spatial analyzer 340 is omitted, and the one or more input audio signals 126 are provided to the activity detector 342 as the one or more activity input audio signals 361. In some implementations, the activity detector 342 is omitted, and the one or more beamformed audio signals 341 are provided to the gain adjuster 344 as the one or more gain adjuster input audio signals 363. In some implementations, the gain adjuster 344 is omitted, and the one or more activity audio signals 343 are provided to the context detector 350 as the one or more context detector input audio signals 369 and to the noise suppressor 346 as the one or more noise suppression input audio signals 365. The particular combinations of components are described as illustrative examples. In other implementations, other combinations of components are included in the pre-equalization signal processor 146.
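The idea that any stage may be omitted, with its input passed directly to the next stage that is present, can be sketched as a chain of optional stages. This is only an illustration under assumptions: the stage names, the fixed `STAGE_ORDER`, and the modeling of each stage as a function from a list of samples to a list of samples are hypothetical, and the context detector is left out because it produces context data rather than an audio signal for the equalizer.

```python
# Hypothetical stage order; the description notes the order may differ.
STAGE_ORDER = (
    "spatial_analyzer",
    "activity_detector",
    "gain_adjuster",
    "noise_suppressor",
)


def run_pre_equalization(input_signals, stages):
    """Run the stages that are present, in order, skipping omitted ones.

    `stages` maps a stage name to a callable; a missing name means the
    stage is omitted and its input is forwarded unchanged.
    """
    signals = input_signals
    for name in STAGE_ORDER:
        stage = stages.get(name)
        if stage is not None:
            signals = stage(signals)
    return signals


# Example: spatial analyzer and activity detector omitted, so the input
# audio signals reach the gain adjuster directly.
only_gain = {"gain_adjuster": lambda s: [2.0 * x for x in s]}
equalizer_input = run_pre_equalization([0.1, 0.2], only_gain)
```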

Referring to FIG. 4, a diagram 400 of a particular example of components of the system 100 of FIG. 1 is shown. The directivity analyzer 152 is illustrated as obtaining the directivity data 141, the context data 351, and the user input 131. The context data 351 includes source position data 420 of the audio source 184. For example, the source position data 420 indicates a source orientation 422 of the audio source 184 relative to the one or more microphones 120 (e.g., 0 degrees, or facing toward the microphones), a source distance 424 (e.g., 6 feet) of the audio source 184 from the one or more microphones 120, or both, as described with reference to FIG. 3.

The source position detector 364 of FIG. 3 determines, based on the source orientation 422 (e.g., 0 degrees) and the zoom orientation 137 (e.g., 0 degrees), a source orientation 432 (e.g., 0 degrees) of the audio source 184 relative to the zoom position 136. The source position detector 364 determines a source distance 434 (e.g., 4 feet) of the audio source 184 from the zoom position 136 based on the zoom distance 135 (e.g., 2 feet), the zoom orientation 137 (e.g., 0 degrees), and the source distance 424 (e.g., 6 feet).
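One way to reproduce the 6 feet, 2 feet, 4 feet example is the law of cosines in a horizontal plane. This is a sketch under assumptions: the description does not state a formula, the parameter `source_bearing_deg` (the direction in which the source lies as seen from the microphones) is a hypothetical input, and the function name is illustrative.

```python
import math


def source_distance_from_zoom(source_distance, zoom_distance,
                              zoom_orientation_deg, source_bearing_deg):
    """Distance from the zoom position to the audio source.

    The microphones sit at the origin, the zoom position lies at
    `zoom_distance` along `zoom_orientation_deg`, and the source lies at
    `source_distance` along `source_bearing_deg` (assumed known).
    """
    angle = math.radians(source_bearing_deg - zoom_orientation_deg)
    squared = (source_distance ** 2 + zoom_distance ** 2
               - 2.0 * source_distance * zoom_distance * math.cos(angle))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


# On-axis case from the text: source at 6 feet, zoom position at 2 feet,
# both at 0 degrees, which leaves 4 feet between zoom position and source.
distance_434 = source_distance_from_zoom(6.0, 2.0, 0.0, 0.0)
```

When the source and the zoom orientation are aligned, the expression reduces to the simple difference of the two distances, which matches the example values in the paragraph above.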

In a particular aspect, the directivity analyzer 152 obtains the directivity data 141 based on the audio source type of the audio source 184. As an example, a graphical depiction of sound directivity patterns 402 indicates frequency-dependent directivity of the audio source type (e.g., a human talker) in the horizontal plane and the vertical plane. In a particular aspect, the directivity data 141 includes a plurality of directivity data sets associated with various orientations of the audio source type. The directivity analyzer 152 selects a directivity data set 404 from the directivity data 141 in response to determining that the directivity data set 404 is associated with a particular orientation of the audio source type (e.g., on axis, 0 degrees along the horizontal axis and the vertical axis) that matches the source orientation 422 and the source orientation 432.
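The selection step can be pictured as a lookup keyed by orientation. The sketch below is illustrative only: the dictionary layout, the string placeholders standing in for the data sets, the function names, and the 1 degree matching tolerance are assumptions rather than details from this description.

```python
# Hypothetical directivity data: (horizontal_deg, vertical_deg) -> data set.
DIRECTIVITY_DATA = {
    (0.0, 0.0): "directivity data set 404",
    (-45.0, 0.0): "directivity data set 504",
}


def _matches(orientation_a, orientation_b, tolerance_deg):
    """True when both axes agree within the tolerance."""
    return (abs(orientation_a[0] - orientation_b[0]) <= tolerance_deg
            and abs(orientation_a[1] - orientation_b[1]) <= tolerance_deg)


def select_directivity_set(directivity_data, orientation_from_microphones,
                           orientation_from_zoom, tolerance_deg=1.0):
    """Return the data set whose orientation matches both source
    orientations, or None when no stored orientation matches."""
    for set_orientation, data_set in directivity_data.items():
        if (_matches(set_orientation, orientation_from_microphones, tolerance_deg)
                and _matches(set_orientation, orientation_from_zoom, tolerance_deg)):
            return data_set
    return None


selected = select_directivity_set(DIRECTIVITY_DATA, (0.0, 0.0), (0.0, 0.0))
```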

A graphical depiction of the directivity data set 404 indicates frequency response characteristics of the audio source type (e.g., of the audio source 184) corresponding to changes in distance (e.g., of a microphone) from a particular distance (e.g., 1 meter) to various distances along the particular orientation. For example, the directivity data set 404 indicates frequency response characteristics 449 of the audio source type (e.g., of the audio source 184) for a change from the source distance 424 (e.g., 1 meter) to the source distance 434 (e.g., 1 centimeter) along the particular orientation (e.g., on axis). In a particular aspect, the frequency response characteristics 449 indicate changes in loudness (e.g., in decibels (dB)) for various sound frequencies. For example, the frequency response characteristics 449 indicate that moving along the particular orientation (e.g., on axis) from the source distance 424 (e.g., 1 meter) toward the source distance 434 (e.g., 1 centimeter) corresponds to a drop in loudness (e.g., -0.2 dB) for a particular frequency (e.g., 500 hertz (Hz)), a rise in loudness (e.g., +4 dB) for another frequency range (e.g., 800 Hz to 1 kilohertz (kHz)), or both. In a particular example, the frequency response characteristics 449 indicate that moving along the particular orientation (e.g., on axis) from the source distance 424 (e.g., 1 meter) toward the source distance 434 (e.g., 1 centimeter) corresponds to negligible (e.g., below a threshold) changes in loudness for another particular frequency range (e.g., 200 Hz to 400 Hz). To illustrate, the changes in loudness for the particular frequency range (e.g., 200 Hz to 400 Hz) may not be perceptible to the human auditory system.
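The example values above can be held in a small table and queried per frequency. This is a sketch under assumptions: the tuple layout, the function names, and the 0.1 dB perceptibility threshold are hypothetical; only the band edges and dB values come from the example in the text.

```python
# (low_hz, high_hz, loudness_change_db) for moving on axis from 1 meter
# toward 1 centimeter, using the example values from the description.
FREQUENCY_RESPONSE_449 = [
    (500.0, 500.0, -0.2),   # drop at 500 Hz
    (800.0, 1000.0, 4.0),   # rise from 800 Hz to 1 kHz
    (200.0, 400.0, 0.0),    # negligible change from 200 Hz to 400 Hz
]


def loudness_change_db(frequency_hz, characteristics):
    """Return the loudness change for a frequency, or None when the
    table has no entry covering it."""
    for low_hz, high_hz, change_db in characteristics:
        if low_hz <= frequency_hz <= high_hz:
            return change_db
    return None


def is_perceptible(change_db, threshold_db=0.1):
    """Assumed rule: changes smaller than the threshold are negligible."""
    return abs(change_db) >= threshold_db


change_at_900 = loudness_change_db(900.0, FREQUENCY_RESPONSE_449)
```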

In a particular example, the source orientation 422 matches the source orientation 432, and the directivity analyzer 152 selects the directivity data set 404 corresponding to a change in source distance along the particular orientation (e.g., the source orientation 422 and the source orientation 432). In this example, the directivity data set 404 indicates the frequency response characteristics 449 corresponding to the change in source distance (e.g., from the source distance 424 to the source distance 434) along the particular orientation (e.g., the source orientation 422 and the source orientation 432). In some other examples, the source orientation 422 differs from the source orientation 432, as further described with reference to FIG. 6, and the directivity analyzer 152 selects, from the directivity data 141, a directivity data set that indicates frequency response characteristics 449 corresponding to a change in source distance (e.g., from the source distance 424 to the source distance 434) and a change in source orientation (e.g., from the source orientation 422 to the source orientation 432).

In a particular aspect, the directivity analyzer 152 obtains equalizer setting data 149 from the memory 132, another device, a network, or a combination thereof. In a particular implementation, the equalizer setting data 149 associates the context data 351 (e.g., the audio source type of the audio source 184), the directivity data 141 (e.g., the directivity data set 404), the zoom distance 135, the source distance 424, the source distance 434, the zoom orientation 137, the source orientation 422, the source orientation 432, frequency response characteristics (e.g., the frequency response characteristics 449), or a combination thereof, with the equalizer settings 153. The directivity analyzer 152 selects, based on the equalizer setting data 149, the equalizer settings 153 that match the audio source type of the audio source 184, the zoom distance 135, the source distance 424, the source distance 434, the zoom orientation 137, the source orientation 422, the source orientation 432, the frequency response characteristics 449, or a combination thereof.

In a particular aspect, the directivity analyzer 152 selects, based on the equalizer setting data 149, the equalizer settings 153 that match the frequency response characteristics 449. For example, the equalizer settings 153 correspond to a drop in loudness (e.g., -0.2 dB) for a particular frequency (e.g., 500 Hz), a rise in loudness (e.g., +4 dB) for a first frequency range (e.g., 800 Hz to 1 kilohertz (kHz)), no changes in loudness for a second frequency range (e.g., 200 Hz to 400 Hz), or a combination thereof. Accordingly, the directivity analyzer 152 generates the equalizer settings 153 such that applying the equalizer settings 153 approximates the frequency response characteristics of moving the one or more microphones 120 to (or closer to) the zoom position 136.
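To make the effect of such settings concrete, the sketch below applies per-band gains in the frequency domain using a naive discrete Fourier transform from the Python standard library. It is an illustration under assumptions, not the equalizer of this disclosure: the settings layout, the function names, the hard band edges, and the 8 kHz sample rate are hypothetical, and a practical equalizer would more likely use smooth filters and an FFT.

```python
import cmath
import math

# Hypothetical equalizer settings: (low_hz, high_hz, gain_db), using the
# example values from the text. Unlisted frequencies are left unchanged.
EQUALIZER_SETTINGS = [(500.0, 500.0, -0.2), (800.0, 1000.0, 4.0)]


def gain_db_at(frequency_hz, settings):
    """Gain in dB for a frequency; 0 dB when no band covers it."""
    for low_hz, high_hz, gain_db in settings:
        if low_hz <= frequency_hz <= high_hz:
            return gain_db
    return 0.0


def dft(samples):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def equalize(samples, sample_rate_hz, settings):
    """Scale each frequency bin by the gain of the band that contains it."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the lower bins for real-valued input.
        frequency_hz = min(k, n - k) * sample_rate_hz / n
        spectrum[k] *= 10.0 ** (gain_db_at(frequency_hz, settings) / 20.0)
    return inverse_dft(spectrum)


# A 900 Hz tone (80 samples at 8 kHz, so it falls exactly on a bin) is
# raised by 4 dB, matching the 800 Hz to 1 kHz example band.
tone = [math.sin(2.0 * math.pi * 900.0 * t / 8000.0) for t in range(80)]
boosted = equalize(tone, 8000.0, EQUALIZER_SETTINGS)
```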

Referring to FIG. 5, a diagram 500 of a particular example of components of the system 100 of FIG. 1 is shown. A graphical depiction of a directivity data set 504 of the directivity data 141 indicates frequency response characteristics of the audio source type (e.g., a human talker) of the audio source 184 for a particular orientation (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis). For example, the directivity data set 504 indicates frequency response characteristics corresponding to changes in distance (e.g., of a microphone) from a particular distance (e.g., 1 meter) to various distances along the particular orientation.

The context detector 350 of FIG. 3 determines source position data 520 of the audio source 184, as described with reference to FIG. 3. For example, the source position data 520 indicates that the audio source 184 is located approximately at the source distance 424 (e.g., 1 meter) with a source orientation 522 (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) relative to the location 134 of the one or more microphones 120. The source position data 520 indicates that the audio source 184 is located approximately at the source distance 434 (e.g., 10 centimeters) with a source orientation 532 (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) relative to the zoom position 136.

The directivity analyzer 152 selects the directivity data set 504 from the directivity data 141 in response to determining that the source orientation 422 (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) and the source orientation 432 (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) match the particular orientation (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) associated with the directivity data set 504. The directivity data set 504 indicates frequency response characteristics 549 for a change from the source distance 424 (e.g., 1 meter) to the source distance 434 (e.g., 1 centimeter) along the particular orientation (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis). In a particular aspect, the frequency response characteristics 549 indicate that moving along the particular orientation (e.g., -45 degrees on the horizontal axis and 0 degrees on the vertical axis) from the source distance 424 (e.g., 1 meter) toward the source distance 434 (e.g., 1 centimeter) corresponds to a drop in loudness (e.g., -0.2 dB) for a first frequency (e.g., 500 Hz), a first rise in loudness (e.g., +2 dB) for a second frequency (e.g., 800 Hz), a second rise in loudness (e.g., +4 dB) for a third frequency (e.g., 1 kHz), negligible (e.g., below a threshold) changes in loudness for a particular frequency range (e.g., 200 Hz to 315 Hz), or a combination thereof.

In a particular aspect, the directivity analyzer 152 selects, based on the equalizer setting data 149, the equalizer settings 153 that match the frequency response characteristics 549. For example, the equalizer settings 153 correspond to a drop in loudness (e.g., -0.2 dB) for the first frequency (e.g., 500 Hz), a first rise in loudness (e.g., +2 dB) for the second frequency (e.g., 800 Hz), a second rise in loudness (e.g., +4 dB) for the third frequency, no change in loudness for the particular frequency range (e.g., 200 Hz to 315 Hz), or a combination thereof. Accordingly, the directivity analyzer 152 generates the equalizer settings 153 based on the directivity data set 504 such that applying the equalizer settings 153 approximates the frequency response characteristics of moving the one or more microphones 120 to (or closer to) the zoom position 136 when the audio source 184 has a particular orientation (e.g., 45 degrees on the horizontal axis and 0 degrees on the vertical axis) relative to the one or more microphones 120.

Referring to FIG. 6, a diagram 600 of an example of components of the system 100 of FIG. 1 is shown, in accordance with an implementation in which the context detector 350 of FIG. 3 has detected multiple audio sources, such as the audio source 184 and an audio source 684, based on the one or more context detector input audio signals 369 of FIG. 3.

The context detector 350 determines source position data 620 of the audio source 684 in a manner similar to that described with reference to FIG. 3. For example, the source position data 620 indicates that the audio source 684 is located approximately at a source distance 624 (e.g., 2 meters) with a source orientation 622 (e.g., -30 degrees on the horizontal axis and 0 degrees on the vertical axis) relative to the location 134 of the one or more microphones 120. In a particular aspect, the source position data 620 indicates that the audio source 684 is located approximately at a source distance 634 (e.g., 2.2 meters) with a source orientation 632 (e.g., -2 degrees on the horizontal axis and 0 degrees on the vertical axis) relative to the zoom position 136.

In a particular implementation, the zoom target 133 indicates the audio source 184, and the directivity analyzer 152 disregards the audio source 684 when determining the equalizer settings 153. In a particular aspect, the one or more output audio signals 138 include reduced (e.g., no) sounds of the audio source 684. As an example, the activity detector 342 generates the one or more activity audio signals 343 corresponding to sounds of the audio source 184 with sounds of the audio source 684 reduced (e.g., absent). As another example, the gain adjuster 344 generates the one or more gain-adjusted audio signals 345 with sounds of the audio source 684 reduced (e.g., absent). In another example, the spatial analyzer 340 applies beamforming to generate the one or more beamformed audio signals 341 with sounds of the audio source 684 reduced (e.g., absent). In this implementation, the directivity analyzer 152 generates the equalizer settings 153 based on the directivity data set 504 and the source position data 520, as described with reference to FIG. 5.

In a particular implementation, the zoom target 133 indicates the audio source 184, and the audio enhancer 192 generates the one or more output audio signals 138 with sounds of the audio source 184 adjusted based on the directivity of the audio source 184 and with little or no change to sounds of the audio source 684. As an example, the activity detector 342 generates a first subset of the one or more activity audio signals 343 that corresponds to sounds of the audio source 184 with reduced (e.g., no) sounds of the audio source 684, and a second subset of the one or more activity audio signals 343 that corresponds to the remaining sounds (including sounds of the audio source 684) with reduced (e.g., no) sounds of the audio source 184.

The directivity analyzer 152 generates the equalizer settings 153 based on the directivity data set 504 and the source position data 520, as described with reference to FIG. 5. The one or more equalizer input audio signals 147 include the first subset of the one or more activity audio signals 343, a gain-adjusted version of the first subset of the one or more activity audio signals 343, a noise-suppressed version of the first subset of the one or more activity audio signals 343, or a combination thereof. The equalizer 148 generates a first subset of the one or more output audio signals 138 by applying the equalizer settings 153 to the one or more equalizer input audio signals 147, producing a psychoacoustic enhanced version of the sounds from the audio source 184 that is perceived as if the user 101 were located at the zoom position 136.

A second subset of the one or more output audio signals 138 is based on the second subset of the one or more activity audio signals 343 and includes sounds from the audio source 684. For example, the second subset of the one or more output audio signals 138 includes the second subset of the one or more activity audio signals 343, a gain-adjusted version of the second subset of the one or more activity audio signals 343, a noise-suppressed version of the second subset of the one or more activity audio signals 343, or a combination thereof.

Accordingly, the one or more output audio signals 138 approximate the frequency response characteristics of the audio source 184 that are associated with moving the one or more microphones 120 from the position 134 to the zoom position 136, with no (or little) change for the audio source 684. In this implementation, the audio zoom operation appears to zoom in on the audio source 184 with little or no change to the audio source 684. For example, sounds of the audio source 184 in the one or more output audio signals 138 appear to come from the audio source 184 at approximately the source distance 434 with the source orientation 532 relative to the zoom position 136. Sounds of the audio source 684 in the one or more output audio signals 138 appear to come from the audio source 684 at approximately the source distance 624 with the source orientation 622 relative to the zoom position 136.

In another particular implementation, the zoom target 133 indicates the audio source 184, and the audio enhancer 192 generates the one or more output audio signals 138 in an operation that includes adjusting sounds of the audio source 184 based on the directivity of the audio source 184 and adjusting sounds of the audio source 684 based on the directivity of the audio source 684. In a particular aspect, the audio source 684 has the same audio source type (e.g., a human talker) as the audio source 184. In this aspect, the directivity analyzer 152 selects, from the directivity data 141, a directivity data set 604 that matches the change in orientation (e.g., from the source orientation 622 to the source orientation 632) and the change in distance (e.g., from the source distance 624 to the source distance 634) associated with the audio source 684.

In an alternative aspect, the audio source 684 has a second audio source type (e.g., a bird) that is different from the first audio source type (e.g., a human talker) of the audio source 184. In this aspect, the directivity analyzer 152 obtains second directivity data associated with the second audio source type and selects, from the second directivity data, a directivity data set 604 that indicates frequency response characteristics of the audio source 684 for a change in orientation (e.g., from the source orientation 622 to the source orientation 632) and for changes from the source distance 624 to various distances. To illustrate, the directivity data set 604 indicates frequency response characteristics 649 for the change in orientation (e.g., from the source orientation 622 to the source orientation 632) and the change in distance (e.g., from the source distance 624 to the source distance 634).

The directivity analyzer 152 determines, based on the equalizer setting data 149, equalizer settings 653 that match the frequency response characteristics 649. The directivity analyzer 152 provides the equalizer settings 653 corresponding to the audio source 684 and the equalizer settings 153 corresponding to the audio source 184 to the equalizer 148.
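The text does not spell out how settings are matched to frequency response characteristics. As one possible reading, the following Python sketch treats the equalizer setting data as a list of candidate settings and picks the candidate whose per-band gains are closest, in a least-squares sense, to the target frequency response. The function name `select_equalizer_settings`, the dictionary layout, the band centers, and all gain values are assumptions made for illustration.

```python
def select_equalizer_settings(target_response_db, equalizer_setting_data):
    """Pick the candidate whose per-band gains are closest to the target response."""
    def squared_error(candidate):
        return sum(
            (candidate["gains_db"][band] - gain_db) ** 2
            for band, gain_db in target_response_db.items()
        )
    return min(equalizer_setting_data, key=squared_error)


# Hypothetical frequency response change in dB, keyed by band center in Hz.
target = {250: 0.5, 1000: 2.0, 4000: 5.5}

# Hypothetical candidates standing in for the equalizer setting data.
candidates = [
    {"name": "flat", "gains_db": {250: 0.0, 1000: 0.0, 4000: 0.0}},
    {"name": "mild_treble", "gains_db": {250: 0.0, 1000: 1.0, 4000: 3.0}},
    {"name": "strong_treble", "gains_db": {250: 0.0, 1000: 2.0, 4000: 6.0}},
]

chosen = select_equalizer_settings(target, candidates)
```

A nearest-match lookup is only one option; an implementation could equally interpolate between stored settings or derive gains directly from the frequency response.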

In a particular aspect, the activity detector 342 generates a first subset of the one or more activity audio signals 343 that corresponds to sounds of the audio source 184 with reduced (e.g., no) other sounds, a second subset of the one or more activity audio signals 343 that corresponds to sounds of the audio source 684 with reduced (e.g., no) other sounds, a third subset of the one or more activity audio signals 343 that corresponds to the remaining sounds with reduced (e.g., no) sounds of the audio source 184 and the audio source 684, or a combination thereof. In a particular aspect, a first subset of the one or more equalizer input audio signals 147 is based on the first subset of the one or more activity audio signals 343, a second subset of the one or more equalizer input audio signals 147 is based on the second subset of the one or more activity audio signals 343, a third subset of the one or more equalizer input audio signals 147 is based on the third subset of the one or more activity audio signals 343, or a combination thereof. The equalizer 148 generates the one or more output audio signals 138 by applying the equalizer settings 153 to the first subset of the one or more equalizer input audio signals 147 corresponding to the audio source 184, applying the equalizer settings 653 to the second subset of the one or more equalizer input audio signals 147 corresponding to the audio source 684, making no changes to the third subset of the one or more equalizer input audio signals 147 corresponding to the remaining audio, or a combination thereof. The equalizer settings 153 and the equalizer settings 653 thus enable the one or more output audio signals 138 to approximate the frequency response characteristics of the audio source 184 and the audio source 684 that are associated with moving the one or more microphones 120 from the position 134 to the zoom position 136. For example, sounds of the audio source 184 in the one or more output audio signals 138 appear to come from the audio source 184 at approximately the source distance 434 with the source orientation 532 relative to the zoom position 136. Sounds of the audio source 684 in the one or more output audio signals 138 appear to come from the audio source 684 at approximately the source distance 634 with the source orientation 632 relative to the zoom position 136.
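The per-source processing described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the equalizer 148 itself: each separated signal subset is represented as a dictionary of band amplitudes rather than as audio samples, mixing is modeled as a plain sum of band amplitudes (ignoring phase), and the function names, band centers, and gain values are all hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_equalizer(band_levels, settings_db):
    """Scale each band amplitude by its gain; bands without a setting are unchanged."""
    return {
        band: level * db_to_linear(settings_db.get(band, 0.0))
        for band, level in band_levels.items()
    }


def mix(*signals):
    """Sum band amplitudes across signals (a crude stand-in for real mixing)."""
    mixed = {}
    for signal in signals:
        for band, level in signal.items():
            mixed[band] = mixed.get(band, 0.0) + level
    return mixed


# Hypothetical band amplitudes, keyed by band center in Hz, for the three subsets.
first_subset = {250: 1.0, 4000: 0.5}   # sounds of audio source 184
second_subset = {250: 0.2, 4000: 2.0}  # sounds of audio source 684
third_subset = {250: 0.3, 4000: 0.3}   # remaining sounds

settings_153 = {4000: 20.0}   # hypothetical settings for audio source 184
settings_653 = {4000: -20.0}  # hypothetical settings for audio source 684

output = mix(
    apply_equalizer(first_subset, settings_153),
    apply_equalizer(second_subset, settings_653),
    third_subset,  # passed through with no changes
)
```

Keeping the subsets separate until the final mix is what lets each source receive its own settings while the remaining audio is left untouched.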

FIG. 7 shows an implementation 700 of the device 102 as an integrated circuit 702 that includes the one or more processors 190. The integrated circuit 702 also includes an audio input 704, such as one or more bus interfaces, that enables the one or more input audio signals 126 to be received for processing. The integrated circuit 702 also includes an audio output 706, such as a bus interface, that enables transmission of an output signal, such as the one or more output audio signals 138. The integrated circuit 702 enables implementation of psychoacoustic enhancement based on audio source directivity as a component in a system, such as a mobile phone or tablet as shown in FIG. 8, a headset as shown in FIG. 9, a wearable electronic device as shown in FIG. 10, a voice-controlled speaker system as shown in FIG. 11, a camera as shown in FIG. 12, a virtual reality headset or augmented reality headset as shown in FIG. 13, or a vehicle as shown in FIG. 14 or FIG. 15.

FIG. 8 shows an implementation 800 in which the device 102 includes a mobile device 802, such as a phone or tablet, as illustrative, non-limiting examples. The mobile device 802 includes the one or more speakers 160, the one or more microphones 120, and a display screen 804. Components of the processor 190, including the audio enhancer 192, are integrated in the mobile device 802 and are illustrated using dashed lines to indicate internal components that are not generally visible to a user of the mobile device 802. In a particular example, the audio enhancer 192 operates to enhance user voice activity, which is then processed to perform one or more operations at the mobile device 802, such as launching a graphical user interface or otherwise displaying other information associated with the user's speech at the display screen 804 (e.g., via an integrated "smart assistant" application). In a particular example, the audio enhancer 192 enhances the voice activity of a talker during an online meeting. To illustrate, a user can see the talker on the display screen 804 during the online meeting and selects the talker as a zoom target. The audio enhancer 192 enhances the talker's speech in response to the selection of the zoom target. In another example, a user of a bird-tracking application on the mobile device 802 selects a tree as a zoom target. The audio enhancer 192 enhances bird sounds from a bird in the tree in response to the selection of the zoom target.

FIG. 9 shows an implementation 900 in which the device 102 includes a headset device 902. The headset device 902 includes the one or more microphones 120, the one or more speakers 160, or a combination thereof. Components of the processor 190, including the audio enhancer 192, are integrated in the headset device 902. In a particular example, the audio enhancer 192 operates to enhance user voice activity, which may cause the headset device 902 to perform one or more operations at the headset device 902, to transmit audio data corresponding to the user voice activity to a second device (not shown) for further processing, or a combination thereof. In a particular aspect, the headset device 902 has inputs (e.g., buttons or arrows) that can be used to zoom to different portions of a sound field corresponding to the audio output of the headset device 902. For example, the headset device 902 outputs orchestral music, and a user wearing the headset device 902 uses the inputs of the headset device 902 to select a particular section or instrument of the orchestra as a zoom target. The audio enhancer 192 generates one or more output audio signals 138 corresponding to an audio zoom operation on the zoom target (e.g., the particular section or instrument).

FIG. 10 shows an implementation 1000 in which the device 102 includes a wearable electronic device 1002, illustrated as a "smart watch." The audio enhancer 192, the one or more microphones 120, the one or more speakers 160, or a combination thereof, are integrated in the wearable electronic device 1002. In a particular example, the audio enhancer 192 operates to enhance user voice activity, which is then processed to perform one or more operations at the wearable electronic device 1002, such as launching a graphical user interface or otherwise displaying other information associated with the user's speech at a display screen 1004 of the wearable electronic device 1002. To illustrate, the wearable electronic device 1002 may include a display screen configured to display a notification based on user speech enhanced by the wearable electronic device 1002. In a particular example, the wearable electronic device 1002 includes a haptic device that provides a haptic notification (e.g., vibrates) in response to user voice activity. For example, the haptic notification can cause the user to look at the wearable electronic device 1002 to see a displayed notification indicating detection of a keyword spoken by the user. The wearable electronic device 1002 can thus alert a user with a hearing impairment, or a user wearing a headset, that the user's voice activity has been detected. In a particular aspect, the wearable electronic device 1002 includes inputs (e.g., buttons or arrows) that can be used to zoom to different portions of a sound field corresponding to the audio output of the wearable electronic device 1002.

FIG. 11 shows an implementation 1100 in which the device 102 includes a wireless speaker and voice activated device 1102. The wireless speaker and voice activated device 1102 can have wireless network connectivity and is configured to execute an assistant operation. The one or more processors 190 including the audio enhancer 192, the one or more microphones 120, the one or more speakers 160, or a combination thereof, are included in the wireless speaker and voice activated device 1102. During operation, in response to receiving a verbal command identified as user speech via an audio enhancement operation of the audio enhancer 192, the wireless speaker and voice activated device 1102 can execute assistant operations, such as via execution of a voice activation system (e.g., an integrated assistant application). The assistant operations can include adjusting a temperature, playing music, turning on lights, and so on. For example, the assistant operations are performed in response to receiving a command after a keyword or key phrase (e.g., "hello assistant").
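As a small illustration of the "command after a key phrase" pattern, the following Python sketch extracts a command from a transcript of enhanced speech. It assumes the speech has already been converted to text by some recognizer, which the passage does not describe; the function name `extract_command` and the handling of punctuation are hypothetical.

```python
def extract_command(transcript, key_phrase="hello assistant"):
    """Return the command that follows the key phrase, or None if there is none."""
    text = transcript.strip().lower()
    if not text.startswith(key_phrase):
        return None  # no key phrase, so the utterance is not treated as a command
    command = text[len(key_phrase):].strip(" ,.")
    return command or None


with_key_phrase = extract_command("Hello assistant, play music.")
without_key_phrase = extract_command("Play music.")
```

Only the first call yields a command, because the second utterance is not preceded by the key phrase.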

FIG. 12 shows an implementation 1200 in which the device 102 includes a portable electronic device that corresponds to a camera device 1202. The audio enhancer 192, the one or more microphones 120, the one or more speakers 160, or a combination thereof, are included in the camera device 1202. During operation, in response to receiving a verbal command as user speech enhanced via operation of the audio enhancer 192, the camera device 1202 can execute operations responsive to spoken user commands, such as performing zoom operations or adjusting image or video capture settings, image or video playback settings, or image or video capture instructions, as illustrative examples. In a particular example, the camera device 1202 includes a video camera that, when zooming in on the audio source 184 visible in a viewfinder, causes the audio enhancer 192 to perform an audio zoom operation to enhance audio captured from the audio source 184.

FIG. 13 shows an implementation 1300 in which the device 102 includes a portable electronic device that corresponds to a virtual reality, augmented reality, or mixed reality headset 1302. The audio enhancer 192, the one or more microphones 120, the one or more speakers 160, or a combination thereof, are integrated in the headset 1302. Audio enhancement can be performed based on audio signals received from the one or more microphones 120 of the headset 1302. In a particular example, audio enhancement can be performed on audio signals that correspond to virtual, augmented, or mixed reality and that are received from memory, a network, another device, or a combination thereof. A visual interface device is positioned in front of the user's eyes to enable display of augmented reality or virtual reality images or scenes to the user while the headset 1302 is worn. In a particular example, the visual interface device is configured to display a notification indicating enhanced speech of the audio signal. In a particular implementation, when the user uses the headset 1302 to zoom to a virtual or real object shown in the visual interface device, the audio enhancer 192 performs an audio zoom of the audio corresponding to that object (e.g., as part of gameplay). In some examples, the audio enhancer 192 performs the audio zoom in conjunction with a visual zoom displayed by the visual interface device.

FIG. 14 shows an implementation 1400 in which the device 102 corresponds to, or is integrated within, a vehicle 1402, illustrated as a manned or unmanned aerial device (e.g., a package delivery drone). The audio enhancer 192, the one or more microphones 120, the one or more speakers 160, or a combination thereof, are integrated in the vehicle 1402. Audio (e.g., user voice activity) enhancement can be performed based on audio signals received from the one or more microphones 120 of the vehicle 1402, such as for delivery instructions from an authorized user of the vehicle 1402.

FIG. 15 shows another implementation 1500 in which the device 102 corresponds to, or is integrated within, a vehicle 1502, illustrated as a car. The vehicle 1502 includes the processor 190 including the audio enhancer 192. The vehicle 1502 also includes the one or more microphones 120. Audio (e.g., user voice activity) enhancement can be performed based on audio signals received from the one or more microphones 120 of the vehicle 1502. In some implementations, audio (e.g., voice activity) enhancement can be performed based on an audio signal received from interior microphones (e.g., the one or more microphones 120), such as for a voice command from an authorized passenger. For example, the user voice activity enhancement can be used to enhance a voice command from a driver or passenger of the vehicle 1502. In some implementations, audio enhancement can be performed based on an audio signal received from external microphones (e.g., the one or more microphones 120), such as sounds from an audio source 184 (e.g., birds, waves at the beach, outdoor music, an authorized user of the vehicle 1502, a drive-through retail employee, or a curbside pickup person). In a particular implementation, in response to receiving a verbal command as user speech enhanced via operation of the audio enhancer 192, a voice activation system initiates one or more operations of the vehicle 1502 based on one or more keywords (e.g., "unlock," "start engine," "play music," "display weather forecast," or another voice command) detected in the one or more output audio signals 138, such as by providing feedback or information via a display 1520 or one or more speakers (e.g., a speaker 1510). In a particular implementation, enhanced external sounds (e.g., outdoor music, bird sounds, etc.) are played back inside the vehicle 1502 via the one or more speakers 160.

Referring to FIG. 16, a particular implementation of a method 1600 of psychoacoustic enhancement based on audio source directivity is shown. In a particular aspect, one or more operations of the method 1600 are performed by at least one of the directivity analyzer 152, the equalizer 148, the audio enhancer 192, the one or more processors 190, the device 102, or the system 100 of FIG. 1, or a combination thereof.

The method 1600 includes obtaining directivity data of one or more audio sources corresponding to one or more input audio signals, at 1602. For example, the directivity analyzer 152 of FIG. 1 obtains the directivity data 141 of the audio source 184 corresponding to the one or more input audio signals 126, as described with reference to FIG. 1 and FIGS. 4-6.

The method 1600 also includes determining one or more equalizer settings based at least in part on the directivity data, at 1604. For example, the directivity analyzer 152 of FIG. 1 determines the equalizer settings 153 based at least in part on the directivity data 141, as described with reference to FIG. 1 and FIGS. 4-6.

The method 1600 further includes generating, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals, at 1606. For example, the equalizer 148 of FIG. 1 generates, based on the equalizer settings 153, the one or more output audio signals 138 that correspond to a psychoacoustic enhanced version of the one or more input audio signals 126.

The method 1600 enables generating the one or more output audio signals 138 by adjusting loudness for frequencies based on the directivity of the audio source 184. The one or more output audio signals 138 correspond to a more natural-sounding audio zoom as compared to, for example, adjusting only the gains of the one or more input audio signals 126.
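The contrast between a gain-only zoom and a frequency-dependent adjustment can be illustrated with a short Python sketch. It is a toy model under stated assumptions: a signal is reduced to amplitudes in two bands, the per-band gains are invented rather than taken from any directivity data, and `spectral_tilt` is a hypothetical helper that reports the ratio of high-band to low-band amplitude.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def gain_only_zoom(band_levels, gain_db):
    """Apply the same gain to every band (a plain volume change)."""
    return {band: level * db_to_linear(gain_db) for band, level in band_levels.items()}


def directivity_zoom(band_levels, settings_db):
    """Apply a separate gain per band, as equalizer settings would."""
    return {
        band: level * db_to_linear(settings_db.get(band, 0.0))
        for band, level in band_levels.items()
    }


def spectral_tilt(band_levels, low_band, high_band):
    """Ratio of high-band to low-band amplitude; a crude descriptor of timbre."""
    return band_levels[high_band] / band_levels[low_band]


captured = {250: 1.0, 4000: 0.25}  # hypothetical band amplitudes at the microphones
louder = gain_only_zoom(captured, 20.0)
zoomed = directivity_zoom(captured, {250: 6.0, 4000: 20.0})  # hypothetical settings

tilt_captured = spectral_tilt(captured, 250, 4000)
tilt_louder = spectral_tilt(louder, 250, 4000)
tilt_zoomed = spectral_tilt(zoomed, 250, 4000)
```

The gain-only version leaves the balance between bands exactly as captured, whereas the per-band version changes it, which is the property the method relies on to make the zoom sound like a change of listening position rather than a change of volume.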

The method 1600 of FIG. 16 may be implemented by a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a processing unit such as a central processing unit (CPU), a DSP, a controller, another hardware device, a firmware device, or any combination thereof. As an example, the method 1600 of FIG. 16 may be performed by a processor that executes instructions, such as described with reference to FIG. 17.

Referring to FIG. 17, a block diagram of a particular illustrative implementation of a device is depicted and generally designated 1700. In various implementations, the device 1700 may have more or fewer components than illustrated in FIG. 17. In an illustrative implementation, the device 1700 may correspond to the device 102. In an illustrative implementation, the device 1700 may perform one or more operations described with reference to FIGS. 1-16.

In a particular implementation, the device 1700 includes a processor 1706 (e.g., a central processing unit (CPU)). The device 1700 may include one or more additional processors 1710 (e.g., one or more DSPs). In a particular aspect, the one or more processors 190 of FIG. 1 correspond to the processor 1706, the processors 1710, or a combination thereof. The processors 1710 may include a speech and music coder-decoder (CODEC) 1708 that includes a voice coder ("vocoder") encoder 1736, a vocoder decoder 1738, the audio enhancer 192, or a combination thereof.

The device 1700 may include the memory 132 and a CODEC 1734. The memory 132 may include instructions 1756 that are executable by the one or more additional processors 1710 (or the processor 1706) to implement the functionality described with reference to the audio enhancer 192. The device 1700 may include a modem 1746 coupled, via a transceiver 1750, to an antenna 1752.

The device 1700 may include the display device 162 coupled to a display controller 1726. The one or more speakers 160, the one or more microphones 120, or a combination thereof, may be coupled to the CODEC 1734. For example, the one or more microphones 120 may be coupled to the CODEC 1734 via the one or more input interfaces 124 of FIG. 1. The one or more speakers 160 may be coupled to the CODEC 1734 via one or more output interfaces. The CODEC 1734 may include a digital-to-analog converter (DAC) 1702, an analog-to-digital converter (ADC) 1704, or both. In a particular implementation, the CODEC 1734 may receive analog signals from the one or more microphones 120, convert the analog signals to digital signals using the analog-to-digital converter 1704, and provide the digital signals to the speech and music CODEC 1708. The speech and music CODEC 1708 may process the digital signals, and the digital signals may further be processed by the audio enhancer 192. In a particular implementation, the speech and music CODEC 1708 may provide digital signals to the CODEC 1734. The CODEC 1734 may convert the digital signals to analog signals using the digital-to-analog converter 1702 and may provide the analog signals to the one or more speakers 160.

In a particular implementation, the device 1700 may be included in a system-in-package or system-on-chip device 1722. In a particular implementation, the memory 132, the processor 1706, the processors 1710, the display controller 1726, the CODEC 1734, and the modem 1746 are included in the system-in-package or system-on-chip device 1722. In a particular implementation, the input device 130, the camera 140, and a power supply 1744 are coupled to the system-on-chip device 1722. Moreover, in a particular implementation, as illustrated in FIG. 17, the display device 162, the input device 130, the camera 140, the one or more speakers 160, the one or more microphones 120, the antenna 1752, and the power supply 1744 are external to the system-on-chip device 1722. In a particular implementation, each of the display device 162, the input device 130, the camera 140, the one or more speakers 160, the one or more microphones 120, the antenna 1752, and the power supply 1744 may be coupled to a component of the system-on-chip device 1722, such as an interface (e.g., the one or more input interfaces 124, the input interface 144, one or more additional interfaces, or a combination thereof) or a controller.

The device 1700 may include a virtual assistant, a home appliance, a smart device, an internet of things (IoT) device, a communication device, a headset, a vehicle, a computer, a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a personal media player, a digital video player, a camera, a navigation device, a smart speaker, a speaker bar, a mobile communication device, a smartphone, a cellular phone, a laptop computer, a tablet, a personal digital assistant, a digital video disc (DVD) player, a tuner, an augmented reality headset, a virtual reality headset, an aerial vehicle, a home automation system, a voice-activated device, a wireless speaker and voice-activated device, a portable electronic device, a car, a computing device, a virtual reality (VR) device, a base station, a mobile device, or any combination thereof.

In conjunction with the described implementations, an apparatus includes means for obtaining directivity data of one or more audio sources corresponding to one or more input audio signals. For example, the means for obtaining can correspond to the directivity analyzer 152, the audio enhancer 192, the one or more processors 190, the device 102, the system 100 of FIG. 1, the processor 1706, the processors 1710, the modem 1746, the transceiver 1750, the antenna 1752, one or more other circuits or components configured to obtain directivity data of one or more audio sources, or any combination thereof.

The apparatus also includes means for determining one or more equalizer settings based at least in part on the directivity data. For example, the means for determining can correspond to the directivity analyzer 152, the audio enhancer 192, the one or more processors 190, the device 102, the system 100 of FIG. 1, the processor 1706, the processors 1710, one or more other circuits or components configured to determine one or more equalizer settings based at least in part on the directivity data, or any combination thereof.

The apparatus further includes means for generating, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals. For example, the means for generating can correspond to the directivity analyzer 152, the audio enhancer 192, the one or more processors 190, the device 102, the system 100 of FIG. 1, the processor 1706, the processors 1710, one or more other circuits or components configured to generate, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals, or any combination thereof.

In some implementations, a non-transitory computer-readable medium (e.g., a computer-readable storage device, such as the memory 132) includes instructions (e.g., the instructions 1756) that, when executed by one or more processors (e.g., the one or more processors 1710 or the processor 1706), cause the one or more processors to obtain directivity data (e.g., the directivity data 141) of one or more audio sources (e.g., the audio source 184, the audio source 684, or both) corresponding to one or more input audio signals (e.g., the one or more input audio signals 126). The instructions, when executed by the one or more processors, also cause the one or more processors to determine one or more equalizer settings (e.g., the equalizer settings 153, the equalizer settings 653, or a combination thereof) based at least in part on the directivity data. The instructions, when executed by the one or more processors, also cause the one or more processors to generate, based on the equalizer settings, one or more output audio signals (e.g., the one or more output audio signals 138) corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

Particular aspects of the disclosure are described below in a first set of interrelated clauses:

According to clause 1, a device includes one or more processors configured to: obtain directivity data of one or more audio sources corresponding to one or more input audio signals; determine one or more equalizer settings based at least in part on the directivity data; and generate, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

Clause 2 includes the device of clause 1, wherein the psychoacoustic enhanced version approximates a frequency response of the one or more audio sources at a zoom orientation and a zoom distance associated with an audio zoom operation.

Clause 3 includes the device of clause 1 or clause 2, wherein the one or more processors are further configured to: receive a user input indicating a zoom target of an audio zoom operation; and determine the one or more equalizer settings based on the zoom target.

Clause 4 includes the device of clause 3, wherein the zoom target includes a zoom position, a zoom distance, a zoom orientation, a selection of at least one of the one or more audio sources, or a combination thereof.

Clause 5 includes the device of any of clauses 1 to 4, wherein the directivity data of a particular audio source of the one or more audio sources indicates orientation and distance frequency response characteristics of the particular audio source.

Clause 6 includes the device of any of clauses 1 to 5, wherein the one or more processors are further configured to: perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; and process an equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals.

Clause 7 includes the device of any of clauses 1 to 6, wherein the one or more processors are further configured to: identify speech in a speech detection input audio signal that is based on the one or more input audio signals to generate one or more speech audio signals; and process an equalizer input audio signal that is based on the one or more speech audio signals to generate the one or more output audio signals.

Clause 8 includes the device of any of clauses 1 to 7, wherein the one or more processors are further configured to: apply, based on a zoom target, one or more gains to a gain adjuster input audio signal that is based on the one or more input audio signals to generate a gain-adjusted audio signal; and process an equalizer input audio signal that is based on the gain-adjusted audio signal to generate the one or more output audio signals.

Clause 9 includes the device of any of clauses 1 to 8, wherein the one or more processors are further configured to: perform noise suppression on a noise suppressor input audio signal that is based on the one or more input audio signals to generate a noise-suppressed audio signal; and process an equalizer input audio signal that is based on the noise-suppressed audio signal to generate the one or more output audio signals.

Clause 10 includes the device of any of clauses 1 to 9, wherein the one or more processors are further configured to: process a context detector input audio signal that is based on the one or more input audio signals to generate context data of the one or more audio sources, wherein the context data of a particular audio source of the one or more audio sources indicates an orientation of the particular audio source, a distance of the particular audio source, a type of the particular audio source, or a combination thereof; and obtain the directivity data of the particular audio source based on the type of the particular audio source.

Clause 11 includes the device of clause 10, wherein the one or more processors are further configured to generate the context data based at least in part on image data associated with the one or more input audio signals.

Clause 12 includes the device of clause 11, wherein the one or more processors are further configured to retrieve the image data and the one or more input audio signals from a memory.

Clause 13 includes the device of clause 11 or clause 12, further including one or more microphones coupled to the one or more processors and configured to generate the one or more input audio signals.

Clause 14 includes the device of any of clauses 11 to 13, further including a camera coupled to the one or more processors and configured to generate the image data.

Clause 15 includes the device of any of clauses 11 to 14, wherein the one or more processors are further configured to perform audio source recognition based on the one or more input audio signals, the image data, or both, to identify the type of the particular audio source of the one or more audio sources.

Clause 16 includes the device of any of clauses 11 to 15, wherein the one or more processors are further configured to: perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources; and perform image analysis on the image data to determine the orientation of the particular audio source.

Clause 17 includes the device of any of clauses 11 to 16, wherein the one or more processors are further configured to: perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources; and perform distance analysis on the one or more input audio signals, the image data, or both, to determine the distance of the particular audio source.

Clause 18 includes the device of any of clauses 10 to 17, wherein the one or more processors are further configured to select the one or more equalizer settings based on equalizer setting data that associates the one or more equalizer settings with the context data, the directivity data, a zoom orientation, a zoom distance, or a combination thereof.

Clause 19 includes the device of clause 18, wherein the one or more processors are further configured to obtain the equalizer setting data from a memory, another device, or both.

Clause 20 includes the device of any of clauses 10 to 19, wherein the one or more processors are further configured to select the one or more equalizer settings to reduce a frequency response corresponding to mid-frequencies.

Clause 21 includes the device of any of clauses 1 to 20, wherein the one or more processors are further configured to: generate, at a first time, a first sound spectrum of a first input audio signal corresponding to a particular audio source of the one or more audio sources; generate, at a second time, a second sound spectrum of a second input audio signal corresponding to the particular audio source; and update the directivity data to indicate that a difference between a first distance and a first orientation at the first time and a second distance and a second orientation at the second time corresponds to a difference between the first sound spectrum and the second sound spectrum.

Clause 22 includes the device of any of clauses 1 to 21, wherein the one or more processors are further configured to obtain the directivity data from a memory, another device, or both.

Clause 23 includes the device of any of clauses 1 to 5, clause 21, or clause 22, wherein the one or more processors are further configured to: perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; detect speech in the one or more input audio signals to generate one or more speech audio signals; apply, based on a zoom target, one or more gains to the one or more beamformed audio signals, the one or more speech audio signals, or a combination thereof, to generate one or more gain-adjusted audio signals; generate, based at least in part on the one or more gain-adjusted audio signals, context data of the one or more audio sources, wherein the context data of a particular audio source of the one or more audio sources indicates an orientation of the particular audio source, a distance of the particular audio source, a type of the particular audio source, or a combination thereof; obtain the directivity data of the particular audio source based on the type of the particular audio source; determine the one or more equalizer settings further based on the context data, a zoom orientation, and a zoom distance; apply noise suppression to the one or more gain-adjusted audio signals to generate one or more noise-suppressed audio signals; and generate the one or more output audio signals by processing the one or more noise-suppressed audio signals based on the one or more equalizer settings.
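The directivity-data update recited in clause 21 can be sketched as follows. The sketch assumes, for illustration only, that a sound spectrum is a mapping from a band name to a level in dB and that directivity data is a list of records. The names `Observation` and `update_directivity`, and all numeric values, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One capture of a particular audio source (hypothetical structure)."""
    distance_m: float
    orientation_deg: float
    spectrum_db: dict  # band name -> level in dB


def update_directivity(directivity, first, second):
    """Record that a change in distance and orientation corresponds to a change in spectrum."""
    entry = {
        "delta_distance_m": second.distance_m - first.distance_m,
        "delta_orientation_deg": second.orientation_deg - first.orientation_deg,
        "delta_spectrum_db": {
            band: second.spectrum_db[band] - first.spectrum_db[band]
            for band in first.spectrum_db
        },
    }
    directivity.append(entry)
    return directivity


# The same source observed at a first time and at a second time.
first = Observation(2.0, 0.0, {"low": -20.0, "mid": -18.0, "high": -25.0})
second = Observation(4.0, 90.0, {"low": -26.0, "mid": -27.0, "high": -40.0})
directivity_data = update_directivity([], first, second)
```

Storing differences rather than absolute levels is one possible design choice; it keeps each record independent of the absolute loudness of the source at either time.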

Particular aspects of the disclosure are described below in a second set of interrelated clauses:

According to clause 24, a method includes: obtaining, at a device, directivity data of one or more audio sources corresponding to one or more input audio signals; determining, at the device, one or more equalizer settings based at least in part on the directivity data; and generating, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

Clause 25 includes the method of clause 24, further including: receiving, at the device, a user input indicating a zoom target of an audio zoom operation; and determining, at the device, the one or more equalizer settings based on the zoom target, wherein the zoom target includes a zoom position, a zoom distance, a zoom orientation, a selection of at least one of the one or more audio sources, or a combination thereof.
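One way to represent the zoom target described in the clauses above is a record whose components are all optional, since the target may include any one of them or a combination. The following is a minimal sketch under that assumption; the class `ZoomTarget`, its field names, and its units are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ZoomTarget:
    """User-indicated zoom target; every field is optional (hypothetical structure)."""
    position: Optional[Tuple[float, float]] = None  # e.g. a point selected on a display
    distance_m: Optional[float] = None
    orientation_deg: Optional[float] = None
    selected_sources: Tuple[str, ...] = ()

    def is_specified(self) -> bool:
        """True when at least one component of the zoom target is present."""
        return (
            self.position is not None
            or self.distance_m is not None
            or self.orientation_deg is not None
            or len(self.selected_sources) > 0
        )


# A target given only by distance and orientation.
target = ZoomTarget(distance_m=1.5, orientation_deg=30.0)
```

A caller could check `target.is_specified()` before determining equalizer settings, so that an empty user input does not trigger an audio zoom.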

Particular aspects of the disclosure are described below in a third set of interrelated clauses:

According to clause 26, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to: obtain directivity data of one or more audio sources corresponding to one or more input audio signals; determine one or more equalizer settings based at least in part on the directivity data; and generate, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

Clause 27 includes the non-transitory computer-readable medium of clause 26, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; and process an equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals.

Clause 28 includes the non-transitory computer-readable medium of clause 26 or clause 27, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to: identify speech in a speech detection input audio signal that is based on the one or more input audio signals to generate one or more speech audio signals; and process an equalizer input audio signal that is based on the one or more speech audio signals to generate the one or more output audio signals.

Particular aspects of the disclosure are described below in a fourth set of interrelated clauses:

According to clause 29, an apparatus includes: means for obtaining directivity data of one or more audio sources corresponding to one or more input audio signals; means for determining one or more equalizer settings based at least in part on the directivity data; and means for generating, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

Clause 30 includes the apparatus of clause 29, wherein the means for obtaining, the means for determining, and the means for generating are integrated into at least one of a virtual assistant, a home appliance, a smart device, an internet of things (IoT) device, a communication device, a headset, a vehicle, a computer, a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a personal media player, a digital video player, a camera, or a navigation device.

Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or processor-executable instructions depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, and such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transitory storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor may read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.

The previous description of the disclosed aspects is provided to enable a person skilled in the art to make or use the disclosed aspects. Various modifications to these aspects will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.

Claims

A device comprising:
a memory configured to store directivity data of one or more audio sources corresponding to one or more input audio signals; and
one or more processors configured to:
determine one or more equalizer settings based at least in part on the directivity data; and
generate, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

The device of claim 1,
wherein the psychoacoustic enhanced version approximates a frequency response of the one or more audio sources at a zoom orientation and a zoom distance associated with an audio zoom operation.

The device of claim 1,
wherein the one or more processors are further configured to:
receive user input indicating a zoom target of an audio zoom operation; and
determine the one or more equalizer settings based on the zoom target.

The device of claim 3,
wherein the zoom target comprises a zoom position, a zoom distance, a zoom orientation, a selection of at least one of the one or more audio sources, or a combination thereof.

The device of claim 1,
wherein the directivity data of a particular audio source of the one or more audio sources indicates orientation and distance frequency response characteristics of the particular audio source.
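
A minimal sketch of one way such directivity data could be represented and used, assuming a lookup table of per-band responses in decibels keyed by orientation and distance. The names `DirectivityData`, `lookup`, `equalizer_gains_db`, and `BANDS_HZ`, the three bands, and all numeric values are hypothetical and do not come from the disclosure. The equalizer gains are taken as the difference between the response at the zoom orientation and distance and the response at the orientation and distance where the source was captured.

```python
from dataclasses import dataclass

# Hypothetical band centers in Hz; the claims do not specify any bands.
BANDS_HZ = (250, 1000, 4000)


@dataclass(frozen=True)
class DirectivityData:
    """Per-band response in dB, keyed by (orientation_deg, distance_m)."""
    response_db: dict

    def lookup(self, orientation_deg, distance_m):
        # Nearest stored entry, matching orientation first and distance second.
        # This is an assumption; a real system would likely interpolate.
        key = min(
            self.response_db,
            key=lambda k: (abs(k[0] - orientation_deg), abs(k[1] - distance_m)),
        )
        return self.response_db[key]


def equalizer_gains_db(data, source_pose, zoom_pose):
    """Per-band gains that move the captured response toward the zoom response."""
    current = data.lookup(*source_pose)
    target = data.lookup(*zoom_pose)
    return tuple(t - c for t, c in zip(target, current))


# Illustrative values only: a talker measured from the front, from behind,
# and from the front at a larger distance.
speech = DirectivityData({
    (0, 1.0): (0.0, 0.0, 0.0),
    (180, 1.0): (-2.0, -6.0, -12.0),
    (0, 4.0): (-12.0, -12.0, -13.0),
})

# Captured from behind at 1 m, zoomed to the front at 1 m.
gains = equalizer_gains_db(speech, (180, 1.0), (0, 1.0))
```

In this sketch the high band receives the largest boost, reflecting the assumed stronger high-frequency loss behind the source.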

The device of claim 1,
wherein the one or more processors are further configured to:
perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; and
process an equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals.

The device of claim 1,
wherein the one or more processors are further configured to:
identify speech in a speech detection input audio signal that is based on the one or more input audio signals to generate one or more speech audio signals; and
process an equalizer input audio signal that is based on the one or more speech audio signals to generate the one or more output audio signals.

The device of claim 1,
wherein the one or more processors are further configured to:
apply, based on a zoom target, one or more gains to a gain adjuster input audio signal that is based on the one or more input audio signals to generate a gain-adjusted audio signal; and
process an equalizer input audio signal that is based on the gain-adjusted audio signal to generate the one or more output audio signals.
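
A minimal sketch of a gain based on a zoom target, assuming the gain only offsets inverse-distance attenuation between the source distance and the zoom distance. That model, and the names `zoom_gain_db` and `apply_gain`, are assumptions made for illustration; the claim does not say how the gains are computed.

```python
import math


def zoom_gain_db(source_distance_m, zoom_distance_m):
    """Gain in dB that offsets inverse-distance attenuation (assumed model)."""
    if source_distance_m <= 0 or zoom_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(source_distance_m / zoom_distance_m)


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * x for x in samples]


# Zooming from 4 m to 1 m gives roughly +12 dB, that is, about 4x amplitude.
gain = zoom_gain_db(4.0, 1.0)
adjusted = apply_gain([0.1, -0.2], gain)
```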

The device of claim 1,
wherein the one or more processors are further configured to:
perform noise suppression on a noise suppressor input audio signal that is based on the one or more input audio signals to generate a noise suppressed audio signal; and
process an equalizer input audio signal that is based on the noise suppressed audio signal to generate the one or more output audio signals.

The device of claim 1,
wherein the one or more processors are further configured to:
process a context detector input audio signal that is based on the one or more input audio signals to generate context data of the one or more audio sources, wherein the context data of a particular audio source of the one or more audio sources indicates an orientation of the particular audio source, a distance of the particular audio source, a type of the particular audio source, or a combination thereof; and
obtain the directivity data of the particular audio source based on the type of the particular audio source.

The device of claim 10,
wherein the one or more processors are further configured to generate the context data based at least in part on image data associated with the one or more input audio signals.

The device of claim 11,
wherein the one or more processors are further configured to retrieve the image data and the one or more input audio signals from a memory.

The device of claim 11,
further comprising one or more microphones coupled to the one or more processors and configured to generate the one or more input audio signals.

The device of claim 11,
further comprising a camera coupled to the one or more processors and configured to generate the image data.

The device of claim 11,
wherein the one or more processors are further configured to perform audio source recognition based on the one or more input audio signals, the image data, or both, to identify the type of the particular audio source of the one or more audio sources.

The device of claim 11,
wherein the one or more processors are further configured to:
perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources; and
perform image analysis on the image data to determine the orientation of the particular audio source.

The device of claim 11,
wherein the one or more processors are further configured to:
perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources; and
perform a distance analysis on the one or more input audio signals, the image data, or both, to determine the distance of the particular audio source.

The device of claim 10,
wherein the one or more processors are further configured to select the one or more equalizer settings based on equalizer setting data that associates the one or more equalizer settings with the context data, the directivity data, a zoom orientation, a zoom distance, or a combination thereof.

The device of claim 18,
wherein the one or more processors are further configured to obtain the equalizer setting data from the memory, another device, or both.

The device of claim 10,
wherein the one or more processors are further configured to select the one or more equalizer settings to reduce a frequency response corresponding to intermediate frequencies.

The device of claim 1,
wherein the one or more processors are further configured to:
generate, at a first time, a first sound spectrum of a first input audio signal corresponding to a particular audio source of the one or more audio sources;
generate, at a second time, a second sound spectrum of a second input audio signal corresponding to the particular audio source; and
update the directivity data to indicate that a difference between a first distance and a first orientation at the first time and a second distance and a second orientation at the second time corresponds to a difference between the first sound spectrum and the second sound spectrum.
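
A minimal sketch of such an update, assuming the source emits the same signal at both times so that any spectral difference can be attributed to the change in orientation and distance. The function names `band_levels_db` and `update_directivity`, the single-bin DFT used to estimate band levels, the band centers, and the test tone are all hypothetical.

```python
import cmath
import math


def band_levels_db(samples, sample_rate, bands_hz, floor=1e-9):
    """Level in dB at each band center, using one DFT bin per band."""
    n = len(samples)
    levels = []
    for f in bands_hz:
        acc = sum(
            x * cmath.exp(-2j * math.pi * f * i / sample_rate)
            for i, x in enumerate(samples)
        )
        # Clamp to a floor so that empty bands do not produce log10(0).
        magnitude = max(2.0 * abs(acc) / n, floor)
        levels.append(20.0 * math.log10(magnitude))
    return levels


def update_directivity(table, pose1, spectrum1, pose2, spectrum2):
    """Store the response at pose2 as the response at pose1 plus the spectral difference."""
    base = table.setdefault(pose1, [0.0] * len(spectrum1))
    table[pose2] = [b + (s2 - s1) for b, s1, s2 in zip(base, spectrum1, spectrum2)]
    return table


# Illustrative input: the same 1 kHz tone, received at half amplitude the second time.
fs = 8000
bands = (250, 1000, 2000)
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(800)]
first = band_levels_db(tone, fs, bands)
second = band_levels_db([0.5 * x for x in tone], fs, bands)

# Poses are (orientation_deg, distance_m).
table = update_directivity({}, (0, 1.0), first, (90, 2.0), second)
```

Here the entry for the second pose ends up about 6 dB below the first in the 1 kHz band, which is the halving of amplitude expressed in decibels.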

The device of claim 1,
wherein the one or more processors are further configured to obtain the directivity data from the memory, another device, or both.

The device of claim 1,
wherein the one or more processors are further configured to:
perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals;
detect speech in the one or more input audio signals to generate one or more speech audio signals;
apply, based on a zoom target, one or more gains to the one or more beamformed audio signals, the one or more speech audio signals, or a combination thereof to generate one or more gain-adjusted audio signals;
generate context data of the one or more audio sources based at least in part on the one or more gain-adjusted audio signals, wherein the context data of a particular audio source of the one or more audio sources indicates an orientation of the particular audio source, a distance of the particular audio source, a type of the particular audio source, or a combination thereof;
obtain the directivity data of the particular audio source based on the type of the particular audio source;
determine the one or more equalizer settings further based on the context data, a zoom orientation, and a zoom distance;
apply noise suppression to the one or more gain-adjusted audio signals to generate one or more noise suppressed audio signals; and
generate the one or more output audio signals by processing the one or more noise suppressed audio signals based on the one or more equalizer settings.
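
A minimal sketch of the order of operations recited in this claim. Because the claim does not specify how any stage is implemented, each stage is passed in as a callable. The function `run_audio_zoom`, the stage names, the `stub` helper, and every value below are hypothetical stand-ins that only record the call order.

```python
def run_audio_zoom(input_signals, zoom_target, stages):
    """Call the stages in the recited order; `stages` maps names to callables."""
    beamformed = stages["beamform"](input_signals)
    speech = stages["detect_speech"](input_signals)
    adjusted = stages["apply_gains"](beamformed, speech, zoom_target)
    context = stages["detect_context"](adjusted)
    directivity = stages["get_directivity"](context["type"])
    settings = stages["select_equalizer"](directivity, context, zoom_target)
    suppressed = stages["suppress_noise"](adjusted)
    return stages["equalize"](suppressed, settings)


# Stand-in stages: each records its name and returns a fixed placeholder value.
trace = []


def stub(name, result):
    def stage(*args):
        trace.append(name)
        return result
    return stage


stages = {
    "beamform": stub("beamform", [0.1, 0.2]),
    "detect_speech": stub("detect_speech", [0.1, 0.2]),
    "apply_gains": stub("apply_gains", [0.2, 0.4]),
    "detect_context": stub(
        "detect_context", {"type": "speech", "orientation": 180, "distance": 2.0}
    ),
    "get_directivity": stub("get_directivity", {}),
    "select_equalizer": stub("select_equalizer", [2.0, 6.0, 12.0]),
    "suppress_noise": stub("suppress_noise", [0.2, 0.4]),
    "equalize": stub("equalize", [0.3, 0.5]),
}

output = run_audio_zoom([[0.1, 0.2]], {"orientation": 0, "distance": 1.0}, stages)
```

Note that in this ordering noise suppression operates on the gain-adjusted signals, and the equalizer is applied last, to the noise suppressed signals.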

A method comprising:
obtaining, at a device, directivity data of one or more audio sources corresponding to one or more input audio signals;
determining, at the device, one or more equalizer settings based at least in part on the directivity data; and
generating, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

The method of claim 24, further comprising:
receiving, at the device, user input indicating a zoom target of an audio zoom operation; and
determining, at the device, the one or more equalizer settings based on the zoom target,
wherein the zoom target comprises a zoom position, a zoom distance, a zoom orientation, a selection of at least one of the one or more audio sources, or a combination thereof.

A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
obtain directivity data of one or more audio sources corresponding to one or more input audio signals;
determine one or more equalizer settings based at least in part on the directivity data; and
generate, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

The non-transitory computer-readable medium of claim 26,
wherein the instructions, when executed by the one or more processors, further cause the one or more processors to:
perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; and
process an equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals.

The non-transitory computer-readable medium of claim 26,
wherein the instructions, when executed by the one or more processors, further cause the one or more processors to:
identify speech in an input audio signal to generate one or more speech audio signals; and
process an equalizer input audio signal that is based on the one or more speech audio signals to generate the one or more output audio signals.

A device comprising:
means for obtaining directivity data of one or more audio sources corresponding to one or more input audio signals;
means for determining one or more equalizer settings based at least in part on the directivity data; and
means for generating, based on the equalizer settings, one or more output audio signals corresponding to a psychoacoustic enhanced version of the one or more input audio signals.

The device of claim 29,
wherein the means for obtaining, the means for determining, and the means for generating are integrated into at least one of a virtual assistant, a home appliance, a smart device, an Internet of Things (IoT) device, a communication device, a headset, a vehicle, a computer, a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a personal media player, a digital video player, a camera, or a navigation device.