JP2023519596A

JP2023519596A - Systems and methods for enhancing audio in changing environments

Info

Publication number: JP2023519596A
Application number: JP2022559344A
Authority: JP
Inventors: ダーシー，ダニエル，ポール; ユイ，シュエメイ; イー－フオンヴォー，クララ; マリー，ステュワート; ルオ，リービン
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2020-04-02
Filing date: 2021-04-02
Publication date: 2023-05-11
Also published as: US20230169989A1; CN115362499A; EP4128223A1; WO2021202956A1

Abstract

Novel methods and systems for generating and using user profiles for dialogue boost and audio equalizer adjustments to compensate for various environmental noise conditions. To generate a profile when the user is not in the noisy environment, synthesized or pre-recorded environmental noise can be mixed with the media to simulate the noise condition.

Description

[Cross-reference to related applications]
This application claims priority to International Patent Application No. PCT/CN2020/083083, filed April 2, 2020; U.S. Provisional Patent Application No. 63/014,502, filed April 23, 2020; and U.S. Provisional Patent Application No. 62/125,132, filed December 14, 2020. These applications are incorporated herein by reference in their entirety.

[Technical field]
The present disclosure relates to improvements for audio playback of media content. In particular, it relates to setting and applying suitable noise compensation for the audio of media content played in various environments, especially on mobile devices.

Audio for media with dialogue (movies, television shows, etc.) is typically produced to be enjoyed in a relatively quiet environment such as a home or a theater. However, it is increasingly common for people to consume such content on the go on their mobile devices. This becomes a problem when there is a lot of environmental noise (vehicle noise, crowds, etc.), or when the audio quality limits of the mobile hardware or the type of playback equipment used (headphones, etc.) make it difficult to understand what the actors are saying.

A common solution is to use noise-cancelling headphones or earbuds. However, this is an expensive solution, and it has the drawback of blocking out environmental sounds that the user may want to hear (car horns, sirens, shouted warnings, etc.).

Various audio processing systems and methods are disclosed herein.

According to a first aspect, a method of configuring a mobile device for use with environmental noise, for a user of the mobile device, is described, the method comprising:
receiving, from the user, a case identification of the environmental noise;
receiving, from the user, a noise level of the environmental noise;
receiving, from the user, a dialogue boost level for the environmental noise at the noise level;
receiving, from the user, a graphic equalizer setting for the environmental noise at the noise level;
playing sample audio for the user from the mobile device while the user sets the dialogue boost level and the graphic equalizer setting; and
storing, on the mobile device, the dialogue boost level and the graphic equalizer setting for the case identification at the noise level in a profile, wherein the device is configured to play audio media using the dialogue boost level and the graphic equalizer setting when the profile is selected by the user.

According to a second aspect, a method of adjusting the audio of a mobile device for a user is described, the method comprising:
receiving a profile selection from the user, the profile selection relating at least to an environmental noise condition;
receiving, from the user, a noise level of the environmental noise condition;
reading a dialogue boost level and a graphic equalizer setting from memory on the mobile device; and
adjusting the level of the audio using the dialogue boost level and the graphic equalizer setting.

Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) and read-only memory (ROM). Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in a non-transitory medium having software stored on it. The software may, for example, be executable by one or more components of a control system as disclosed herein, and may include instructions for performing one or more of the methods disclosed herein.

At least some aspects of the present disclosure can be implemented via an apparatus or apparatuses. For example, one or more devices may be configured to perform, at least in part, the methods disclosed herein. In some implementations, an apparatus may include an interface system and a control system. The interface system may include one or more network interfaces, one or more interfaces between the control system and a memory system, one or more interfaces between the control system and another device, and/or one or more external device interfaces.

Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions in the following drawings may not be drawn to scale. Like reference numbers and designations in the various drawings generally indicate like elements, but different reference numbers do not necessarily indicate different elements between different drawings.

FIG. 1 shows an exemplary flowchart for configuring case-specific audio settings.

FIG. 2 shows an exemplary flowchart for using case-specific audio settings.

FIG. 3 shows an exemplary flowchart for configuring case-specific audio settings that includes synthesizing environmental noise.

FIG. 4 shows an exemplary comparison of unadjusted and adjusted audio obtained through subjective testing.

FIGS. 5A and 5B show exemplary frequency response curves for dialogue boost applied to one type of codec. FIG. 5A shows the response curves for dialogue segments of the output, and FIG. 5B shows the response curves for non-dialogue segments of the output.

FIGS. 6A and 6B show exemplary frequency response curves for dialogue boost applied to a different type of codec from that of FIGS. 5A and 5B. FIG. 6A shows the response curves for dialogue segments of the output, and FIG. 6B shows the response curves for non-dialogue segments of the output.

FIG. 7 shows an exemplary graphical user interface for the methods of the present application.

FIG. 8 shows an exemplary hardware/software configuration for the methods of the present application.

Described herein is a solution to the problem of providing intelligible dialogue during media playback (audio or audio/visual) in a noisy environment (environmental noise): generating and using dialogue boost and equalizer settings, stored in a profile for a particular user, for a particular noise level and type (environment type).

As used herein, the term "mobile device" refers to a user's device that is capable of audio playback and that can be carried and used in multiple locations. Examples include mobile phones, laptop computers, tablet computers, mobile gaming systems, wearable devices, and small media players.

As used herein, the terms "environmental condition", "case", and "case identification" refer to a category of noisy location or environment that may or may not interfere with the enjoyment of listening to audio media on a mobile device. Examples include home (e.g., "default"), residential areas (e.g., walking), public transportation, and noisy indoor environments (e.g., an airport).

The term "dialogue boost" refers to applying general audio amplification to the dialogue component of audio with negligible amplification of the non-dialogue components. For example, dialogue boost can be implemented as an algorithm that continuously monitors the audio being played, detects the presence of dialogue, and dynamically applies processing that improves the intelligibility of the spoken portions of the audio content. In some embodiments, dialogue boost analyzes features of the audio signal and applies a pattern recognition system to detect the presence of dialogue from moment to moment. When dialogue is detected, the speech spectrum is modified as needed to emphasize the dialogue content so that the listener can hear it more clearly.
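As a rough illustration of the "boost dialogue, leave everything else alone" behaviour described above, the following minimal sketch applies a gain only to frames that a detector has flagged as dialogue. It is a deliberate simplification: a single broadband gain stands in for the spectral modification mentioned in the text, and the detector itself is not shown. The names `apply_dialogue_boost`, `speech_flags`, and `boost_db` are hypothetical and do not come from the source.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def apply_dialogue_boost(frames, speech_flags, boost_db):
    """Scale only the frames flagged as dialogue; copy the others unchanged.

    frames       -- list of frames, each a list of float samples
    speech_flags -- one bool per frame, from a (hypothetical) dialogue detector
    boost_db     -- gain applied to dialogue frames, in dB (assumed broadband)
    """
    if len(frames) != len(speech_flags):
        raise ValueError("need exactly one speech flag per frame")
    gain = db_to_gain(boost_db)
    return [
        [sample * gain for sample in frame] if is_speech else list(frame)
        for frame, is_speech in zip(frames, speech_flags)
    ]


# Three short frames; only the middle one is treated as dialogue.
frames = [[0.1, -0.1], [0.2, 0.2], [0.05, 0.0]]
flags = [False, True, False]
boosted = apply_dialogue_boost(frames, flags, boost_db=6.0)
```

A production implementation would more likely shape the gain per frequency band and smooth it over time to avoid audible steps at frame boundaries.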

The terms "equalization", "graphic equalizer", and "GEQ" refer to frequency-based amplitude adjustment of audio. In a true GEQ, the amplitude settings are set with sliders, where each slider's position corresponds to the frequency range it controls. As used herein, however, GEQ also refers to a particular setting that a graphic equalizer may have, giving a particular frequency response curve.
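To make the idea of a GEQ setting as "one gain per frequency band" concrete, here is a minimal sketch that converts slider positions in dB to linear factors and applies them to per-band magnitudes. The band centres, the function names, and the representation of audio as per-band magnitudes are all assumptions made for illustration; the source does not specify them.

```python
# Hypothetical band centres in Hz; a real equalizer may use different bands.
BAND_CENTERS_HZ = [60, 250, 1000, 4000, 12000]


def geq_linear_gains(slider_db):
    """Map one slider value in dB per band to a linear gain per band."""
    if len(slider_db) != len(BAND_CENTERS_HZ):
        raise ValueError("need one slider value per band")
    return {hz: 10.0 ** (db / 20.0) for hz, db in zip(BAND_CENTERS_HZ, slider_db)}


def apply_geq(band_levels, slider_db):
    """Scale each band's magnitude by the gain of the slider controlling it."""
    gains = geq_linear_gains(slider_db)
    return {hz: band_levels[hz] * gains[hz] for hz in BAND_CENTERS_HZ}


# A flat input spectrum with a mid-range lift and a slight treble cut.
flat = {hz: 1.0 for hz in BAND_CENTERS_HZ}
shaped = apply_geq(flat, [0.0, 0.0, 6.0, 3.0, -3.0])
```

In this picture, a stored "GEQ setting" is just the list of slider values, which is what gives the particular frequency response curve mentioned in the definition.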

As used herein, the terms "media" and "content" refer to anything that has audio content. This can be music, movies, videos, video games, telephone conversations, alerts, and so on. The systems and methods described herein are most useful for media that combines dialogue and non-dialogue components, but they can be applied to any media.

FIG. 1 shows an exemplary flowchart for generating profiles for different environmental conditions (cases). The user chooses to start setup from the device user interface (UI) (110), and an exemplary playback sample is played for the user (140). The sample can be selected by the system or by the user. The user selects the volume at which it is played (141). The user can then step through different environmental noise cases (default, walking, public transportation, airport, etc.), either automatically or manually (selected by the user) (142). The system can use all of the cases or a selected subset, including a single selected case. If the selected case is not the default case, the user enters, for their current situation, an estimated environmental noise level (125), a dialogue boost level (130), and a graphic equalizer (GEQ) setting (135) that, in combination, give the user the best listening experience in their subjective opinion. These settings (125, 130) can be made multiple times and in any order (not necessarily the order shown), and are set based on the sample playback (140) of audio that has a dialogue component. Once the dialogue boost and GEQ settings (130, 135) are set to the user's preference, they are stored in a profile database/memory (145) for future use. The system can then determine whether all applicable cases have been set (115). If so, setup ends (150). If not, the system proceeds to the next case (142) and repeats the setup process for that case. In some embodiments, the saved profiles are also indexed by the injected noise level (125).
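The setup loop of FIG. 1 can be sketched as follows. This is only one possible reading of the flowchart: the `Profile` and `ProfileStore` types, the `ask_user` callback, and the choice to key profiles by the pair (case, noise level) are assumptions introduced for illustration, not details given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    case: str             # case identification, e.g. "walking"
    noise_level: int      # user-estimated noise level (125)
    dialogue_boost: int   # dialogue boost level (130)
    geq_setting: int      # graphic equalizer setting (135)


class ProfileStore:
    """In-memory stand-in for the profile database/memory (145)."""

    def __init__(self):
        self._profiles = {}

    def save(self, profile):
        # Assumed indexing: one profile per (case, noise level) pair.
        self._profiles[(profile.case, profile.noise_level)] = profile

    def load(self, case, noise_level):
        return self._profiles.get((case, noise_level))

    def configured_cases(self):
        return {case for case, _ in self._profiles}


def run_setup(cases, ask_user, store):
    """Visit each case (142), collect the settings, and store them (145)."""
    for case in cases:
        if case == "default":
            continue  # settings are only entered for non-default cases
        noise_level, boost, geq = ask_user(case)  # hypothetical UI callback
        store.save(Profile(case, noise_level, boost, geq))
    return store


# Scripted answers stand in for real user input.
answers = {"walking": (2, 3, 1), "airport": (4, 5, 2)}
store = run_setup(["default", "walking", "airport"],
                  lambda case: answers[case], ProfileStore())
```

Keying on both the case and the noise level mirrors the embodiment in which saved profiles are also indexed by noise level; a simpler store could key on the case alone.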

In some embodiments, the dialogue boost level and/or the GEQ setting are each a single value from a short list of possible settings, for example "3" from a range of 0 to 5. In some embodiments, the setting is a real value associated with the setting, for example +10 dB (e.g., in a particular frequency range).
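One way to connect the two representations above is a lookup from a discrete level to a gain in dB. The sketch below assumes a linear mapping in which level 5 corresponds to +10 dB; the linearity, the constants, and the name `level_to_db` are illustrative assumptions and are not stated in the source.

```python
MAX_LEVEL = 5          # discrete levels run from 0 to 5, as in the example above
MAX_BOOST_DB = 10.0    # assumed gain at the top level


def level_to_db(level, max_db=MAX_BOOST_DB, max_level=MAX_LEVEL):
    """Map a discrete setting (0..max_level) to a gain in dB, linearly."""
    if not 0 <= level <= max_level:
        raise ValueError(f"level must be between 0 and {max_level}")
    return max_db * level / max_level


gain_db = level_to_db(3)  # 6.0 under the assumed linear mapping
```

A perceptually spaced table could replace the linear formula without changing how the stored level is used.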

FIG. 2 shows an exemplary flowchart for using profiles generated according to the methods described herein. The user starts their media (205) and selects the case profile that best describes their current situation (210). If the profiles are indexed by environmental noise level, that can be selected as well. The system then reads the profile that matches the selected case (and noise level, if applicable) from the database/memory (25) (210). The system then determines whether playback is in a mobility situation, that is, a situation that requires dialogue boost and GEQ adjustment (220). This determination can be made from user input, device identification, location data, or other means. If the system determines that there is no mobility situation, normal playback/mixing (240) of the media being played (245) takes place while the environmental noise (250) is present for the user. This continues until the case profile is changed (255), at which point a new profile is read (210) and the process starts again. In some embodiments, a new mobility check is performed at or before the case switch (255), and the process is simply repeated if a mobility situation exists. If a mobility situation is found to exist (220), dialogue boost (230) and GEQ adjustment (235) are applied to the mixing (240), adjusting the media playback (245) to provide intelligible dialogue despite the environmental noise (250).
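The decision logic of FIG. 2 can be summarised in a few lines. The sketch below assumes profiles are held in a dictionary keyed by (case, noise level) and that the mobility decision has already been made elsewhere; the function name, the dictionary layout, and the fallback to normal playback when no profile matches are assumptions for illustration.

```python
def select_processing(profiles, case, noise_level, mobility):
    """Return (dialogue_boost, geq_setting) to apply, or None for normal playback.

    profiles    -- dict keyed by (case, noise_level); assumed storage layout
    mobility    -- result of the mobility check (220), determined elsewhere
    """
    profile = profiles.get((case, noise_level))  # read the profile (210)
    if not mobility or profile is None:
        return None  # normal playback/mixing (240), no adjustment
    return profile["dialogue_boost"], profile["geq_setting"]


profiles = {("public_transport", 3): {"dialogue_boost": 4, "geq_setting": 2}}

on_the_train = select_processing(profiles, "public_transport", 3, mobility=True)
at_home = select_processing(profiles, "public_transport", 3, mobility=False)
```

When the user switches case (255), the same function would simply be called again with the new selection.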

FIG. 3 shows an exemplary flowchart for generating profiles that includes the use of synthesized environmental noise for virtual mixing of media and environmental noise (as opposed to the actual mixing of media and real environmental noise in FIGS. 1 and 2). The system is similar to that of FIG. 1, except that a check is made (310) to see whether the user is generating the profile at the noisy location or is presetting a case from a relatively quiet environment (e.g., home). This check can be made by asking the user, or by location services determining that the mobile device is at "home". In some embodiments, the system always assumes that the user is in a relatively quiet environment. If the user is not at the location, a case (environmental noise condition) is selected by the user or by the system (320). The environmental noise for that case is synthesized (330). In some embodiments, this can be, or can be based on, pre-recorded noise stored in a database/memory (340). The noise is added to the playback sample (350), and the noise level (360) can be set by the user to the level the user expects to experience, which adjusts the simulated noise (330). The dialogue boost level (370) and the GEQ level (380) can be set in the same way as in at-location setup. The settings are then saved in a database/memory (390) for future use. In some embodiments, the recorded environmental noise is taken from an ambisonic source and rendered into a binaural format.
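The virtual mixing step of FIG. 3 amounts to scaling a noise signal to a chosen level and adding it to the playback sample. The following minimal sketch uses white noise as a stand-in for the synthesized or pre-recorded environmental noise, and interprets the noise level as a value in dB relative to the RMS level of the sample. Both choices, and all function names, are assumptions for illustration only.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def synthesize_noise(n, seed=0):
    """White noise as a placeholder for case-specific environmental noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def mix_with_noise(sample, noise, noise_db):
    """Add noise to the sample at noise_db relative to the sample's RMS level."""
    if len(sample) != len(noise):
        raise ValueError("sample and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(sample)  # silent noise source: nothing to mix in
    scale = rms(sample) * 10.0 ** (noise_db / 20.0) / noise_rms
    return [s + n * scale for s, n in zip(sample, noise)]


# A 440 Hz tone at 8 kHz sampling stands in for the playback sample.
sample = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(800)]
noise = synthesize_noise(len(sample))
mixed = mix_with_noise(sample, noise, noise_db=-6.0)
```

Raising `noise_db` while the user adjusts the dialogue boost and GEQ controls would let them tune a profile for a louder environment without being in it.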

FIG. 4 shows an example of the perceptual differences the system can produce, and how comparisons made against a reference condition can be used to evaluate performance. As can be seen, dialogue boost is preferred over the reference condition with no boost: it improves the intelligibility of dialogue and makes an adjustment that is beneficial to the user for media in which understanding the dialogue is important. For example, FIG. 4 shows that level 2 dialogue enhancement (DE) (415) scores highly for both user preference (405) and subjective intelligibility (410), and may therefore be a suitable setting for many users.

FIGS. 5A and 5B and FIGS. 6A and 6B show exemplary graphs of dialogue boost. FIGS. 5A and 6A show graphs of different dialogue boost settings for the dialogue component of the media. As shown, different settings produce different curves. In contrast, FIGS. 5B and 6B show graphs for the same set of dialogue boost settings, but for the non-dialogue component of the media. There is negligible difference between the curves for the different settings (that is, dialogue boost does not boost the non-dialogue component). FIGS. 5A and 5B represent dialogue boost levels with smaller level intervals than FIGS. 6A and 6B. Different curves can be used depending on how noisy the environment is: the louder the environmental noise, the more aggressive the curve, in the sense of applying more dialogue boost across the played content. FIG. 5A shows response curves in which, for the dialogue component of the audio, the boost is stronger at lower frequencies (505) than at higher frequencies (510). In contrast, FIG. 6A shows a stronger boost at higher frequencies (610) than at lower frequencies (605). In both cases, FIGS. 5B and 6B show that the non-dialogue component receives only negligible boost across all frequencies.

FIG. 7 shows an exemplary UI (specifically, in this case, a graphical user interface (GUI)) for setting a profile. On the mobile device (700), the inputs for setup can be presented in a simplified form for ease of use. The noise level control (710) can be presented as a finite number of noise levels (e.g., 0 to 5), for example starting from 0 for no noise and increasing in equal steps (in actual dB or in perceptual steps) as the noise level rises. The dialogue boost setting (720) can be presented as a graphical slider running from no boost to maximum boost. Similarly, the GEQ setting (730) can be simplified to a single range of values (shown here as a slider) for selecting a preset GEQ setting (e.g., a "bright", "shallow", or "deep" tone). The cases (740) can be shown as icons (with or without text). For example, "default" can be shown as a house, "walking" as a person, "public transportation" as a train or bus, and "indoor location" as an airplane (to indicate an airport). Other cases and icons can be used; the icon gives the user a quick reference to the case it represents.

FIG. 8 shows an exemplary mobile device architecture for implementing the features and processes described herein, according to an embodiment. The architecture (800) can be implemented in any electronic device, including but not limited to desktop computers, consumer audio/visual (AV) equipment, radio broadcast equipment, and mobile devices (e.g., smartphones, tablet computers, laptop computers, wearable devices). In the exemplary embodiment shown, the architecture (800) is for a smartphone and includes a processor (801), a peripherals interface (802), an audio subsystem (803), a speaker (804), a microphone (805), sensors (806) (e.g., accelerometer, gyroscope, barometer, magnetometer, camera), a location processor (807) (e.g., a GNSS receiver), a wireless communication subsystem (808) (e.g., Wi-Fi, Bluetooth, cellular), an I/O subsystem (809) that includes a touch controller (810) and other input controllers (811), a touch surface (812), and other input/control devices (813). A memory interface (814) is coupled to the processor (801), the peripherals interface (802), and memory (815) (e.g., flash, RAM, ROM). The memory (815) stores computer program instructions and data including, but not limited to, operating system instructions (816), communication instructions (817), GUI instructions (818), sensor processing instructions (819), telephony instructions (820), electronic messaging instructions (821), web browsing instructions (822), audio processing instructions (823), GNSS/navigation instructions (824), and applications/data (825). The audio processing instructions (823) include instructions for performing the audio processing described herein. Other architectures with more or fewer components can also be used to implement the disclosed embodiments.

The system can be provided as a service driven from a remote server, as a standalone program on the device, integrated into a media player application, or included as part of the operating system, for example as part of the operating system's sound settings.

A number of embodiments of the present disclosure have been described. It will nevertheless be understood that various modifications can be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the claims.

As described herein, embodiments of the present invention may relate to one or more of the enumerated example embodiments (EEEs) listed below. Accordingly, the invention may be embodied in any of the forms described herein, including but not limited to the following EEEs, which describe the structure, features, and functionality of some portions of the invention.

EEE1: A method of configuring a mobile device for use with environmental noise, for a user of the mobile device, the method comprising:
receiving, from the user, a case identification of the environmental noise;
receiving, from the user, a noise level of the environmental noise;
receiving, from the user, a dialogue boost level for the environmental noise at the noise level;
receiving, from the user, a graphic equalizer setting for the environmental noise at the noise level;
playing sample audio for the user from the mobile device while the user sets the dialogue boost level and the graphic equalizer setting; and
storing, on the mobile device, the dialogue boost level and the graphic equalizer setting for the case identification at the noise level in a profile, wherein the device is configured to play audio media using the dialogue boost level and the graphic equalizer setting when the profile is selected by the user.

EEE2: The method of EEE1, further comprising:
simulating the environmental noise at the noise level; and
mixing the simulated environmental noise with the sample audio before playing the sample audio.

EEE3: The method of EEE2, wherein the simulating comprises reading stored pre-recorded environmental noise from memory.

EEE4: The method of EEE3, wherein the stored pre-recorded environmental noise is in a binaural format.

EEE5: The method of any of EEE1 to EEE4, further comprising presenting, on the mobile device, graphical user interface controls for setting the case identification, the noise level, the dialogue boost level, and the graphic equalizer setting.

EEE6: The method of any of EEE1 to EEE5, wherein the profile corresponds to both the case identification and the noise level.

EEE7: A method of adjusting the audio of a mobile device for a user, the method comprising:
receiving a profile selection from the user, the profile selection relating at least to an environmental noise condition;
receiving, from the user, a noise level of the environmental noise condition;
reading a dialogue boost level and a graphic equalizer setting from memory on the mobile device; and
adjusting the level of the audio using the dialogue boost level and the graphic equalizer setting.

EEE８：前記モバイル装置上で、環境ノイズ条件に対応するプロファイルを選択するためのグラフィカルユーザインタフェース制御を提示するステップ、を更に含むEEE７に記載の方法。 EEE8: The method of EEE7, further comprising presenting on the mobile device a graphical user interface control for selecting a profile corresponding to environmental noise conditions.

EEE９：EEE１～EEE８に記載の方法のうちの少なくとも１つで、ソフトウェア又はファームウェアで実行するよう構成される装置。 EEE9: A device configured to perform in software or firmware at least one of the methods described in EEE1-EEE8.

EEE10: A non-transitory computer-readable medium that, when read by a computer, instructs the computer to perform at least one of the methods of EEE1-EEE8.

EEE11: The device of EEE9, wherein the device is a telephone.

EEE12: The device of EEE9, wherein the device is at least one of a cellular phone, a laptop computer, a tablet computer, a mobile gaming system, a wearable device, and a small media player.

EEE13: The device of any of EEE9, EEE11, or EEE12, wherein the software or firmware is part of an operating system of the device.

EEE14: The device of any of EEE9, EEE11, or EEE12, wherein the software or firmware runs a standalone program on the device.

EEE15: The method of any of EEE1-EEE8, wherein the method is executed by an operating system of the mobile device.

The present disclosure is directed to certain implementations for the purpose of describing some of the innovative aspects described herein, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways. Moreover, the described embodiments may be implemented in a variety of hardware, software, firmware, etc. For example, aspects of the present application may be embodied, at least in part, in an apparatus, a system that includes more than one device, a method, a computer program product, etc. Accordingly, aspects of the present application may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, microcode, etc.), and/or an embodiment combining both software and hardware aspects. Such embodiments may be referred to herein as a "circuit," a "module," a "device," an "apparatus," or an "engine." Some aspects of the present application may take the form of a computer program product embodied in one or more non-transitory media having computer-readable program code embodied thereon. Such non-transitory media may include, for example, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. Accordingly, the teachings of the present disclosure are not limited to the implementations illustrated and/or described herein, but rather have broad applicability.

Claims

1. A method of configuring a mobile device for use with environmental noise, for a user of the mobile device, the method comprising:
receiving from the user a case identification of the environmental noise;
receiving from the user a noise level of the environmental noise;
receiving from the user a dialog boost level for the environmental noise at the noise level;
receiving from the user graphic equalizer settings for the environmental noise at the noise level;
playing sample audio for the user from the mobile device while the user sets the dialog boost level and the graphic equalizer settings; and
storing, in a profile on the mobile device, the dialog boost level and the graphic equalizer settings for the case identification at the noise level, wherein the mobile device is configured to play audio media using the dialog boost level and the graphic equalizer settings when the profile is selected by the user.

2. The method of claim 1, further comprising:
simulating the environmental noise at the noise level; and
mixing the simulated environmental noise with the sample audio prior to playing the sample audio.

3. The method of claim 2, wherein the simulating step comprises reading stored pre-recorded environmental noise from memory.

4. The method of claim 3, wherein the stored pre-recorded environmental noise is in binaural format.

5. The method of any of claims 1-4, further comprising presenting, on the mobile device, graphical user interface controls for setting the case identification, the noise level, the dialog boost level, and the graphic equalizer settings.

6. The method of any of claims 1-5, wherein the profile corresponds to both the case identification and the noise level.

7. A method of adjusting audio of a mobile device for a user, the method comprising:
receiving a profile selection from the user, the profile selection relating to at least an environmental noise condition;
receiving from the user a noise level of the environmental noise condition;
reading a dialog boost level and graphic equalizer settings from memory on the mobile device; and
adjusting levels of the audio using the dialog boost level and the graphic equalizer settings.

8. The method of claim 7, further comprising presenting on the mobile device a graphical user interface control for selecting a profile corresponding to environmental noise conditions.

9. A device configured to perform, in software or firmware, at least one of the methods of claims 1-8.

10. A non-transitory computer-readable medium that, when read by a computer, instructs the computer to perform at least one of the methods of claims 1-8.

11. The device of claim 9, wherein the device is a telephone.

12. The device of claim 9, wherein the device is at least one of a cellular phone, a laptop computer, a tablet computer, a mobile gaming system, a wearable device, and a small media player.

13. The device of any of claims 9, 11, or 12, wherein the software or firmware is part of an operating system of the device.

14. The device of any of claims 9, 11, or 12, wherein the software or firmware runs a standalone program on the device.

15. The method of any of claims 1-8, wherein the method is executed by an operating system of the mobile device.