JP2018129678A

JP2018129678A - Information processing apparatus, method of using microphone, program to be executed by computer

Info

Publication number: JP2018129678A
Application number: JP2017021534A
Authority: JP
Inventors: 雅春米田; Masaharu Yoneda; 浩造西野; Hirozo Nishino; 遷王; Qian Wang; 欣梅楊; Xinmei Yang
Original assignee: Lenovo Singapore Pte Ltd
Current assignee: Lenovo Singapore Pte Ltd
Priority date: 2017-02-08
Filing date: 2017-02-08
Publication date: 2018-08-16

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus that is easy to use in accordance with the user's environment when a voice assistant is used, a method of using its microphones, and a program to be executed by a computer. SOLUTION: An information processing apparatus including a plurality of microphones includes: mode setting means that, on the basis of a state of the information processing apparatus and ambient sound, selects and sets one of a first mode having directivity and a second mode having no directivity as the use mode of the plurality of microphones; voice processing means that performs signal processing on sound input from the plurality of microphones in accordance with the mode set by the mode setting means; and voice assistant means that performs voice assistance by recognizing the sound processed by the voice processing means. SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, a method of using its microphones, and a program to be executed by a computer.

In recent years, a growing number of people use voice assistant functions such as "Cortana", "Siri", "OK Google", and "Shabette Concier" on information processing apparatuses such as notebook PCs, smartphones, and tablets. A voice assistant is a function by which the information processing apparatus interprets the user's utterance and executes the operations instructed by voice. A voice assistant generally interprets what the user has said by making use of technologies such as speech recognition and natural language processing.

Many information processing apparatuses include a plurality of microphones, which are frequently used for VOIP and voice assistants. Some apparatuses perform noise cancellation by beamforming when a plurality of microphones are used, and beamforming is often enabled by default. In beamforming, the difference in sound arrival time at each microphone for a specified angle is calculated and corrected across the plurality of microphones, so that sound arriving from the specified angle is extracted.
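The arrival-time correction described above is commonly realized as delay-and-sum beamforming. The following is a minimal sketch in Python, assuming two microphones, a far-field source, and whole-sample delays. The microphone spacing, sample rate, and the function names `arrival_delay_samples` and `delay_and_sum` are illustrative assumptions and do not come from the publication.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def arrival_delay_samples(mic_spacing_m, angle_deg, sample_rate_hz):
    """Delay (in whole samples) of mic B relative to mic A for a far-field
    source at angle_deg (0 = straight ahead, positive = toward mic A).
    Hypothetical helper for illustration only."""
    delay_s = mic_spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return round(delay_s * sample_rate_hz)


def delay_and_sum(mic_a, mic_b, delay_b):
    """Align mic B to mic A by delay_b samples and average the two channels.

    Sound from the steered angle adds coherently; sound from other angles
    is misaligned and partially cancels.
    """
    out = []
    for n in range(len(mic_a)):
        m = n + delay_b  # index in mic B that matches sample n of mic A
        b = mic_b[m] if 0 <= m < len(mic_b) else 0.0
        out.append(0.5 * (mic_a[n] + b))
    return out
```

Steering to the front corresponds to a delay of zero samples, in which case the function simply averages the two channels.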

However, a user may want to use the voice assistant from a short distance away, where the information processing apparatus is out of reach. More specifically, the user may, from a location away from the apparatus, ask the voice assistant "Do I need an umbrella today?" and expect an answer. On the other hand, because a voice assistant requires the user to speak aloud, it is often not used where many people are around, since the user may feel embarrassed or disturb others. A system that is easy to use in accordance with the user's usage environment is therefore desired when a voice assistant is used.

Japanese Unexamined Patent Application Publication No. 2016-4270

The present invention has been made in view of the above, and an object thereof is to provide an information processing apparatus that is easy to use in accordance with the user's usage environment when a voice assistant is used, a method of using its microphones, and a program to be executed by a computer.

To solve the problems described above and achieve the object, the present invention is an information processing apparatus including a plurality of microphones, comprising: mode setting means that, on the basis of a state of the information processing apparatus and ambient sound, selects and sets one of a first mode having directivity and a second mode having no directivity as the use mode of the plurality of microphones; voice processing means that performs signal processing on sound input from the plurality of microphones; and voice assistant means that performs voice assistance by recognizing the sound processed by the voice processing means, wherein the plurality of modes include the first mode having directivity and the second mode having no directivity.

According to a preferred aspect of the present invention, the mode setting means desirably sets the second mode when the information processing apparatus enters an idle state while the first mode is set.

According to a preferred aspect of the present invention, the mode setting means desirably sets the first mode when there are a plurality of surrounding sound sources while the second mode is set.

According to a preferred aspect of the present invention, when the second mode is set and there is one surrounding sound source, the mode setting means desirably sets the speaker volume higher than the volume of that sound source, and the voice assistant means desirably waits for voice input of an utterance command from the user.

According to a preferred aspect of the present invention, when the second mode is set and the surroundings are silent, the mode setting means desirably maintains the second mode, and the voice assistant means desirably waits for voice input of an utterance command from the user.

According to a preferred aspect of the present invention, the first mode is desirably a mode in which beamforming processing is performed to extract, from the sound input from the plurality of microphones, only sound arriving from the front of the information processing apparatus.

According to a preferred aspect of the present invention, the second mode is desirably a mode in which the sensitivity of the plurality of microphones is set higher than in the first mode and sound from all directions around the information processing apparatus is collected over a wide range.

According to a preferred aspect of the present invention, the second mode is desirably a mode in which sound from all directions around the information processing apparatus is collected and the loudest of the collected sounds is extracted.

According to a preferred aspect of the present invention, the mode setting means desirably sets the first mode by default.

According to a preferred aspect of the present invention, the information processing apparatus is desirably a notebook PC.

Further, to solve the problems described above and achieve the object, the present invention is a method of using microphones in an information processing apparatus including a plurality of microphones, the method comprising: a mode setting step of selecting and setting, on the basis of a state of the information processing apparatus and ambient sound, one of a first mode having directivity and a second mode having no directivity as the use mode of the plurality of microphones; a voice processing step of performing signal processing on sound input from the plurality of microphones in accordance with the mode set in the mode setting step; and a voice assistant step of performing voice assistance by recognizing the sound processed in the voice processing step.

Further, to solve the problems described above and achieve the object, the present invention is a program installed in an information processing apparatus including a plurality of microphones, the program causing a computer to execute: a mode setting step of selecting and setting, on the basis of a state of the information processing apparatus and ambient sound, one of a first mode having directivity and a second mode having no directivity as the use mode of the plurality of microphones; a voice processing step of performing signal processing on sound input from the plurality of microphones in accordance with the mode set in the mode setting step; and a voice assistant step of performing voice assistance by recognizing the sound processed in the voice processing step.

The present invention has the effect of making it possible to provide an information processing apparatus that is easy to use in accordance with the user's usage environment when a voice assistant is used.

FIG. 1 is a schematic external view of a notebook PC to which an information processing apparatus according to the present invention is applied.
FIG. 2 is a diagram showing a schematic hardware configuration example of the notebook PC of FIG. 1.
FIG. 3 is a schematic functional configuration diagram related to audio input and output of the notebook PC of FIG. 2.
FIG. 4 is an explanatory diagram for explaining the use modes of the microphones.
FIG. 5 is a flowchart for explaining an example of processing for switching the use mode of the microphones in accordance with the state of the notebook PC and the surrounding sound.

Embodiments of a computer system to which the information processing apparatus according to the present embodiment, the method of using its microphones, and the program to be executed by a computer are applied are described below. Although the components of the present invention are shown in general form in the drawings of this specification, it will be readily understood that they may be arranged and designed in a wide variety of configurations. Accordingly, the following more detailed description of embodiments of the apparatus, method, and program of the present invention does not limit the scope of the invention as set out in the claims; it merely shows examples of selected embodiments of the apparatus, system, and method that are consistent with the invention as claimed. Those skilled in the art will understand that the invention can be practiced without one or more of the specific details, or with other methods, components, or materials.

(Embodiment 1)
FIG. 1 is a schematic external view of a notebook PC 1 to which an information processing apparatus according to the present invention is applied. As shown in the figure, the notebook PC 1 includes a main-body-side housing 2 and a display-side housing 3, both of which are substantially rectangular parallelepipeds. The main-body-side housing 2 includes an input unit 4 having a keyboard, a touchpad, and the like, and left and right speakers 6a and 6b. The display-side housing 3 includes an LCD (liquid crystal display) 7, a camera 8 that is arranged approximately at the center above the LCD 7 on its display-surface side and can capture a subject in front of it, and left and right microphones 5a and 5b arranged on either side of the camera 8. Although two microphones are used here, three or more may be used.

The main-body-side housing 2 and the display-side housing 3 are joined at their respective ends by a pair of left and right connecting portions (hinges) 9a and 9b, which support the housings so that they can be opened and closed.

FIG. 2 is a diagram showing a schematic hardware configuration example of the notebook PC 1 of FIG. 1. As shown in the figure, the notebook PC 1 includes a CPU 11, a ROM 12, a memory 13, a storage 14, the LCD 7, the input unit 4, a camera device 15, an audio device 17, a communication device 19, a battery 21, a DC-DC converter 22, and an AC adapter 23, and these components are connected directly or indirectly via a bus.

The CPU 11 controls the entire notebook PC 1 by means of the OS 30 stored in the storage 14, which is connected via the bus, and executes processing based on the various programs stored in the storage 14. The ROM 12 stores a BIOS (Basic Input/Output System) 12a, data, and the like.

The memory 13 consists of a cache memory and RAM. It is a writable memory used as an area into which programs executed by the CPU 11 are loaded and as a work area to which the processing data of those programs is written.

The storage 14 is a non-volatile storage device such as an HDD (hard disk drive) or SSD. It stores the OS 30 for controlling the entire notebook PC 1, for example Windows (registered trademark) XP, Vista, 7, 8, 8.1, or 10; various drivers 31, including an audio driver 31a, for operating peripheral hardware; a voice assistant application program 32 (hereinafter, "application program" is shortened to "application"); a VOIP application 33; and other applications 34 including a browser, a mail application, and the like.

Under the control of the CPU 11, the LCD 7 converts display information into a video signal and displays various information corresponding to the converted video signal on its display screen.

Although an LCD is used as the display in the present embodiment, the present invention is not limited to this, and another display such as an organic EL display or a CRT may be used.

The input unit 4 is a user interface through which the user performs input operations. It includes a keyboard made up of keys for entering characters, commands, and the like, and a touchpad for moving the on-screen cursor and selecting menus.

The camera device 15 includes the camera 8 and a camera processing circuit 16. The camera 8 includes a lens and an imaging unit (CCD or CMOS); the lens forms an image of the subject light, and the imaging unit outputs the imaged subject light as R, G, and B image signals. The camera processing circuit 16 includes an A/D converter, an image processing LSI, a memory, and the like. It controls the drive timing and exposure of the imaging unit, performs data processing (A/D conversion and the like) on the RGB image signals obtained by the imaging unit, and outputs the result to the CPU 11.

The audio device 17 includes the microphones 5a and 5b, the speakers 6a and 6b, and an audio processing circuit 18. The microphones 5a and 5b collect sound and output audio data to the audio processing circuit 18. The speakers 6a and 6b output sound corresponding to the audio data output from the audio processing circuit 18. The audio processing circuit 18 includes an A/D converter, a D/A converter, an amplifier, an audio processing LSI containing various filters, a memory, and the like. It A/D-converts and then processes the sound input from the microphones 5a and 5b and outputs the processed audio data (digital data) to the CPU 11. It also processes audio data (digital) input from the CPU 11, D/A-converts the processed audio data, and outputs it from the speakers 6a and 6b.

The communication device 19 transmits and receives data over a network. It transmits image data and audio data to the network and receives image data and audio data sent over the network.

The AC adapter 23 is connected to a commercial power supply, converts the AC voltage into a DC voltage, and outputs it to the DC-DC converter 22. The DC-DC converter 22 converts the DC voltage supplied from the AC adapter 23 into predetermined voltages, supplies power to each component, and charges the battery 21. The battery 21 is charged by the DC-DC converter 22 and supplies the stored power to each component. The battery 21 is used when the AC adapter 23 is not connected to a commercial power supply.

FIG. 3 is a schematic functional configuration diagram related to audio input and output of the notebook PC 1 of FIG. 2. FIG. 4 is an explanatory diagram for explaining the private mode (beamforming), the conference mode (Far Field Pickup), and the multi-angle mode.

In FIG. 3, the OS 30, the drivers 31 including the audio driver 31a, the voice assistant application 32, the VOIP application 33 (for example, Skype (registered trademark) or Windows (registered trademark) Live Messenger), and the other applications 34 installed in the storage 14 are loaded into the memory 13 and executed by the CPU 11. The OS 30 mediates the exchange of data and commands between the applications and the drivers.

The OS 30 controls the basic operation of the notebook PC 1 and manages various resources; for example, it passes instructions issued by application programs to the drivers 31 and the BIOS 12a. The OS 30 has multitasking and multi-window functions, and also manages software resources such as the execution context of application programs (the register set, main memory image, file handles, and so on used by an application program) and GUI components. The OS 30 also performs low-power control of the notebook PC 1: when the notebook PC 1 goes from the normal state to the idle state (no user operation for a predetermined period), the OS 30 shifts it to standby (sleep) or hibernation, and when a user operation is performed in standby (sleep) or hibernation, the OS 30 returns it to the normal state.

The audio driver 31a functions as mode setting means that sets the use mode of the microphones 5a and 5b, and controls the audio device 17 in accordance with instructions from the OS 30. On the basis of the state of the notebook PC 1 and the surrounding sound, the audio driver 31a selects one of a first mode having directivity and a second mode having no directivity as the use mode of the microphones 5a and 5b and sets it in the audio device 17. The first mode is, for example, the private mode (beamforming mode). The second mode is, for example, the conference mode (Far Field Pickup) or the multi-angle mode.

As shown in FIG. 4(A), the private mode is a mode in which the directivity of the microphones 5a and 5b is increased: beamforming processing is performed to extract, from the sound input through the microphones 5a and 5b, only the sound arriving from the front of the notebook PC 1 (microphones 5a and 5b). The private mode is suitable when the user sits in front of the notebook PC 1 and uses the voice assistant application 32 or the VOIP application 33.

As shown in FIG. 4(B), the multi-angle mode is a mode in which, without beamforming processing, sound from all directions around the notebook PC 1 is collected and the loudest of the collected sounds is extracted. The multi-angle mode is suitable, for example, when the user uses the voice assistant application 32 from a short distance away, and allows the user's voice to be recognized even when the user is not in front of the notebook PC 1.

As shown in FIG. 4(C), the conference mode is a mode in which, without beamforming processing, the sensitivity of the microphones 5a and 5b is set higher than in the private mode and the multi-angle mode, and sound from all directions around the notebook PC 1 is collected over a wide range. The conference mode works well even when the user uses the voice assistant application 32 from a distance, and allows the user's voice to be recognized even when the user is not in front of the notebook PC 1.

The voice assistant application 32 is an application executed on the OS 30. It performs speech recognition on the audio data input through the microphones 5a and 5b and the audio processing circuit 18, interprets the content of the user's utterance (by extracting the utterance commands contained in it), and instructs the other applications 34 and the like to carry out the operations requested by voice (for example, having the browser application run a search and reading out the results, or having the mail application send an email). The voice assistant application 32 can be started by voice input of a predetermined activation utterance command (for example, "Hello, ○○"), by pressing an icon (not shown) of the voice assistant application 32 displayed on the LCD 7, and so on. The voice assistant application 32 may also be configured as a client-server system; for example, the voice assistant application 32 may send the audio data to a server, and the server may interpret the user's utterance and return the result to the voice assistant application 32.
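The activation behaviour described above can be illustrated with a small gate that ignores utterances until the activation command is heard. This is only a sketch: it works on already-recognized text rather than audio, the wake phrase "hello assistant" is a placeholder for the unspecified "Hello, ○○", the class name `VoiceAssistantGate` is hypothetical, and returning to the waiting state after one command is an assumed design choice rather than something the publication states.

```python
class VoiceAssistantGate:
    """Ignores recognized text until the activation phrase is heard.

    Hypothetical sketch: a real assistant would work on audio and a
    speech recognizer; here the recognizer output is passed in as text.
    """

    def __init__(self, wake_phrase="hello assistant"):
        self.wake_phrase = wake_phrase.lower()  # placeholder phrase (assumption)
        self.active = False

    def on_utterance(self, text):
        """Return the command to execute, or None if nothing should run."""
        normalized = text.strip().lower()
        if not self.active:
            # Waiting for the activation utterance command.
            if normalized == self.wake_phrase:
                self.active = True
            return None
        # Active: treat this utterance as the command, then wait again
        # (assumed behaviour for this sketch).
        self.active = False
        return normalized
```

For example, an utterance spoken before the wake phrase is dropped, while the first utterance after it is returned as the command to hand to another application.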

The VOIP application 33 is an application executed on the OS 30 for holding video and voice calls with a remote terminal. Via the OS 30, the VOIP application 33 has the communication device 19 establish a link with the remote terminal, transmits images captured by the camera 8 and sound collected by the microphones 5a and 5b, displays images received from the remote terminal on the LCD 7, and outputs received sound from the speakers 6a and 6b.

When the private mode is set, the audio processing circuit 18 of the audio device 17 performs beamforming processing that extracts, from the sound input from the microphones 5a and 5b, only the sound arriving from the front (a predetermined direction). When the conference mode is set, the audio processing circuit 18 does not perform beamforming processing; instead it sets the sensitivity of the microphones 5a and 5b high (sets the amplifier gain high) so that distant sound is also collected. When the multi-angle mode is set, the audio device 17 does not perform beamforming processing and instead extracts the loudest of the sounds input from the microphones 5a and 5b.
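The mode-dependent processing above can be sketched as a simple dispatch over the three modes. The sketch below is illustrative only: the names `MicMode`, `rms`, and `process` and the gain value are assumptions, the private mode is reduced to front-steered delay-and-sum (a plain average of the two channels), the conference mode is modelled as a single amplified channel, and "loudest sound" is approximated by picking the channel with the higher RMS level, since the publication does not specify how the loudest sound is determined.

```python
from enum import Enum


class MicMode(Enum):
    PRIVATE = "private"          # first mode: directional (beamforming)
    CONFERENCE = "conference"    # second mode: non-directional, high gain
    MULTI_ANGLE = "multi_angle"  # second mode: non-directional, loudest sound


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def process(mode, mic_a, mic_b, conference_gain=4.0):
    """Return one processed mono frame for the given mode.

    All processing choices here are illustrative assumptions, not the
    circuit behaviour described in the publication.
    """
    if mode is MicMode.PRIVATE:
        # Delay-and-sum steered straight ahead (zero inter-mic delay).
        return [0.5 * (a + b) for a, b in zip(mic_a, mic_b)]
    if mode is MicMode.CONFERENCE:
        # No spatial combination; only a higher amplifier gain.
        return [conference_gain * a for a in mic_a]
    if mode is MicMode.MULTI_ANGLE:
        # Crude stand-in for "extract the loudest sound".
        return list(mic_a) if rms(mic_a) >= rms(mic_b) else list(mic_b)
    raise ValueError(f"unknown mode: {mode}")
```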

FIG. 5 is a flowchart for explaining an example of processing for switching the use mode of the microphones 5a and 5b in accordance with the state of the notebook PC 1 and the surrounding sound. The following description uses the conference mode as the second mode, but the multi-angle mode may be used instead.

In FIG. 5, when the notebook PC 1 is first powered on, the audio driver 31a sets the private mode (step S1). That is, the audio driver 31a sets the private mode by default. Because the user usually sits in front of the notebook PC 1 when using it, it is desirable to set the private mode by default and collect sound from the front of the notebook PC 1 with the microphones 5a and 5b (audio device 17).

Next, the audio driver 31a determines whether the notebook PC 1 is in the idle state (no user operation for a predetermined period) (step S2). As described above, when the notebook PC 1 is idle, the OS 30 shifts to standby (sleep) or hibernation to reduce power consumption. Even after shifting to standby (sleep) or hibernation, the OS 30 may keep supplying power to the audio device 17 so that voice input remains possible and the voice assistant application 32 can be started.

If the notebook PC 1 is in the idle state ("Yes" in step S2), the audio driver 31a sets the conference mode (step S3). In the idle state the user is assumed to be away from the desk, so it is desirable to set the conference mode and collect sound from all directions around the notebook PC 1 with the microphones 5a and 5b (audio device 17).

Next, the audio driver 31a checks the surrounding sound via the audio device 17 (step S4). If there are a plurality of noise sources, the process returns to step S1 and the private mode is set. This is because, when there are a plurality of noise sources (the surroundings are noisy), the user's voice may be misrecognized in the conference mode; returning to the private mode prevents such misrecognition.

If there is only one noise source in step S4 (for example, a television or a stereo), the audio driver 31a maintains the conference mode, sets the volume of the speakers 6a and 6b higher than the volume of the noise source (step S5), and proceeds to step S6. This makes the voice guidance from the voice assistant application 32 easier for the user to hear. If there is no noise source in step S4 (silent), the process proceeds directly to step S6.

In step S6, the voice assistant application 32 waits for voice input of an utterance command from the user. For example, when a predetermined wake-up utterance command (for example, “Hello, ○○”) is input, the voice assistant application 32 may start operating, interpret the user's subsequent utterances, and execute the operations instructed by voice. As a result, a user who uses the voice assistant application 32 from a location away from the notebook PC 1 can do so suitably, without the voice being misrecognized, even when not in front of the notebook PC 1. In addition, when there is a single noise source or the surroundings are silent, the environment is not one with many people nearby, so the user can use the voice assistant application 32 from a distance without feeling embarrassed and without disturbing others. Furthermore, in the conference mode, the user's voice is not misrecognized when there is no noise source, and even when there is one noise source, misrecognition can be prevented if the user speaks louder than the noise source.

Next, the audio driver 31a determines whether the notebook PC 1 is in the idle state (step S7). If it is idle (“Yes” in step S7), the process returns to step S4. If it is not idle (“No” in step S7), that is, if the notebook PC 1 has been operated, the process returns to step S1 and the private mode is restored.
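
The flow of steps S1 to S7 can be summarized as a small decision function. This is only a sketch under stated assumptions: the names `MicMode` and `select_mode`, the loudness scale, the default speaker volume, and the margin added over the noise level are all hypothetical, and the sketch assumes that the private mode is simply kept when the PC is not idle in step S2.

```python
from enum import Enum


class MicMode(Enum):
    PRIVATE = "private"        # first mode: directional, front only
    CONFERENCE = "conference"  # second mode: omnidirectional


def select_mode(is_idle, noise_levels, default_volume=50, margin=5):
    """Run one pass of the flow and return (mode, speaker_volume).

    noise_levels holds one loudness value per detected noise source.
    The loudness scale, default_volume, and margin are hypothetical.
    """
    if not is_idle:
        # S1 / "No" in S7: the PC is being operated, so use the private mode.
        return MicMode.PRIVATE, default_volume
    # S3: the PC is idle, so the conference mode is the candidate.
    if len(noise_levels) >= 2:
        # S4, several noise sources: fall back to the private mode.
        return MicMode.PRIVATE, default_volume
    if len(noise_levels) == 1:
        # S5, one noise source: keep the conference mode and raise the
        # speaker volume above the noise source.
        return MicMode.CONFERENCE, max(default_volume, noise_levels[0] + margin)
    # S4, silent surroundings: keep the conference mode and wait (S6).
    return MicMode.CONFERENCE, default_volume


print(select_mode(is_idle=True, noise_levels=[70]))
```

Calling the function repeatedly, once per check of the idle state and the ambient sound, approximates the loop from step S7 back to step S4 or step S1.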

As described above, the present embodiment includes: the audio driver 31a, which selects and sets, as the use mode of the plurality of microphones 5a and 5b, one of a first mode having directivity and a second mode having no directivity, based on the state of the notebook PC 1 and the ambient sound; the audio device 17, which performs signal processing on the sound input from the plurality of microphones 5a and 5b in accordance with the mode set by the audio driver 31a; and the voice assistant application 32, which performs voice assist by recognizing the sound processed by the audio device 17. It is therefore possible to provide an information processing apparatus that is easy to use in accordance with the user's usage environment when a voice assistant is used.

Further, the first mode is the private mode, in which beamforming is performed to extract, from the sound input from the plurality of microphones 5a and 5b, only the sound arriving from the front of the notebook PC 1. Voice recognition can therefore be performed suitably when the user sits in front of the notebook PC 1 and uses the voice assistant application 32.
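
The embodiment does not state which beamforming algorithm is used. As an illustration only, the following sketch uses delay-and-sum, a common textbook technique, for two microphones. The function names, the test tone, and the 4-sample delay are assumptions chosen so that the effect is easy to see; a practical beamformer has to handle a range of frequencies and arrival angles.

```python
import math


def delay_and_sum(left, right, delay_samples=0):
    """Delay the right channel by delay_samples and average it with the left.

    delay_samples = 0 steers toward the front, where sound reaches both
    microphones at the same time and therefore adds up coherently.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    shifted = [0.0] * delay_samples + list(right)
    n = min(len(left), len(shifted))
    return [(left[i] + shifted[i]) / 2.0 for i in range(n)]


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical test tone with a period of 8 samples.
tone = [math.sin(2 * math.pi * i / 8) for i in range(64)]

# Front source: both microphones receive the tone simultaneously.
front = delay_and_sum(tone, tone)

# Off-axis source: the right microphone receives the tone 4 samples later,
# which is half a period for this tone, so the two channels cancel.
late = [math.sin(2 * math.pi * (i - 4) / 8) for i in range(64)]
side = delay_and_sum(tone, late)

print(rms(front), rms(side))
```

In this sketch the front signal passes through unchanged while the off-axis tone is strongly attenuated, which is the directivity that the private mode relies on.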

Further, the second mode is the conference mode, in which the sensitivity of the plurality of microphones 5a and 5b is set higher than in the first mode and sound from all directions around the notebook PC 1 is collected over a wide range. The user can therefore use the voice assistant application 32 suitably from a location away from the notebook PC 1, even when not in front of it.

The second mode can also be the multi-angle mode, in which sound is collected from all directions around the notebook PC 1 and the loudest of the collected sounds is extracted. The user can therefore use the voice assistant application 32 suitably from a location slightly away from the notebook PC 1, even when not in front of it.
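
Extracting the loudest sound in the multi-angle mode can be sketched as follows. The sketch assumes that the collected sound has already been separated into one block of samples per direction and that loudness is measured as the root-mean-square level; neither detail is specified in the embodiment, and the names `loudest_direction` and `captured` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level; an empty block counts as silence."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def loudest_direction(sounds_by_angle):
    """Return (angle, samples) for the direction with the highest RMS level."""
    if not sounds_by_angle:
        raise ValueError("no directions to compare")
    angle = max(sounds_by_angle, key=lambda a: rms(sounds_by_angle[a]))
    return angle, sounds_by_angle[angle]


# Hypothetical capture: angle in degrees mapped to a block of samples.
captured = {
    0: [0.1, -0.1, 0.1, -0.1],
    90: [0.5, -0.5, 0.5, -0.5],
    180: [0.0, 0.0, 0.0, 0.0],
}
angle, samples = loudest_direction(captured)
print(angle)
```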

Further, the audio driver 31a sets the second mode when the notebook PC 1 enters the idle state while the first mode is set, so the user can suitably use the voice assistant application 32 even from a location away from the notebook PC 1.

Further, the audio driver 31a sets the first mode when there are a plurality of surrounding sound sources while the second mode is set, so misrecognition of the user's voice can be prevented when the surroundings are noisy.

Further, when there is one surrounding sound source while the second mode is set, the audio driver 31a maintains the second mode and sets the volume of the speakers 6a and 6b higher than that of the sound source, and the voice assistant application 32 waits for voice input of an utterance command from the user. The voice assistant application 32 can therefore provide voice guidance at a volume louder than the surrounding sound source.

Further, when the surroundings are silent while the second mode is set, the audio driver 31a maintains the second mode, and the voice assistant application 32 waits for voice input of an utterance command from the user. A user who uses the voice assistant application 32 from a location away from the notebook PC 1 can therefore do so suitably, without the voice being misrecognized, even when not in front of the notebook PC 1.

In the above embodiment, the case where the present invention is applied to a notebook PC has been described. However, the present invention is not limited to this and is also applicable to other information processing apparatuses such as smartphones, tablets, mobile phones, PDAs, and desktop PCs.

In the above embodiment, voice recognition is performed by the voice assistant application 32, but it may instead be performed by the audio device 17, the audio driver 31a, and/or the OS 30.

Some or all of the functions of the audio processing circuit 18 of the audio device 17 may be implemented in software.

The audio driver 31a may also switch between the conference mode and the multi-angle mode, both of which belong to the second mode, according to the distance between the notebook PC 1 and the user. For example, the multi-angle mode may be set when the distance is smaller than a threshold, and the conference mode may be set when the distance is equal to or greater than the threshold.
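
The distance-based switching can be expressed as a single comparison. In this sketch the function name, the use of meters, and the 2.0 m threshold are hypothetical; the embodiment specifies neither the threshold value nor how the distance to the user is measured.

```python
def second_mode_for_distance(distance_m, threshold_m=2.0):
    """Choose between the two second-mode variants from the user distance.

    Below the threshold the multi-angle mode is chosen; at or above it,
    the conference mode is chosen.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return "multi-angle" if distance_m < threshold_m else "conference"


for d in (1.0, 2.0, 3.5):
    print(d, second_mode_for_distance(d))
```

Note that a distance exactly equal to the threshold selects the conference mode, matching the "equal to or greater than" condition in the text.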

In the above embodiment, the audio driver 31a sets the use mode of the microphones 5a and 5b, but the OS 30 and/or the voice assistant application 32 may set it instead.

1 Notebook PC
2 Main body side housing
3 Display side housing
4 Input unit
5a, 5b Microphone
6a, 6b Speaker
7 LCD (liquid crystal display)
8 Camera
9a, 9b Connecting part (hinge part)
11 CPU
12 ROM
13 Memory
14 Storage
15 Camera device
16 Camera processing circuit
17 Audio device
18 Audio processing circuit
19 Communication device
21 Battery
22 DC-DC converter
23 AC adapter
30 OS
31 Driver
31a Audio driver
32 Voice assistant application
33 VoIP application
34 Other applications

Claims

An information processing apparatus having a plurality of microphones, the information processing apparatus comprising:
mode setting means for selecting and setting, as a use mode of the plurality of microphones, one of a first mode having directivity and a second mode having no directivity, based on a state of the information processing apparatus and ambient sound;
voice processing means for performing signal processing on sound input from the plurality of microphones in accordance with the mode set by the mode setting means; and
voice assistant means for performing voice assist by recognizing the sound processed by the voice processing means.

The information processing apparatus described above, wherein the mode setting means sets the second mode when the information processing apparatus enters an idle state while the first mode is set.

The information processing apparatus described above, wherein the mode setting means sets the first mode when there are a plurality of surrounding sound sources while the second mode is set.

The information processing apparatus according to claim 1, wherein, when there is one surrounding sound source while the second mode is set, the mode setting means maintains the second mode and sets the volume of a speaker higher than that of the sound source, and
the voice assistant means waits for voice input of an utterance command from a user.

The information processing apparatus according to claim 1, wherein, when the surroundings are silent while the second mode is set, the mode setting means maintains the second mode, and
the voice assistant means waits for voice input of an utterance command from a user.

The information processing apparatus according to any one of claims 5 to 6, wherein the first mode is a mode in which beamforming is performed to extract, from the sound input from the plurality of microphones, only the sound arriving from the front of the information processing apparatus.

The information processing apparatus according to any one of claims 1 to 6, wherein the second mode is a mode in which the sensitivity of the plurality of microphones is set higher than in the first mode and sound from all directions around the information processing apparatus is collected over a wide range.

The information processing apparatus according to any one of claims 6 to 6, wherein the second mode is a mode in which sound is collected from all directions around the information processing apparatus and the loudest of the collected sounds is extracted.

The information processing apparatus according to claim 1, wherein the mode setting means sets the first mode by default.

The information processing apparatus according to claim 1, wherein the information processing apparatus is a notebook PC.

A method of using microphones of an information processing apparatus having a plurality of microphones, the method comprising:
a mode setting step of selecting and setting, as a use mode of the plurality of microphones, one of a first mode having directivity and a second mode having no directivity, based on a state of the information processing apparatus and ambient sound;
a voice processing step of performing signal processing on sound input from the plurality of microphones in accordance with the mode set in the mode setting step; and
a voice assistant step of performing voice assist by recognizing the sound processed in the voice processing step.

A program installed in an information processing apparatus having a plurality of microphones, the program causing a computer to execute:
a mode setting step of selecting and setting, as a use mode of the plurality of microphones, one of a first mode having directivity and a second mode having no directivity, based on a state of the information processing apparatus and ambient sound;
a voice processing step of performing signal processing on sound input from the plurality of microphones in accordance with the mode set in the mode setting step; and
a voice assistant step of performing voice assist by recognizing the sound processed in the voice processing step.