JP2014010420A - Integrated circuit device

Info

Publication number: JP2014010420A
Application number: JP2012149177A
Authority: JP
Inventors: Masayuki Murakami; 雅行村上
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2012-07-03
Filing date: 2012-07-03
Publication date: 2014-01-20

Abstract

PROBLEM TO BE SOLVED: To provide an integrated circuit device that contributes to improving the recognition rate of voice recognition.

SOLUTION: An integrated circuit device includes: an analog-to-digital converter that digitizes input voice to generate voice data; a voice recognition unit that selects a first option on the basis of the voice data; a storage unit that stores first information based on the voice data; and a control unit. The selection is performed when a predetermined condition is satisfied, and when the selection is not performed, the control unit outputs the first information.

Description

The present invention relates to an integrated circuit device.

Voice recognition technology that recognizes specific vocabulary from human speech has been developed. Various ideas have been proposed to improve the recognition rate of voice recognition (Patent Documents 1 and 2).

Patent Document 1: JP 2006-145791 A
Patent Document 2: JP 2009-192942 A

To improve the recognition rate of voice recognition, there is a demand for analyzing the voice input in cases where recognition failed, but the prior art has not sufficiently met this demand.

The present invention has been made in view of the above problem. According to some aspects of the present invention, it is possible to provide an integrated circuit device that contributes to improving the recognition rate of voice recognition.

[Application Example 1]
An integrated circuit device according to this application example includes: an analog-to-digital converter that digitizes input voice to generate voice data; a voice recognition unit that selects a first option on the basis of the voice data; a storage unit that stores first information based on the voice data; and a control unit. The selection is performed when a predetermined condition is satisfied, and when the selection is not performed, the control unit outputs the first information.

According to this configuration, the first option is selected only when the predetermined condition is satisfied, and the control unit outputs the first information when the selection is not performed. The first information for the case where the predetermined condition was not satisfied is therefore available and can be used to analyze why the condition was not satisfied. By reflecting the result of this analysis in the selection means of the voice recognition unit, an integrated circuit device with a higher voice recognition rate can be developed.

The first information is information based on the voice data. For example, the first information may be frequency spectrum data obtained by applying an FFT (Fast Fourier Transform) to the voice data. The voice data itself is also first information. Accordingly, the first information stored in the storage unit may include both the frequency spectrum data and the voice data.
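As a rough illustration of first information that carries both the voice data and its frequency spectrum, the following Python sketch may help. It is not taken from the patent: the function names `magnitude_spectrum` and `build_first_information` and the dictionary layout are assumptions, and a naive discrete Fourier transform stands in for the FFT (it yields the same values, only more slowly).

```python
import cmath
import math
from typing import Dict, List


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Naive O(N^2) DFT returning magnitudes for bins 0..N//2.

    Stand-in for an FFT; the numerical result is the same.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


def build_first_information(samples: List[float]) -> Dict[str, List[float]]:
    """Bundle the voice data and its spectrum (hypothetical layout)."""
    return {
        "voice_data": list(samples),
        "spectrum": magnitude_spectrum(samples),
    }
```

Keeping both representations together would let a later analysis inspect either the raw waveform or the spectrum without recomputing anything.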

[Application Example 2]
In the integrated circuit device according to the above application example, the first option is selected from among a plurality of options, and the predetermined condition is that the likelihood of the option is the largest among the plurality of options and is greater than a predetermined threshold.

According to this configuration, because the first option must have the largest likelihood among the plurality of options and a likelihood greater than the predetermined threshold, the first option can be judged to be the option that best matches the voice data. Here, the predetermined threshold may be a value set arbitrarily according to the application of the integrated circuit device. It may also be a value calculated from experiments or simulations performed in advance.
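The selection condition of this application example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `select_first_option`, the use of a dictionary of likelihoods, and returning `None` for "no selection" are not from the patent.

```python
from typing import Dict, Optional


def select_first_option(
    likelihoods: Dict[str, float], threshold: float
) -> Optional[str]:
    """Return the option with the largest likelihood if that likelihood is
    greater than the threshold; otherwise return None (no selection)."""
    if not likelihoods:
        return None
    best = max(likelihoods, key=likelihoods.get)
    return best if likelihoods[best] > threshold else None
```

For example, with likelihoods `{"yes": 0.9, "no": 0.2}` and a threshold of 0.5 the sketch selects `"yes"`, while raising the threshold to 0.95 results in no selection.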

[Application Example 3]
In the integrated circuit device according to the above application example, when the selection is not performed, information on a second option, which has the highest likelihood among the plurality of options, is output along with the first information.

According to this configuration, when the first option is not selected, information on the second option, which has the highest likelihood among the plurality of options, is output together with the first information. The first information and the second option can then be used together to analyze the cause of the voice recognition result.

[Application Example 4]
In the integrated circuit device according to the above application example, the first information is stored in the storage unit when the voice data is not silent.

According to this configuration, the first information is held in the storage unit only when the voice data is not silent, so only data useful for analyzing why recognition failed is stored. The amount of data stored in the storage unit is also reduced. The definition of silence is not fixed here; for example, it may be a value calculated from the white noise in the environment where the integrated circuit device is used.
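One possible way to gate storage on non-silence is sketched below in Python. The patent leaves the definition of silence open, so everything specific here is an assumption: silence is judged by comparing the RMS level of the samples with a measured noise floor, and the names `rms`, `is_silent`, `store_if_not_silent` and the `margin` factor are hypothetical.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def is_silent(
    samples: Sequence[float], noise_floor_rms: float, margin: float = 2.0
) -> bool:
    """Treat the block as silent if its level does not exceed the
    environment noise floor by the given margin (assumed criterion)."""
    return rms(samples) <= noise_floor_rms * margin


def store_if_not_silent(
    storage: List[List[float]],
    samples: Sequence[float],
    noise_floor_rms: float,
) -> bool:
    """Append the samples to storage only when they are not silent.

    Returns True if the samples were stored.
    """
    if is_silent(samples, noise_floor_rms):
        return False
    storage.append(list(samples))
    return True
```

The noise floor would be measured in the target environment in advance; the margin trades storage savings against the risk of discarding quiet speech.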

FIG. 1 is a block diagram showing a configuration example of an integrated circuit device 1 according to the present embodiment.
FIG. 2 is a diagram showing an example of a group of options.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings. The embodiments described below do not unduly limit the content of the present invention described in the claims. Not all of the configurations described below are essential constituent elements of the present invention. The drawings are used for convenience of explanation.

(First embodiment)
FIG. 1 is a block diagram showing a configuration example of the integrated circuit device 1 according to the present embodiment.

The integrated circuit device 1 includes an analog-to-digital converter 10, a voice recognition unit 20, a storage unit 30, and a control unit 40. The control unit 40 includes a controller 41 and a host interface 42. A microphone 100 is connected to the analog-to-digital converter 10, and a host 200 is connected to the host interface 42. The integrated circuit device 1 is positioned as a so-called peripheral device of the host 200.

The analog-to-digital converter 10 digitizes the input voice supplied from the microphone 100 and generates a voice data signal S3. The voice data signal S3 is output to the controller 41.

The voice recognition unit 20 performs voice recognition processing on the first information, which is based on the voice data signal S3, and selects from among a plurality of options the first option that can be judged to match the input voice. The first information is formed by the controller 41 and is output from the controller 41 to the voice recognition unit 20 as a voice recognition target signal S4. The plurality of options may be held in the voice recognition unit 20 in advance, or may be output from the controller 41 together with the first information as the voice recognition target signal S4. Various known methods can be used for the voice recognition processing; for example, it may be performed using a hidden Markov model. An option need not be strictly a single word grammatically, and may be a phrase or sentence composed of a plurality of words.

As the result of the voice recognition processing, the voice recognition unit 20 selects the option that has the maximum likelihood among the plurality of options and exceeds a predetermined threshold, that is, the first option. If no option exceeds the threshold within a predetermined time (the allowable voice recognition processing time, for example, 2 seconds), the voice recognition unit 20 outputs, as the result of the voice recognition processing, information indicating that there is no matching option. The predetermined time may be calculated from the length of the phoneme data in the language model of the plurality of options, or from the length of the input voice. The predetermined time may be measured by the voice recognition unit 20, by the controller 41, or by both. Measuring the predetermined time in the controller 41 also makes it possible to handle unforeseen situations such as the voice recognition unit 20 no longer responding.

The storage unit 30 stores the first information. In the controller 41, identification information and the like are added to the first information, which is then output to the storage unit 30 as a write data signal S5 and recorded.

As described above, the first information may be the voice data itself generated by the analog-to-digital converter 10. In that case the first information is the very voice data used by the voice recognition unit 20, so that same data can be used to analyze why recognition failed. An integrated circuit device 1 that contributes to improving the recognition rate of voice recognition can therefore be realized.

The first information is not limited to the voice data itself generated by the analog-to-digital converter 10. It may be, for example, the data obtained after the fast Fourier transform required for voice recognition, or something like a feature vector to which some weighting has been applied.

The control unit 40 performs the main control of the integrated circuit device 1. When the integrated circuit device 1 is started, the control unit 40 performs initial settings within the integrated circuit device 1 and establishes communication with the host 200 through the host interface 42. Commands from the host 200 are decoded and executed in the host interface 42, so that the integrated circuit device 1 can operate in accordance with instructions from the host 200.

Next, a specific operation example of the integrated circuit device 1 will be described.

First, the host 200 outputs a recognition start command for starting voice recognition to the host interface 42 as a host output signal S1. The host interface 42 receives the host output signal S1 and outputs a recognition start signal to the voice recognition unit 20 and the controller 41 as an internal control signal S2.

The recognition start command and the recognition start signal may include information on the plurality of options used in the voice recognition processing by the voice recognition unit 20. For example, when the integrated circuit device 1 is used in a system in which a voice guide and a user interact, the recognition start command and the recognition start signal may include information specifying the group of options that corresponds to answers to the question output by the voice guide. In this case, the options included in the specified group are the targets of voice recognition.

FIG. 2 is a diagram showing an example of a group of options. In the example shown in FIG. 2, the four options "Yes", "No", "Return", and "Cancel" form a group.

The recognition start command and the recognition start signal may also include a setting parameter for the allowable voice recognition processing time. The voice recognition unit 20 may set the predetermined time after which the voice recognition processing times out on the basis of the value of this parameter. Because the length of the options generally differs from group to group, the allowable time may be set to match the length of the options: for example, a long allowable time when the group contains a long option, and a short allowable time when it does not.
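A minimal Python sketch of setting the allowable time to match the length of the options might look as follows. The patent does not give a formula, so this is purely an assumed example: character count is used as a crude stand-in for option length (phoneme count would be closer to the description), and `allowable_time_seconds`, `base_seconds`, and `seconds_per_char` are hypothetical names and values.

```python
from typing import Iterable


def allowable_time_seconds(
    options: Iterable[str],
    base_seconds: float = 1.0,
    seconds_per_char: float = 0.25,
) -> float:
    """Derive a timeout that grows with the longest option in the group.

    Character count is only a proxy for spoken length (assumption).
    """
    longest = max((len(option) for option in options), default=0)
    return base_seconds + seconds_per_char * longest
```

With these assumed constants, a group containing only short options such as "yes" and "no" gets a shorter timeout than a group that also contains a long phrase.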

The controller 41 receives the internal control signal S2 and starts the operations required for the voice recognition processing. More specifically, it receives the voice data from the analog-to-digital converter 10 as the voice data signal S3, outputs the first information based on the voice data to the voice recognition unit 20 as the voice recognition target signal S4, and outputs the first information to the storage unit 30 as the write data signal S5.

The voice recognition unit 20 receives the internal control signal S2 and starts the voice recognition processing. It receives the first information from the controller 41 as the voice recognition target signal S4 and performs the voice recognition processing on the basis of the received first information.

When the voice recognition processing ends, the voice recognition unit 20 outputs a recognition end signal to the controller 41 and the host interface 42 as an internal control signal S6. It also outputs information on the result of the voice recognition processing to the controller 41 and the host interface 42 as a processing result signal S7.

In the example of the option group shown in FIG. 2, the voice recognition unit 20 outputs to the control unit 40, as the processing result signal S7, information on the option with the largest likelihood among those of the four options "Yes", "No", "Return", and "Cancel" whose likelihood is equal to or greater than the threshold (the first option). If the likelihoods of all the options are smaller than the threshold, it outputs information on the option with the largest likelihood (the second option) together with information corresponding to "not applicable".

The controller 41 receives the internal control signal S6 and performs the operations required to end the voice recognition processing. More specifically, the controller 41 stops outputting the voice recognition target signal S4 and stops outputting the write data signal S5.

When the result of the voice recognition processing is "not applicable", the controller 41 reads the first information from the storage unit 30 via a read data signal S8 and outputs the read first information to the host interface 42 as an output data signal S9.

When the result of the voice recognition processing is "not applicable", the host interface 42 outputs to the host 200, as a host input signal S10, either the first information received from the controller 41 or the first information together with the information on the second option. When the first option has been selected, the host interface 42 outputs the information on the first option to the host 200 as the host input signal S10.

In this way, when the result of the voice recognition processing cannot be judged to match any of the options, that is, when recognition fails, the first information based on the voice data is output to the host 200, and the host 200 can use it to analyze why recognition failed. An integrated circuit device 1 that contributes to improving the recognition rate of voice recognition can therefore be realized.
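The overall decision of what is sent to the host can be summarized with a short Python sketch. It is an illustration only: the `HostMessage` record and `handle_recognition` function are hypothetical and do not model the signals S1 to S10 individually, and the strict "greater than" comparison is one possible reading of the threshold condition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostMessage:
    """What the host receives after one recognition attempt (assumed layout)."""
    matched: bool
    option: Optional[str]                # first option if matched, else second option
    first_information: Optional[bytes]   # attached only when recognition failed


def handle_recognition(
    likelihoods: Dict[str, float],
    threshold: float,
    stored_first_information: bytes,
) -> HostMessage:
    """Report the first option on success; on failure report the most
    likely (second) option together with the stored first information."""
    if not likelihoods:
        return HostMessage(False, None, stored_first_information)
    best = max(likelihoods, key=likelihoods.get)
    if likelihoods[best] > threshold:
        return HostMessage(True, best, None)
    return HostMessage(False, best, stored_first_information)
```

On success the host sees only the chosen option; on failure it additionally receives the data needed to investigate why no option cleared the threshold.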

The host 200 may also output a first information read request command to the host interface 42 as the host output signal S1. In this case, the host interface 42 outputs information indicating the first information read request command to the controller 41 as the internal control signal S2. The controller 41 receives the internal control signal S2, reads the first information corresponding to the read request command from the storage unit 30 via the read data signal S8, and outputs the read first information to the host interface 42 as the output data signal S9. The host interface 42 receives the output data signal S9 and outputs the first information to the host 200 as the host input signal S10. The first information can thus be output at the convenience of the host 200.

This allows the host 200 to read the first information at any time.
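Reading stored first information on request could be modeled roughly as in the Python sketch below. The class `FirstInformationStore`, the integer record identifier (standing in for the identification information added by the controller), and the command string `"READ_FIRST_INFORMATION"` are all assumptions made for illustration; the patent does not define a command format.

```python
from typing import Dict, Optional


class FirstInformationStore:
    """Toy model of the storage unit: records keyed by an identifier."""

    def __init__(self) -> None:
        self._records: Dict[int, bytes] = {}
        self._next_id = 0

    def write(self, data: bytes) -> int:
        """Store one record and return the identifier assigned to it."""
        record_id = self._next_id
        self._records[record_id] = data
        self._next_id += 1
        return record_id

    def read(self, record_id: int) -> Optional[bytes]:
        """Return the record, or None if the identifier is unknown."""
        return self._records.get(record_id)


def handle_host_command(
    store: FirstInformationStore, command: str, record_id: int
) -> Optional[bytes]:
    """Serve a read request from the host; ignore other commands."""
    if command == "READ_FIRST_INFORMATION":  # hypothetical command name
        return store.read(record_id)
    return None
```

Because the host decides when to issue the request, it can defer the transfer until it is idle rather than receiving the data immediately after each failed recognition.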

The host 200 may also output a time setting command to the host interface 42 as the host output signal S1. In this case, the host interface 42 outputs information indicating the time setting command to the controller 41 as the internal control signal S2. On the basis of this information, the controller 41 can monitor the processing time of the voice recognition processing performed by the voice recognition unit 20.

The above-described embodiments and modifications are examples, and the present invention is not limited to them. For example, the embodiments and modifications can be combined as appropriate.

The present invention is not limited to the above-described embodiments and application examples, and various further modifications are possible. For example, the present invention includes configurations that are substantially the same as those described in the embodiments (for example, configurations with the same function, method, and result, or configurations with the same purpose and effect). The present invention also includes configurations in which non-essential parts of the configurations described in the embodiments are replaced, configurations that achieve the same operational effects or the same object as the configurations described in the embodiments, and configurations in which known techniques are added to the configurations described in the embodiments. The present invention can be widely applied without departing from its spirit.

Description of symbols: 1: integrated circuit device, 10: analog-to-digital converter, 20: voice recognition unit, 30: storage unit, 40: control unit, 41: controller, 42: host interface, 100: microphone, 200: host.

Claims

1. An integrated circuit device comprising:
an analog-to-digital converter that digitizes input voice to generate voice data;
a voice recognition unit that selects a first option on the basis of the voice data;
a storage unit that stores first information based on the voice data; and
a control unit,
wherein the selection is performed when a predetermined condition is satisfied, and
when the selection is not performed, the control unit outputs the first information.

2. The integrated circuit device according to claim 1, wherein the first option is selected from among a plurality of options, and the predetermined condition is that the likelihood is the largest among the plurality of options and is greater than a predetermined threshold.

3. The integrated circuit device according to claim 1 or 2, wherein, when the selection is not performed, information on a second option having the highest likelihood among the plurality of options is output along with the first information.

4. The integrated circuit device according to claim 1, wherein the first information is stored in the storage unit when the voice data is not silent.