JP2018194849A

JP2018194849A - Information processing device

Info

Publication number: JP2018194849A
Application number: JP2018132122A
Authority: JP
Inventors: Yuichi Tamura (田村 雄一)
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2018-07-12
Filing date: 2018-07-12
Publication date: 2018-12-06

Abstract

To provide an information processing device capable of obtaining an appropriate recognition result. SOLUTION: A CPU 13 in a car navigation system 10 obtains the recognition result of a speech recognition engine 12, which performs speech recognition on speech uttered by a user, together with parameters consisting of sound pressure information, a score, and an algorithm determination. Via a short-range wireless communication unit 15, the CPU 13 also obtains the recognition result of a speech recognition engine 22 in a smartphone 20, which has recognized the same speech, together with its own parameters consisting of sound pressure information, a score, and an algorithm determination. Based on the parameters of the speech recognition engine 12 and those of the speech recognition engine 22, the CPU 13 selects either the recognition result of the speech recognition engine 12 or that of the speech recognition engine 22, and executes the selected result as a command. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing apparatus that performs information processing based on the result of speech recognition processing.

Various electronic devices have conventionally been known that recognize speech uttered by a user and perform an operation according to the recognition result. Because the algorithms, dictionaries, and other components of the speech recognition processing unit differ from device to device, the recognition rate also differs from device to device, even for the same speech.

In recent years, cloud-type speech recognition systems have also been proposed. In such a system, a server holds the speech recognition engine and dictionary, and multiple terminals each connect to the server via a network to request speech recognition processing and obtain the processing result. Because the dictionary resides on the server, a cloud-type system can support a very large vocabulary and can also run advanced algorithms.

Patent Document 1 discloses sharing speech recognition results with other electronic devices. Specifically, when input is made to an external device 20 by voice, a recognition module is transmitted from a portable information processor 10 to the external device 20, and the external device 20 performs processing using the received recognition module.

Patent Document 1: JP 2003-140690 A

A cloud-type speech recognition system basically performs recognition on the assumption that the input is conversational speech. A car navigation system, however, can use domain-specific phrases such as "detour search" or "reroute" as voice operation commands, and it is difficult for a cloud-type speech recognition system to recognize phrases used in such a specific environment properly.

With the method described in Patent Document 1, recognition results can be shared among multiple electronic devices. However, when, for example, the recognition module of a car navigation system is moved to another device, that module still performs recognition suited to the environment specific to car navigation, so an appropriate recognition result may not be obtained when it is used to recognize ordinary conversation.

In view of the above problems, an object of the present invention is, for example, to provide an information processing apparatus capable of obtaining an appropriate recognition result.

To solve the above problems, the invention is characterized by comprising: a first acquisition unit that acquires first speech recognition result information and first speech recognition processing information from a first speech recognition unit that recognizes speech; a second acquisition unit that acquires second speech recognition result information and second speech recognition processing information from a second speech recognition unit that recognizes the same speech; and a control unit that selects either the first speech recognition result information or the second speech recognition result information based on the first speech recognition processing information and the second speech recognition processing information, and causes a processing unit to execute processing related to the selected first or second speech recognition result information.

The invention according to claim 12 is a control method for an information processing apparatus that causes a processing unit to execute processing based on recognized speech, the method comprising: a first acquisition step of acquiring first speech recognition result information and first speech recognition processing information from a first speech recognition unit that recognizes the speech; a second acquisition step of acquiring second speech recognition result information and second speech recognition processing information from a second speech recognition unit that recognizes the speech; and a control step of selecting either the first speech recognition result information or the second speech recognition result information based on the first speech recognition processing information and the second speech recognition processing information, and causing the processing unit to execute processing related to the selected first or second speech recognition result information.

The invention according to claim 13 is characterized in that a computer executes the information processing method according to claim 12.

The invention according to claim 14 is characterized by storing the information processing program according to claim 13.

FIG. 1 is an external perspective view of a car navigation system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the car navigation system shown in FIG. 1 and a smartphone.
FIG. 3 is a flowchart of the speech recognition operation of the car navigation system shown in FIG. 1.
FIG. 4 is a table explaining a specific example of the flowchart shown in FIG. 3.
FIG. 5 is a table explaining a specific example of the flowchart shown in FIG. 3.

An information processing apparatus according to an embodiment of the present invention is described below. In this apparatus, a first acquisition unit acquires first speech recognition result information, which is the recognition result of a first speech recognition unit that recognizes speech uttered by a user, and first speech recognition processing information, which is obtained from the first speech recognition unit together with the first speech recognition result information. A second acquisition unit acquires second speech recognition result information, which is the recognition result of a second speech recognition unit that recognizes the same speech as the first speech recognition unit, and second speech recognition processing information, which is obtained from the second speech recognition unit together with the second speech recognition result information. A control unit then selects either the first or the second speech recognition result information based on the first speech recognition processing information acquired by the first acquisition unit and the second speech recognition processing information acquired by the second acquisition unit, and causes a processing unit to execute information processing based on the selected result information. Because the result can be chosen from two speech recognition units that recognized the same speech, recognition can be more accurate than with either unit alone. For example, if the two speech recognition units use different algorithms and dictionaries, recognition results suited to a variety of environments can be obtained. An appropriate recognition result can therefore be obtained.
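The overall flow above (two recognizers, one selection, one processing step) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Recognition` type, the `select_and_execute` function, and the placeholder `higher_score` policy are all hypothetical, and the processing information is reduced to a single score. The concrete selection rules the embodiment describes (thresholds, weighting, history) would be plugged in as the `policy` argument.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recognition:
    phrase: str   # speech recognition result information
    score: float  # speech recognition processing information (simplified to one score)


# A selection policy returns the chosen Recognition, or None if it cannot decide.
Policy = Callable[[Recognition, Recognition], Optional[Recognition]]


def select_and_execute(first: Recognition, second: Recognition,
                       policy: Policy, execute: Callable[[str], None]) -> Optional[str]:
    """Pick one of two results for the same utterance and pass it to the processing unit."""
    chosen = policy(first, second)
    if chosen is None:
        return None
    execute(chosen.phrase)
    return chosen.phrase


def higher_score(a: Recognition, b: Recognition) -> Optional[Recognition]:
    # Placeholder policy (an assumption, not the patent's rule): prefer the higher score.
    return a if a.score >= b.score else b
```

Keeping the policy as a parameter separates "which result wins" from "what happens to the winner", which mirrors the split between the control unit and the processing unit.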

When the first speech recognition processing information is equal to or greater than a predetermined first threshold, the control unit may select the first speech recognition result information and cause the processing unit to execute information processing based on it. In this way, the recognition result of the first speech recognition unit can be used for various kinds of information processing, such as route search in a navigation system or searching for stores and the like over the Internet.

When the first speech recognition processing information is less than the first threshold and the second speech recognition processing information is equal to or greater than a predetermined second threshold, the control unit may select the second speech recognition result information and cause the processing unit to execute information processing based on it. In this way, when the recognition result of the first speech recognition unit has low reliability and is likely to be inappropriate, the result of the second speech recognition unit can be used instead for various kinds of information processing, such as route search in a navigation system or searching for stores and the like over the Internet.
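The two threshold rules above form a simple cascade, sketched below under the assumption that each unit's processing information can be reduced to one comparable score. The function name and the threshold values used in any example are hypothetical; the patent only calls them the predetermined first and second thresholds.

```python
from typing import Optional


def select_by_threshold(first_score: float, second_score: float,
                        t1: float, t2: float) -> Optional[str]:
    """Return 'first', 'second', or None when neither threshold is met.

    t1 and t2 stand for the predetermined first and second thresholds;
    reducing the processing information to one score is an assumption.
    """
    if first_score >= t1:   # first result is reliable enough: use it
        return "first"
    if second_score >= t2:  # first is unreliable, second is reliable
        return "second"
    return None             # undecided: a fallback rule must choose
```

Note that the first unit is checked first, so it wins whenever it clears its own threshold, even if the second unit scored higher.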

The second acquisition unit may further acquire processing result information, which is the result of processing performed based on the second speech recognition result information. In that case, when the first speech recognition processing information is less than the predetermined first threshold and the second speech recognition processing information is equal to or greater than the predetermined second threshold, the control unit may determine whether the second speech recognition result information is included in a predetermined command group. If it is included, the control unit causes the processing unit to execute information processing based on the second speech recognition result information; if it is not, the control unit causes the processing unit to execute information processing based on the processing result information acquired by the second acquisition unit. In this way, the result of the second speech recognition unit can be used when the recognition result of the first speech recognition unit is likely to be inappropriate. Furthermore, if the result of the second speech recognition unit belongs to a predetermined command group, such as commands for operating the device, an operation matching that command can be performed; if it does not, information processing can use the result already processed by the device or the like that has the second speech recognition unit.
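The command-group branch can be sketched as below. The command set is hypothetical: it only reuses the two example phrases the document mentions ("reroute", "detour search"). The callbacks `run_command` and `show_result` are illustrative stand-ins for the processing unit; this function assumes the threshold conditions have already been checked by the caller.

```python
from typing import Callable, FrozenSet

# Hypothetical command group built from the document's example phrases.
NAV_COMMANDS: FrozenSet[str] = frozenset({"reroute", "detour search"})


def handle_second_result(phrase: str, processing_result: str,
                         run_command: Callable[[str], str],
                         show_result: Callable[[str], str],
                         commands: FrozenSet[str] = NAV_COMMANDS) -> str:
    """Called when the first result failed threshold 1 and the second passed threshold 2."""
    if phrase in commands:
        # The phrase is a device command: execute it locally.
        return run_command(phrase)
    # Otherwise use what the second device already computed (e.g. a search result).
    return show_result(processing_result)
```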

When the first speech recognition processing information is less than the first threshold and the second speech recognition processing information is less than the second threshold, the control unit may perform a predetermined calculation in which the first and second speech recognition processing information are each weighted, and select the first or second speech recognition result information based on the calculation result. In this way, when neither threshold allows a result to be selected, either result can still be selected by a calculation that weights each result according to, for example, the usage environment.
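The patent leaves the "predetermined calculation" open. One simple reading, assumed here, is to compare weighted scores directly; the weights `w1` and `w2` are hypothetical and would be chosen per usage environment (for instance, favoring the in-car engine while driving).

```python
def select_weighted(first_score: float, second_score: float,
                    w1: float, w2: float) -> str:
    """Compare weighted scores when neither threshold was met.

    The product-and-compare rule and the weights are assumptions; ties go
    to the first result.
    """
    return "first" if w1 * first_score >= w2 * second_score else "second"
```

With equal weights this reduces to "higher score wins"; raising `w1` lets the first unit win close calls.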

When the first speech recognition processing information is less than the first threshold and the second speech recognition processing information is less than the second threshold, the control unit may select the first or second speech recognition result information based on past usage history. In this way, when neither threshold allows a result to be selected, either result can be selected based on past usage history, such as whether a phrase has previously been used in speech recognition, a search, or an operation.
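A history-based fallback can be sketched by counting how often each candidate phrase appears in past usage. The flat list `history` is a hypothetical log format, and breaking ties in favor of the first result is an assumption.

```python
from collections import Counter
from typing import Iterable


def select_by_history(first_phrase: str, second_phrase: str,
                      history: Iterable[str]) -> str:
    """Prefer the candidate phrase that appears more often in past usage.

    `history` is a hypothetical log of phrases previously used for
    recognition, search, or operation; ties go to the first result.
    """
    counts = Counter(history)  # missing phrases count as 0
    return "first" if counts[first_phrase] >= counts[second_phrase] else "second"
```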

When the first speech recognition processing information is less than the first threshold and the second speech recognition processing information is less than the second threshold, the control unit may select the first or second speech recognition result information based on past usage conditions. In this way, when neither threshold allows a result to be selected, either result can be selected based on past usage conditions such as time of day, season, or weather.
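The usage-condition variant differs from the history rule only in that past uses are filtered by context before counting. The sketch below assumes a hypothetical log of `(timestamp, phrase)` pairs and a coarse time-of-day bucketing; season or weather could be keyed the same way.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def time_slot(ts: datetime) -> str:
    # Hypothetical coarse bucketing of the hour of day.
    if 5 <= ts.hour < 11:
        return "morning"
    if 11 <= ts.hour < 17:
        return "daytime"
    return "night"


def select_by_context(first_phrase: str, second_phrase: str,
                      history: Iterable[Tuple[datetime, str]], now: datetime) -> str:
    """Prefer the phrase used more often in the same time slot as `now`; ties go to first."""
    slot = time_slot(now)
    counts = Counter(p for ts, p in history if time_slot(ts) == slot)
    return "first" if counts[first_phrase] >= counts[second_phrase] else "second"
```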

The first and second speech recognition processing information may include relevance information, which indicates the degree of association between the speech uttered by the user and the first or second speech recognition result information. In this way, either result can be selected based on the degree of association between the uttered speech and the recognition result, that is, a score indicating the similarity between the uttered speech and the dictionary.

The first speech recognition unit, the first acquisition unit, the second acquisition unit, and the control unit may be provided integrally. In this way, by cooperating with an external device or the like that has the second speech recognition unit, the information processing apparatus can obtain an appropriate recognition result.

The control unit may cause the first and second speech recognition units to learn the recognition result, based on the first or second speech recognition result information that it selected. In this way, the speech recognition units share the recognition result, which can improve the accuracy of subsequent speech recognition.
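The feedback step can be illustrated with a toy recognizer. Real engines adapt in their own ways; here, purely as a hypothetical mechanism, "learning" adds a small per-phrase bonus to future scores. The class `BiasedRecognizer`, the `bonus` value, and `share_selected` are all illustrative names, not part of the source.

```python
from collections import defaultdict


class BiasedRecognizer:
    """Toy stand-in for a speech recognition unit that can 'learn' phrases."""

    def __init__(self, bonus: float = 0.05):
        self.bonus = bonus
        self.learned = defaultdict(int)  # phrase -> times confirmed by the control unit

    def learn(self, phrase: str) -> None:
        self.learned[phrase] += 1

    def adjusted_score(self, phrase: str, raw_score: float) -> float:
        # Hypothetical adaptation: boost confirmed phrases, capped at 1.0.
        return min(1.0, raw_score + self.bonus * self.learned[phrase])


def share_selected(selected_phrase: str, *recognizers: BiasedRecognizer) -> None:
    # The control unit feeds its final choice back to every recognizer.
    for r in recognizers:
        r.learn(selected_phrase)
```

Feeding the same choice to both units is what lets a phrase that only one engine recognized well become easier for the other engine next time.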

Separate input units, into which the speech uttered by the user is input, may be provided for the first and second speech recognition units respectively. In this way, for example, a microphone can be provided as the input unit for each speech recognition unit, and the sound pressure or volume of the speech input from each microphone can be acquired as the first or second speech recognition processing information.
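One common way to turn microphone samples into a level figure is the RMS level relative to digital full scale (dBFS), sketched below for 16-bit PCM. The patent does not specify how sound pressure or volume is computed, so this is an assumed stand-in; dBFS is a relative level and not a calibrated sound pressure level.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale (dBFS)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # silence has no finite level
    return 20 * math.log10(rms / full_scale)
```

A half-scale signal such as `[16384, -16384]` comes out at about -6.02 dBFS, and each microphone's figure can be compared directly because both use the same scale.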

In the control method of the information processing apparatus according to an embodiment of the present invention, a first acquisition step acquires first speech recognition result information, which is the recognition result of a first speech recognition unit that recognizes speech uttered by a user, and first speech recognition processing information, which is obtained from the first speech recognition unit together with the first speech recognition result information. A second acquisition step acquires second speech recognition result information, which is the recognition result of a second speech recognition unit that recognizes the same speech as the first speech recognition unit, and second speech recognition processing information, which is obtained from the second speech recognition unit together with the second speech recognition result information. A control step then selects either the first or the second speech recognition result information based on the first speech recognition processing information acquired in the first acquisition step and the second speech recognition processing information acquired in the second acquisition step, and causes the processing unit to execute information processing based on the selected result information. Because the result can be chosen from two speech recognition units that recognized the same speech, recognition can be more accurate than with either unit alone. For example, if the two speech recognition units use different algorithms and dictionaries, recognition results suited to a variety of environments can be obtained. An appropriate recognition result can therefore be obtained.

The control method of the information processing apparatus described above may also be implemented as a control program that causes a computer to execute it. In this way, an appropriate recognition result can be obtained using a computer.

The control program of the information processing apparatus described above may be stored on a computer-readable recording medium. In this way, the program can be distributed on its own as well as built into a device, and it can easily be updated to new versions.

A car navigation system 10 as an information processing apparatus according to an example of the present invention is described with reference to FIGS. 1 to 5. As shown in FIG. 1, the car navigation system 10 is mounted on the instrument panel 100 of a vehicle. The car navigation system 10 can communicate, wirelessly or by wire, with a smartphone 20 (described later) placed, for example, on the passenger seat 101.

As shown in FIG. 2, the car navigation system 10 shown in FIG. 1 includes a microphone 11, a speech recognition engine 12, a CPU 13, a storage device 14, a short-range wireless communication unit 15, a GPS 16, and a display unit 17.

The microphone 11, serving as an input unit, receives speech uttered by the user, converts it into an audio signal (an electrical signal), and outputs the audio signal to the speech recognition engine 12. The microphone 11 need not be integrated into the car navigation system 10; it may instead be placed near the user, for example on the steering column of a vehicle, and connected by cable or wirelessly.

The speech recognition engine 12, serving as the first speech recognition unit, performs speech recognition on the audio signal input from the microphone 11. It outputs to the CPU 13 the recognition result, namely a phrase consisting of a word or a combination of words, together with parameters: a score indicating how closely the audio signal approximates the vocabulary in the engine's own dictionary; an algorithm determination indicating the degree of judgment by the algorithm, such as the degree of matching against the engine's own dictionary or the degree of fit with candidates estimated from the preceding and following context; and sound pressure information of the audio signal input from the microphone 11. The result may include multiple candidates, in which case a score, algorithm determination, and sound pressure information are output for each candidate. Volume information may be used instead of sound pressure information. The parameters need only include one or more of the three items above, but preferably include the score. That is, the phrase corresponds to the first speech recognition result information, and the parameters correspond to the first speech recognition processing information. Because the score is the degree of approximation between the audio signal and the dictionary, it is relevance information, that is, information on the degree of association between the audio signal and the phrase retrieved from the dictionary.
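The engine output described above (several candidate phrases, each with its own optional score, algorithm determination, and sound pressure) maps naturally onto a small data structure. The field names, numeric scales, and the `best` helper below are assumptions for illustration; the fields are optional because the parameters need only include one or more of the three items.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    phrase: str                                 # speech recognition result information
    score: Optional[float] = None               # similarity to the engine's dictionary
    algorithm_judgment: Optional[float] = None  # e.g. dictionary match or context fit
    sound_pressure: Optional[float] = None      # or a volume figure instead


@dataclass
class EngineOutput:
    engine: str
    candidates: List[Candidate]

    def best(self) -> Optional[Candidate]:
        """Highest-scoring candidate; candidates without a score are ignored."""
        scored = [c for c in self.candidates if c.score is not None]
        return max(scored, key=lambda c: c.score, default=None)
```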

The recognition algorithm and score calculation method used by the speech recognition engine 12 may be any well-known ones and are not particularly limited. However, because the speech recognition engine 12 is provided in the car navigation system 10, it is preferably tuned to raise the recognition probability of phrases such as "reroute" and "detour search" that are used as voice input commands (operation commands) for the car navigation system 10. The speech recognition engine 12 also need not be included in the car navigation system 10; for example, the cloud-type speech recognition system described in the related art may be used instead. That is, the audio signal input from the microphone 11 may be transmitted to a server or the like, which performs the speech recognition processing, and the car navigation system 10 may receive the resulting phrases and parameters.
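When recognition runs on a server, the car navigation system only has to parse the returned phrases and parameters. The sketch below assumes a hypothetical JSON reply of the form `{"candidates": [{"phrase": ..., "score": ...}]}`; the field names are invented for illustration, since any real cloud service defines its own response schema.

```python
import json
from typing import List, Tuple


def parse_cloud_response(body: str) -> List[Tuple[str, float]]:
    """Extract (phrase, score) pairs from a hypothetical JSON reply."""
    data = json.loads(body)
    pairs = [(c["phrase"], float(c["score"])) for c in data.get("candidates", [])]
    # Sort best-first so the caller can take pairs[0] as the top candidate.
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```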

The CPU 13, serving as the control unit, the first acquisition unit, and the processing unit, is configured as a microcomputer including RAM, ROM, and the like, and controls the car navigation system 10 as a whole. The CPU 13 executes the functions that a car navigation system generally has, such as destination setting, route search, guidance, and map display. The CPU 13 also acquires the phrase and parameters output by the speech recognition engine 12. Based on these parameters and on the parameters output by the speech recognition engine 22 of the smartphone 20 (described later), which the short-range wireless communication unit 15 acquires from the smartphone 20, the CPU 13 selects either the phrase output by the speech recognition engine 12 or the phrase output by the speech recognition engine 22, and executes processing based on the selected phrase.

The storage device 14 consists of a nonvolatile, readable and writable storage medium such as a hard disk or semiconductor memory. It stores, for example, maps and other information that the car navigation system 10 uses for guidance.

The short-range wireless communication unit 15, serving as the second acquisition unit, connects to the smartphone 20 (described later) by short-range wireless communication such as Bluetooth (registered trademark) or infrared communication, and exchanges data with it. The short-range wireless communication unit 15 also acquires from the smartphone 20 the phrases and parameters output by the speech recognition engine 22 (described later). The short-range wireless communication unit 15 is not limited to short-range wireless communication; it may use other wireless communication such as a wireless LAN (Local Area Network), or wired communication such as USB (Universal Serial Bus).
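Whatever the physical link (Bluetooth, wireless LAN, or USB), the phrase and parameters have to be framed as bytes. The sketch below assumes a hypothetical wire format, a 4-byte big-endian length prefix followed by UTF-8 JSON, which is a common pattern for byte-stream links; the patent does not specify any format.

```python
import json
import struct
from typing import Any, Dict


def encode_result(phrase: str, params: Dict[str, Any]) -> bytes:
    """Frame one recognition result: 4-byte big-endian length, then UTF-8 JSON."""
    payload = json.dumps({"phrase": phrase, "params": params},
                         ensure_ascii=False).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_result(frame: bytes) -> Dict[str, Any]:
    """Inverse of encode_result; raises ValueError on a truncated frame."""
    if len(frame) < 4:
        raise ValueError("incomplete header")
    (length,) = struct.unpack(">I", frame[:4])
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("incomplete frame")
    return json.loads(payload.decode("utf-8"))
```

The length prefix lets the receiver know where one result ends on a stream transport that does not preserve message boundaries.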

As is well known, the GPS 16 receives radio waves transmitted from multiple GPS (Global Positioning System) satellites, determines the current position information, and outputs it to the CPU 13. In this example the GPS 16 is integrated into the car navigation system 10, but it may instead be a separate unit that can be attached to and detached from the car navigation system 10.

The display unit 17 consists of a display device such as a liquid crystal display or an EL (Electro Luminescence) display, and may have a touch panel overlaid on its display surface. The display unit 17 displays the map, guidance information such as the vehicle's position, the destination, and the route, as well as various operation menus and buttons for touch panel operation.

The car navigation system 10 configured as described above exchanges data with the smartphone 20 shown in FIG. 2 through the short-range wireless communication unit 15. As described above, the car navigation system 10 holds map information and performs navigation functions such as route search by itself. Alternatively, the map information may be held on an external server or the like that executes the navigation functions, with the car navigation system 10 receiving and displaying the results.

The smartphone 20 includes a microphone 21, a voice recognition engine 22, a CPU 23, a storage device 24, a short-range wireless communication unit 25, and a line communication unit 26.

The microphone 21, serving as an input unit, receives the voice uttered by the user, converts it into a voice signal (an electrical signal), and outputs the voice signal to the voice recognition engine 22.

The voice recognition engine 22, serving as the second voice recognition unit, performs voice recognition on the voice signal input from the microphone 21. It outputs to the CPU 23 the recognition result, namely a phrase consisting of a word or a combination of words, together with parameters: a score indicating how closely the voice signal approximates a vocabulary entry in the engine's own dictionary, an algorithm determination indicating the degree of matching against that dictionary, and sound pressure information of the voice signal input from the microphone 21. There may be a plurality of result candidates, in which case a score, an algorithm determination, and sound pressure information are output for each candidate. The phrase corresponds to the second voice recognition result information, and the parameters correspond to the second voice recognition processing information.

The recognition algorithm and score calculation method used by the voice recognition engine 22 may be any well-known ones and are not particularly limited, but the voice recognition engine 22 preferably has a recognition algorithm or dictionary different from that of the voice recognition engine 12. This increases the chance of recognizing phrases that the voice recognition engine 12 cannot recognize correctly, so the voice recognition engine 22 complements the voice recognition engine 12.

The voice recognition engine 22 need not reside in the smartphone 20; for example, the cloud-based voice recognition system described in the related art may be used. That is, the voice signal input from the microphone 21 may be transmitted to a server or the like, the server may perform the voice recognition processing, and the smartphone 20 may receive the resulting phrases and parameters.

The CPU 23 is configured as a microcomputer including RAM, ROM, and the like, and controls the smartphone 20 as a whole. The CPU 23 executes the functions a smartphone typically has, such as telephone calls, e-mail, and Internet connection, as well as applications. The CPU 23 also acquires the phrases and parameters from the voice recognition engine 22 and transmits them to the car navigation system 10 via the short-range wireless communication unit 25, and performs processing such as Internet searches based on the recognition result of the voice recognition engine 22.

The storage device 24 is a nonvolatile, readable and writable storage medium such as a semiconductor memory; a removable storage medium such as a memory card may also be used. The storage device 24 stores, for example, the phone book and application data used on the smartphone 20.

The short-range wireless communication unit 25 connects to the car navigation system 10 by short-range wireless communication such as Bluetooth (registered trademark) or infrared communication, and the two exchange data. The short-range wireless communication unit 25 transmits the phrases and parameters output by the voice recognition engine 22 to the car navigation system 10.
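
The text does not specify how the phrases and parameters are encoded on the short-range link. As a minimal sketch, assuming a hypothetical JSON message in which each candidate phrase carries its own score, algorithm determination (already converted to a number), and sound pressure, the transfer from the smartphone 20 to the car navigation system 10 might look like this. The class and function names are illustrative only.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Candidate:
    """One recognized phrase with the parameters reported for it."""
    phrase: str
    score: float             # closeness to a vocabulary entry in the engine's dictionary
    algorithm_points: float  # algorithm determination converted to a number (hypothetical scale)
    sound_pressure: float    # input level, normalised so that 100 is the maximum


def encode(candidates: List[Candidate]) -> bytes:
    """Pack every candidate and its parameters into one UTF-8 JSON message (hypothetical format)."""
    return json.dumps([asdict(c) for c in candidates], ensure_ascii=False).encode("utf-8")


def decode(payload: bytes) -> List[Candidate]:
    """Rebuild the candidate list on the receiving side (car navigation system 10)."""
    return [Candidate(**item) for item in json.loads(payload.decode("utf-8"))]
```

Sending one record per candidate keeps the per-candidate parameters together, which matters because the engine may report several candidates for one utterance.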

The line communication unit 26 connects to the mobile phone network for various kinds of communication. It connects to a base station or the like using a communication scheme such as W-CDMA (Wideband Code Division Multiple Access) or LTE (Long Term Evolution), and thereby to the mobile phone network.

The car navigation system 10 configured as described above uses both its own voice recognition engine 12 and the voice recognition engine 22 of the smartphone 20 to select the more appropriate recognition result (phrase), and then executes various processes based on the selected phrase. The detailed operation is described with reference to the flowchart in FIG. 3, which is executed by the CPU 13. Before this flowchart is executed, the car navigation system 10 and the smartphone 20 have already been set up to exchange data over short-range wireless communication.

First, in step S1 (first acquisition step), the voice uttered by the user is recognized through the microphone 11 by the voice recognition engine 12 (car navigation system 10), the phrase and parameters (score, algorithm determination, sound pressure information) are acquired, and the process proceeds to step S3.

Meanwhile, in step S2 (second acquisition step), the phrase and parameters produced when the voice recognition engine 22 (smartphone 20) recognized the same voice as in step S1 are acquired via the short-range wireless communication unit 15, and the process proceeds to step S3.

Next, in step S3, the parameters of the voice recognition engine 12 acquired in step S1 are compared with the parameters of the voice recognition engine 22 acquired in step S2, and the process proceeds to step S4.

Next, in step S4, it is determined whether the score and the sound pressure information among the parameters output by the voice recognition engine 12 are equal to or greater than predetermined thresholds. If they are (YES), the process proceeds to step S5; otherwise (NO), it proceeds to step S6. For example, the thresholds are a sound pressure information value of 85 or more, with the maximum sound pressure taken as 100, and a score of 92 or more; the parameters are judged to be at or above the threshold only when both conditions are met. In this embodiment a threshold is applied not only to the score but also to the sound pressure information, because accurate voice recognition is more likely when the sound pressure is higher. The threshold used in this step corresponds to the first threshold.
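
The first-threshold check in step S4 is a conjunction: both conditions must hold. A minimal sketch using the example values above; the function and constant names are illustrative, not taken from the source.

```python
FIRST_MIN_PRESSURE = 85.0  # example value from the text (0-100 scale)
FIRST_MIN_SCORE = 92.0     # example value from the text


def meets_threshold(sound_pressure: float, score: float,
                    min_pressure: float, min_score: float) -> bool:
    """True only when BOTH the sound pressure and the score reach their minimums."""
    return sound_pressure >= min_pressure and score >= min_score


def step_s4(sound_pressure: float, score: float) -> str:
    """Return the next step: S5 adopts engine 12's phrase, S6 checks the smartphone."""
    if meets_threshold(sound_pressure, score, FIRST_MIN_PRESSURE, FIRST_MIN_SCORE):
        return "S5"
    return "S6"
```

A loud but poorly matched utterance (for example sound pressure 87 with a score below 92) therefore still falls through to step S6.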

Next, in step S5, because step S4 judged the parameters to be at or above the threshold, the phrase recognized by the voice recognition engine 12 is determined (selected) as the determination word, that is, the operation command to be executed in a later step, and the process proceeds to step S19.

In step S6, it is determined whether the score and the sound pressure information among the parameters acquired from the smartphone 20 are equal to or greater than predetermined thresholds. If they are (YES), the process proceeds to step S7; otherwise (NO), it proceeds to step S11. For example, these thresholds are a sound pressure information value of 82 or more (with the maximum taken as 100) and a score of 96 or more, and the parameters are judged to be at or above the threshold when both conditions are met. The threshold used in this step may be the same as the one used in step S4, and these thresholds may be set as appropriate according to the installation position, the algorithm of each voice recognition engine, and the like. The threshold used in this step corresponds to the second threshold.
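
Steps S4 through S7 form a cascade in which the car navigation result is tried first and the smartphone result second. A minimal sketch, assuming each engine's parameters are reduced to a (sound pressure, score) pair for its first-ranked phrase and using the example threshold values; the function and constant names are hypothetical.

```python
from typing import Optional, Tuple

Params = Tuple[float, float]             # (sound pressure on a 0-100 scale, score)
FIRST_THRESHOLD: Params = (85.0, 92.0)   # engine 12, example values from the text
SECOND_THRESHOLD: Params = (82.0, 96.0)  # engine 22, example values from the text


def passes(params: Params, threshold: Params) -> bool:
    pressure, score = params
    min_pressure, min_score = threshold
    return pressure >= min_pressure and score >= min_score


def select_by_threshold(nav_phrase: str, nav_params: Params,
                        phone_phrase: str, phone_params: Params) -> Optional[Tuple[str, str]]:
    """Return (phrase, engine) chosen in S5 or S7, or None when step S11 must run."""
    if passes(nav_params, FIRST_THRESHOLD):      # S4 YES -> S5
        return nav_phrase, "engine 12"
    if passes(phone_params, SECOND_THRESHOLD):   # S6 YES -> S7
        return phone_phrase, "engine 22"
    return None                                  # S6 NO -> S11 weighted evaluation
```

Checking engine 12 first gives the in-vehicle result priority whenever it is reliable enough.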

Next, in step S7, because step S6 judged the parameters to be at or above the threshold, the phrase recognized by and acquired from the smartphone 20 is determined (selected) as the determination word, and the process proceeds to step S8.

Next, in step S8, it is determined whether the determination word decided in step S7 is among the navigation commands. If it is (YES), that phrase is confirmed as the determination word and the process proceeds to step S19; if not (NO), the process proceeds to step S9. The navigation commands are a predetermined group of commands used to operate the car navigation system 10, so this step determines whether the decided determination word is a navigation command.
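
Step S8 is a membership test against the navigation command group. The actual command list is not given in the text, so the set below is a hypothetical placeholder; the return values name the next flowchart step.

```python
# Hypothetical placeholder: the actual navigation command group is not listed in the text.
NAV_COMMANDS = frozenset({"go home", "zoom in", "zoom out", "cancel route"})


def step_s8(determination_word: str) -> str:
    """Decide where a phrase adopted from the smartphone in step S7 goes next."""
    if determination_word in NAV_COMMANDS:
        return "S19"  # a navigation command: learn it, then execute it in S20
    return "S9"       # not a navigation command: ask the smartphone for a cooperative operation
```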

Next, in step S9, because step S8 found that the determination word is not a navigation command, the smartphone 20 is made to perform a cooperative operation, and the process proceeds to step S10. The cooperative operation means, for example, having the smartphone 20 perform an Internet search using the phrase (determination word) output by the voice recognition engine 22 or, when a navigation application or the like is installed, retrieve point information related to the search results (store name, address, latitude and longitude, and so on). The cooperative operation may be triggered by the car navigation system 10 (CPU 13) sending the smartphone 20 a command instructing its execution, or the smartphone 20 may already have performed it right after the voice recognition.

Next, in step S10, the result of the cooperative operation performed by the smartphone 20 in step S9 is acquired via the short-range wireless communication unit 15, and the process proceeds to step S19. This cooperative operation result corresponds to the processing result information.

In step S11, because both step S4 and step S6 fell below their thresholds (that is, below the first threshold and below the second threshold), an evaluation is performed using formulas (1) and (2) below, and the process proceeds to step S12.

$$(\text{sound pressure} \times a) \times \bigl((\text{score} + \text{determination}) \times b\bigr) \tag{1}$$

$$(\text{sound pressure} \times c) \times \bigl((\text{score} + \text{determination}) \times d\bigr) \tag{2}$$

Formula (1) is evaluated with the parameters acquired from the smartphone 20, and formula (2) with the parameters output by the voice recognition engine 12. "Determination" is the numerical value of the algorithm determination, and a, b, c, and d are coefficients that weight the terms they multiply. In other words, a predetermined calculation is performed in which the first voice recognition processing information and the second voice recognition processing information are each weighted.
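
Formulas (1) and (2) share one shape and differ only in their inputs and coefficients. A sketch, assuming the algorithm determination has already been converted to a number and that each parameter set is passed as a (sound pressure, score, determination) tuple; the coefficient values in the test are arbitrary because the source gives none.

```python
from typing import Tuple

Params = Tuple[float, float, float]  # (sound pressure, score, numeric algorithm determination)


def weighted_evaluation(sound_pressure: float, score: float, determination: float,
                        pressure_weight: float, match_weight: float) -> float:
    """(sound pressure x pressure_weight) x ((score + determination) x match_weight)."""
    return (sound_pressure * pressure_weight) * ((score + determination) * match_weight)


def formula_1(phone_params: Params, a: float, b: float) -> float:
    """Formula (1): parameters from the smartphone 20, coefficients a and b."""
    return weighted_evaluation(*phone_params, a, b)


def formula_2(nav_params: Params, c: float, d: float) -> float:
    """Formula (2): parameters from voice recognition engine 12, coefficients c and d."""
    return weighted_evaluation(*nav_params, c, d)
```

Keeping separate coefficient pairs per engine lets the two engines' parameter scales be balanced against each other.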

Next, in step S12, it is determined whether the evaluation in step S11 narrowed the candidates down to one. If so (YES), the process proceeds to step S13; if not (NO), it proceeds to step S14. In this step, for example, when the results of formulas (1) and (2) differ by 8 or more, the phrase with the larger result is selected as the single candidate.
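
The narrowing in step S12 can be read as keeping every phrase that lies less than 8 points below the best value, which is consistent both with the "difference of 8 or more" rule here and with the worked example later in this description. A sketch under that reading; the names are hypothetical.

```python
from typing import Dict, List, Optional


def shortlist(evaluations: Dict[str, float], margin: float = 8.0) -> List[str]:
    """Best phrase first, plus every phrase less than `margin` below the best value."""
    best = max(evaluations.values())
    ranked = sorted(evaluations, key=evaluations.get, reverse=True)
    return [phrase for phrase in ranked if best - evaluations[phrase] < margin]


def step_s12(evaluations: Dict[str, float], margin: float = 8.0) -> Optional[str]:
    """Return the single settled phrase (go to S13), or None (go on to S14)."""
    remaining = shortlist(evaluations, margin)
    return remaining[0] if len(remaining) == 1 else None
```

With the example values 78, 76, 73, and 41, the first three remain, so step S12 cannot settle on one phrase.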

Next, in step S13, the phrase settled on in step S12, or in step S15 or S17 described later, is selected as the determination word, and the process proceeds to step S19.

In step S14, the phrases acquired as recognition results are evaluated based on their past usage history, and the process proceeds to step S15. The past usage history is not limited to voice recognition history; it also includes history information from operations such as Internet searches and destination searches performed on the car navigation system 10 or the smartphone 20. The usage history of the smartphone 20 may be obtained, for example, by specifying the phrase and requesting its history via the short-range wireless communication unit 15 when this step is executed.

Next, in step S15, it is determined whether the evaluation in step S14 narrowed the candidates down to one. If so (YES), the process proceeds to step S13; if not (NO), it proceeds to step S16. In this step, for example, the phrase that step S14 found to be most frequently used is selected as the single candidate.
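
A sketch of the frequency-based pick in step S15, assuming the usage history is available as a flat list of previously used phrases from both devices, and that a tie for the highest count means no single candidate is settled on. The candidate list is assumed to be non-empty.

```python
from collections import Counter
from typing import Iterable, List, Optional


def step_s15(candidates: List[str], history: Iterable[str]) -> Optional[str]:
    """Return the candidate used most often in the past, or None on a tie (go on to S16)."""
    counts = Counter(history)  # a phrase missing from the history counts as 0
    ranked = sorted(candidates, key=lambda phrase: counts[phrase], reverse=True)
    if len(ranked) == 1 or counts[ranked[0]] > counts[ranked[1]]:
        return ranked[0]
    return None
```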

Next, in step S16, the phrases acquired as recognition results are evaluated based on the past situations in which they were used, and the process proceeds to step S17. A past usage situation is the scene in which the phrase was used before, for example the time of day (morning or afternoon), the season, or external conditions such as the weather.

Next, in step S17, it is determined whether the evaluation in step S16 narrowed the candidates down to one. If so (YES), the process proceeds to step S13; if not (NO), it proceeds to step S18. In this step, for example, the phrase that step S16 found to have been used in the same kind of scene is selected as the single candidate.

Next, in step S18, because steps S12 to S17 could not narrow the candidates down to one, the phrase with the highest score is confirmed as the determination word, and the process proceeds to step S19.

Next, in step S19, the determination word confirmed in step S5, S7, S13, or S18 is learned by the voice recognition engines 12 and 22, and the process proceeds to step S20. Because the learning is performed not only by the voice recognition engine 12 but also by the voice recognition engine 22, the determination word is also transmitted to the smartphone 20 via the short-range wireless communication unit 15.

Next, in step S20, a command is executed based on the determination word; that is, the determination word is interpreted as an operation command of the car navigation system 10 and the corresponding process is executed. When step S10 was executed and the short-range wireless communication unit 15 acquired the cooperation result from the smartphone 20, a point search may be performed based on that result, or the result may be displayed as is. If the determination word cannot be interpreted as an operation command, an error message may be shown on the display unit 17, or the user may be prompted to speak again and the flowchart restarted from the beginning.

As is clear from the above description, steps S4 to S20 function as a control step that, based on the parameters acquired in step S1 and those acquired in step S2, selects either the phrase output by the voice recognition engine 12 or the phrase output by the voice recognition engine 22, and causes the processing unit to execute information processing based on the selected phrase.

The operations of steps S11, S14, and S16 need not be performed in this order. Nor do all three have to be performed; only one or two of them may be performed.
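
Because steps S11, S14, and S16 may be reordered or omitted, the tie-break can be modelled as an ordered list of interchangeable narrowing stages, ending with the highest-score fallback of step S18. This is an illustrative structure, not the source's implementation: each stage is a hypothetical function that takes the current shortlist and returns a (possibly smaller) one, and the scores in the test are invented.

```python
from typing import Callable, Dict, List, Sequence

Stage = Callable[[List[str]], List[str]]  # takes a shortlist, returns a narrowed shortlist


def resolve(candidates: List[str], scores: Dict[str, float],
            stages: Sequence[Stage]) -> str:
    """Apply narrowing stages in order; fall back to the highest score (step S18)."""
    remaining = list(candidates)
    for stage in stages:
        narrowed = stage(remaining)
        if narrowed:                # an empty result leaves the shortlist unchanged
            remaining = narrowed
        if len(remaining) == 1:
            return remaining[0]     # settled on one phrase: step S13
    return max(remaining, key=lambda phrase: scores[phrase])
```

Passing the stages as a sequence makes reordering or dropping an evaluation a configuration change rather than a change to the control flow.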

A concrete example is now described along the flowchart in FIG. 3. Suppose the user says "soba suki" (そば好き, "I like soba"). The voice recognition engine 12 of the car navigation system 10 measures a sound pressure of 87, recognizes "Soma City" (相馬市, Sōma-shi), and lists "Mobara City" (茂原市) as another candidate. The voice recognition engine 22 of the smartphone 20 measures a sound pressure of 78, recognizes "sofa ni" (ソファーに, "on the sofa"), and lists "soba suki" and "Soma City" as other candidates. The scores and algorithm determinations of all candidates, including the alternative ones, are as shown in FIG. 4. As FIG. 4 illustrates, the flowchart of FIG. 3 is not limited to comparing a single phrase from each of the car navigation system 10 and the smartphone 20; a plurality of candidates from each may be compared.

In this case, in step S4, neither "Soma City" recognized by the voice recognition engine 12 nor the other candidate "Mobara City" satisfies the thresholds of a sound pressure of 85 or more and a score of 92 or more. Step S6 is therefore executed, but neither "sofa ni" recognized by the voice recognition engine 22 nor the other candidates "soba suki" and "Soma City" satisfy the thresholds of a sound pressure of 82 or more and a score of 96 or more.

Formulas (1) and (2) are therefore calculated and evaluated in step S11, and the judgment is made in step S12. For the algorithm determination, symbols such as ◎ and ○ are converted into points as appropriate. Suppose the calculation yields 78 for "sofa ni", 76 for "soba suki", 73 for "Soma City", and 41 for "Mobara City". The top candidate and the candidates within 8 points of it, namely "sofa ni", "soba suki", and "Soma City", are extracted, so the candidates cannot be narrowed to one. "Soma City" appears as a candidate on both the car navigation system 10 and the smartphone 20; subsequent judgments for it use the device whose formula value was larger, for example the car navigation system 10. Alternatively, priority may be given to the voice recognition engine 12, which is in the same device as the CPU 13 that executes this flowchart.

In step S14, the candidates are evaluated based on the past usage history on each of the car navigation system 10 and the smartphone 20, and the judgment is made in step S15. The usage history (number of uses) of "sofa ni", "soba suki", and "Soma City" is as shown in FIG. 5. Here, the phrases with a history on both the car navigation system 10 and the smartphone 20 (one or more uses on each) are extracted. In this case "soba suki" and "Soma City" are extracted, so the candidates still cannot be narrowed to one. Instead of requiring a history on both devices, the candidates may be narrowed by a minimum number of uses or by how far each falls below the highest count. If every candidate has zero uses, the next evaluation (step S16) is performed on all candidates ("sofa ni", "soba suki", and "Soma City").
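
The shared-history filter from this example can be sketched as follows. The usage counts in the test are hypothetical, not the values in FIG. 5, and the sketch assumes that when no candidate has history on both devices, all candidates are kept, which generalizes the all-zero case described above.

```python
from typing import Dict, List


def filter_by_shared_history(candidates: List[str],
                             nav_counts: Dict[str, int],
                             phone_counts: Dict[str, int]) -> List[str]:
    """Keep phrases used at least once on BOTH devices; if none qualify, keep every candidate."""
    shared = [phrase for phrase in candidates
              if nav_counts.get(phrase, 0) >= 1 and phone_counts.get(phrase, 0) >= 1]
    return shared if shared else list(candidates)
```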

In step S16, the candidates are evaluated based on the situations in which each phrase was used in the past, and the judgment is made in step S17. As described above, the candidate that best fits the current situation (status), such as time of day, season, and weather, is selected; that is, a phrase whose past usage situation is similar to the current one is selected. Similarity may be judged, for example, as two or more of the three statuses matching.
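
The "two of three statuses" similarity rule can be sketched as below. The sketch assumes, for simplicity, that one past usage situation is recorded per phrase; the `Status` fields and sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Status(NamedTuple):
    time_of_day: str  # e.g. "AM" or "PM"
    season: str
    weather: str


def is_similar(past: Status, now: Status, required_matches: int = 2) -> bool:
    """Similar when at least `required_matches` of the three statuses are equal."""
    return sum(p == n for p, n in zip(past, now)) >= required_matches


def step_s17(candidates: List[str], past: Dict[str, Status], now: Status) -> Optional[str]:
    """Return the one candidate whose past situation resembles `now`, else None (go to S18)."""
    similar = [p for p in candidates if p in past and is_similar(past[p], now)]
    return similar[0] if len(similar) == 1 else None
```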

If step S17 finds that the past usage situation of "soba suki" is similar to the current one, "soba suki" is selected and confirmed as the determination word in step S13. If neither "soba suki" nor "Soma City" qualifies as similar, step S18 is executed and "soba suki", which has the highest score, is selected.

Next, consider the case where "soba suki" satisfies the threshold in step S6. Here "soba suki" is decided as the determination word in step S7, and step S8 determines whether it is among the navigation commands. Because "soba suki" is not a command for operating the car navigation system 10, the smartphone 20 performs the cooperative operation in step S9. The smartphone 20 searches the Internet or the like for information related to "soba suki"; if, for example, it obtains the names and addresses of restaurants, well-known shops, and soba-making workshops, the smartphone 20 transmits that information as the cooperation result and the car navigation system 10 acquires it (step S10).

The information obtained in step S10 is used in step S20, after the determination word ("soba suki") has been learned in step S19. For example, the names and addresses of the restaurants, soba-making workshop venues, and the like obtained in step S10 are registered as point information, the user is asked whether to set one as the destination, or they are displayed on the map. In this case, then, executing a command in step S20 does not mean interpreting the determination word as an operation command; instead, an appropriate command is selected and executed based on the obtained information.

According to this embodiment, the CPU 13 of the car navigation system 10 acquires the phrase recognized by the voice recognition engine 12, which recognizes the voice uttered by the user, together with parameters consisting of sound pressure information, a score, and an algorithm determination. It further acquires, via the short-range wireless communication unit 15, the phrase recognized by the voice recognition engine 22 of the smartphone 20, which recognized the same voice, together with the corresponding parameters (sound pressure information, score, and algorithm determination). Based on the parameters of the voice recognition engine 12 and those of the voice recognition engine 22, the CPU 13 selects either the recognition result of the voice recognition engine 12 or that of the voice recognition engine 22 and executes it as a command. Because the result can be chosen from two voice recognition engines, recognition accuracy can be equal to or better than that of a single engine. In addition, because the car navigation system 10 and the smartphone 20 have different algorithms and dictionaries, recognition results suited to a variety of environments can be obtained. An appropriate recognition result can therefore be obtained.

When the parameters of the voice recognition engine 12 are at or above the threshold, the phrase recognized by the car navigation system 10 (voice recognition engine 12) is selected as the determination word, so the recognition result of the car navigation system 10 can be used preferentially.

When the parameters of the voice recognition engine 12 are below the threshold and those of the smartphone 20 (voice recognition engine 22) are at or above the threshold, the phrase recognized by the smartphone 20 is selected as the determination word. The recognition result of the smartphone 20 can thus be used when the result of the car navigation system 10 has low reliability and is likely unsuitable for use.

When the parameters of the voice recognition engine 12 are below the threshold, those of the smartphone 20 (voice recognition engine 22) are at or above the threshold, and the recognition result of the smartphone 20 is not a navigation command, the smartphone 20 is made to perform a cooperative operation, and the CPU 13 acquires and processes its result. In this way, when the smartphone 20's recognition result is reliable but the phrase is not a command for operating the car navigation system 10, information related to the phrase can still be obtained and acted upon.

After the determination word is confirmed, both the car navigation system 10 and the smartphone 20 learn it, so the two voice recognition engines share the recognition result and the accuracy of subsequent voice recognition improves. The car navigation system 10 can learn results it could not previously have known, which can be expected to improve score accuracy and enlarge the dictionary vocabulary from the next recognition onward; for example, new words and buzzwords can be learned in a timely manner. The smartphone 20, for its part, can learn the navigation commands, further improving its voice recognition accuracy.

When the parameters of both the speech recognition engine 12 and the speech recognition engine 22 are below the threshold, the determination word is decided by other means. These include evaluation using equations (1) and (2), evaluation based on past usage history, and evaluation based on past usage information. A determination word can therefore be decided even when the sound pressure information, the score, and the algorithm determination result are insufficient on their own.
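One of the fallback evaluations mentioned above, selection by past usage history, might look like the sketch below. Equations (1) and (2) are not reproduced here. The scoring rule, which picks the candidate that appears most often in a list of previously executed phrases, is an assumed illustration rather than the method defined in the document.

```python
from collections import Counter
from typing import Sequence


def fallback_by_history(candidates: Sequence[str], history: Sequence[str]) -> str:
    """Pick the candidate used most often in the past (hypothetical rule).

    Ties keep the earlier candidate, because max() returns the first
    maximal element; a Counter returns 0 for phrases never used.
    """
    counts = Counter(history)
    return max(candidates, key=lambda phrase: counts[phrase])
```

Listing the navigation system's candidate first makes it the default when neither phrase appears in the history. This ordering is a design choice of the sketch, not of the source.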

In addition, because the speech recognition engine 12, the CPU 13, and the short-range wireless communication unit 15 are provided as one unit, the car navigation system 10 can obtain an appropriate recognition result by cooperating with the smartphone 20, which has the speech recognition engine 22.

In the flowchart shown in FIG. 3, after the recognition result of the smartphone 20 is adopted as the determination word in step S7, step S8 determines whether it is a navigation command. This determination may be omitted, so that step S19 is executed directly once the determination word has been decided in step S7.

In the configurations shown in FIGS. 1 and 2, speech is input to the car navigation system 10 and the smartphone 20 through their respective microphones 11 and 21. Alternatively, the speech input to the microphone 11 of the car navigation system 10 may be converted into an audio signal and transmitted to the smartphone 20, which then performs speech recognition on that signal. In this case the sound pressure information can no longer be used as a parameter. However, the scores and algorithm determinations of the two engines often differ, so the selection can be made from this information alone. That is, a single input unit may suffice.
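How a parameter might be formed when sound pressure is unavailable can be sketched as below. The document does not say how sound pressure, score, and algorithm determination are combined. This sketch assumes each is normalised to [0, 1], treats the algorithm determination as a hypothetical pass/fail flag, and averages whatever components are present. With a single shared input unit, `sound_pressure` is simply `None` and drops out.

```python
from typing import Optional


def reliability(score: float, algorithm_ok: bool,
                sound_pressure: Optional[float] = None) -> float:
    """Hypothetical combined parameter: mean of the available components.

    score and sound_pressure are assumed normalised to [0, 1];
    algorithm_ok is an assumed pass/fail result of the algorithm determination.
    With a single shared input unit, sound_pressure is None and is skipped.
    """
    parts = [score, 1.0 if algorithm_ok else 0.0]
    if sound_pressure is not None:
        parts.append(sound_pressure)
    return sum(parts) / len(parts)
```

Because the same function handles both configurations, the threshold comparison downstream does not need to know whether one or two input units were used.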

Furthermore, the flowchart of FIG. 3 can be implemented as a computer-executable program and thereby provided as a control program for the information control apparatus.

In the embodiment described above, the car navigation system 10 acts as the main device, but the smartphone 20 may act as the main device instead. The invention is also not limited to the car navigation system 10 or the smartphone 20. It may be applied to other information processing apparatuses, such as personal computers and voice-operable home appliances.

The present invention is not limited to the above embodiment. Those skilled in the art can implement various modifications based on conventionally known knowledge without departing from the gist of the present invention. Such modifications are, of course, included in the scope of the present invention as long as they still have the configuration of the information processing apparatus of the present invention.

10 Car navigation system (information processing apparatus)
11 Microphone (input unit)
12 Speech recognition engine (first speech recognition unit)
13 CPU (control unit, first acquisition unit, processing unit)
15 Short-range wireless communication unit (second acquisition unit)
20 Smartphone
21 Microphone (input unit)
22 Speech recognition engine (second speech recognition unit)
S1 Speech recognition on the navigation side (first acquisition step)
S2 Speech recognition on the smartphone side (second acquisition step)
S4 to S20 Select the result of either the navigation system or the smartphone and execute the command (control step)

Claims

An information processing apparatus comprising:
a first acquisition unit that acquires first speech recognition result information and first speech recognition processing information from a first speech recognition unit that recognizes speech;
a second acquisition unit that acquires second speech recognition result information and second speech recognition processing information from a second speech recognition unit that recognizes the speech; and
a control unit that selects either the first speech recognition result information or the second speech recognition result information on the basis of the first speech recognition processing information and the second speech recognition processing information, and causes a processing unit to execute processing related to the selected speech recognition result information.