JP2019086643A

JP2019086643A - Voice recognition system

Info

Publication number: JP2019086643A
Application number: JP2017214359A
Authority: JP
Inventors: 昭寛四家; Akihiro Yotsuya
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2019-06-06
Anticipated expiration: 2037-11-07
Also published as: JP6996944B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition system that selectively uses a plurality of voice recognition means for voice input to different functions while always accepting urgent voice input for a specific function.

SOLUTION: Normally, a voice recognition device 102 performs continuous voice recognition and inputs the recognized commands to a data processing device 109. When the talk switch 107 is operated, voice recognition using a voice recognition server is started through a portable device 2. While the portable device 2 is performing voice recognition, output of commands recognized by the voice recognition device 102 to the data processing device 109 is basically stopped. However, when the recognition result of the voice recognition device 102 is a command requesting execution of an urgent process registered in a priority word table 105, that command is output to the data processing device 109.

SELECTED DRAWING: Figure 1

Description

The present invention relates to voice recognition technology for recognizing speech uttered by a user.

One known voice recognition technology continuously recognizes a user's speech without requiring the user to perform an operation instructing the start of voice recognition (for example, Patent Document 1).

Another known technology has a terminal recognize the user's speech both with its own built-in voice recognition device and through an external voice recognition server, and use the recognition result obtained by either one of them (for example, Patent Documents 2 and 3).

Patent Document 1: International Publication No. WO 2014/068788
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2015-102795
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2013-064777

When a terminal uses recognition by its own voice recognition device and recognition through an external voice recognition server for command input to different functions, running both recognitions in parallel is not appropriate, because a command the user speaks may be input as a command to a function other than the one the user intended.

It is therefore desirable to switch selectively between recognition of the user's speech by the terminal's own voice recognition device and recognition through the external voice recognition server.

However, when recognition by the terminal's own voice recognition device runs continuously without requiring a start operation by the user, selectively switching between the two kinds of recognition causes the following problem.

In this case, one could provide a first mode, in which only the terminal's own voice recognition device performs recognition, continuously and without a start operation by the user, and a second mode, in which the user's speech is recognized using only the external voice recognition server. Recognition would normally run in the first mode, switch temporarily to the second mode in response to a user operation, and return to the first mode when that recognition finishes. With this arrangement, however, if during the second mode the user needs to input an urgent command to a function that takes its commands from the terminal's own voice recognition device, there is no start operation for recognition in the first mode. The system therefore cannot be forcibly switched from the second mode back to the first mode, and the urgent command cannot be input.

Conversely, if the terminal's own voice recognition device keeps recognizing during the second mode, then, as described above, a command the user speaks may be input as a command to a function other than the one the user intended.

Accordingly, an object of the present invention is, in a voice recognition system that uses a first voice recognition means, which always performs voice recognition without requiring a start operation by the user, and a second voice recognition means for input to different functions, to suppress erroneous input to each function while always allowing urgent input through the voice recognition of the first voice recognition means.

To achieve this object, the present invention provides a voice recognition system for recognizing speech uttered by a user, comprising: a microphone; a first voice recognition means that continuously recognizes commands represented by speech picked up by the microphone; a first functional unit that executes the process an input command instructs it to execute; a second voice recognition means that performs a voice recognition operation on speech picked up by the microphone; a second functional unit that processes the recognition results of the second voice recognition means; a priority command storage means in which some of the commands recognized by the first voice recognition means are registered as priority commands; and a voice recognition operation control means. The voice recognition operation control means controls switching between a first voice recognition mode and a second voice recognition mode. In the first voice recognition mode, it stops the voice recognition operation of the second voice recognition means and inputs commands recognized by the first voice recognition means to the first functional unit. In the second voice recognition mode, it causes the second voice recognition means to execute the voice recognition operation, and inputs a command recognized by the first voice recognition means to the first functional unit only when that command is a priority command registered in the priority command storage means.
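The routing rule of the voice recognition operation control means reduces to a small decision function. The sketch below is illustrative only: the names `Mode`, `route_first_result`, and `second_recognizer_active` are hypothetical, and commands are modeled as plain strings.

```python
from enum import Enum, auto


class Mode(Enum):
    FIRST = auto()   # first recognizer feeds the first functional unit; second is stopped
    SECOND = auto()  # second recognizer runs; first recognizer's output is filtered


def route_first_result(mode: Mode, command: str, priority_commands: set[str]) -> bool:
    """Return True if a command recognized by the first voice recognition means
    should be input to the first functional unit in the given mode."""
    if mode is Mode.FIRST:
        return True
    # In the second mode, only commands registered as priority commands pass.
    return command in priority_commands


def second_recognizer_active(mode: Mode) -> bool:
    """The second voice recognition means operates only in the second mode."""
    return mode is Mode.SECOND
```

For example, with `{"back camera"}` as the priority set, a hypothetical non-priority command such as `"next track"` passes in the first mode but is blocked in the second, while `"back camera"` passes in both.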

In such a voice recognition system, the voice recognition operation control means may be configured to switch the voice recognition mode to the second voice recognition mode when a predetermined input from the user occurs in the first voice recognition mode, and to switch it back to the first voice recognition mode when the voice recognition operation of the second voice recognition means is completed in the second voice recognition mode.
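These two transitions form a two-state machine. The following is a minimal sketch under the assumption that the predetermined user input and the completion of the second recognizer's operation arrive as discrete events; the class and method names are hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    FIRST = auto()
    SECOND = auto()


class ModeController:
    """Hypothetical sketch of the mode switching described above."""

    def __init__(self) -> None:
        self.mode = Mode.FIRST  # recognition normally runs in the first mode

    def on_user_input(self) -> None:
        # A predetermined user input (e.g. a talk switch press) in the first
        # mode switches to the second mode; it has no effect otherwise.
        if self.mode is Mode.FIRST:
            self.mode = Mode.SECOND

    def on_second_recognition_done(self) -> None:
        # Completion of the second recognizer's operation restores the first mode.
        if self.mode is Mode.SECOND:
            self.mode = Mode.FIRST
```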

The second voice recognition means may perform its voice recognition operation by connecting, via communication, to an external voice recognition server that provides a voice recognition service, and using that service to recognize the speech picked up by the microphone.

The voice recognition system may be one used for voice input in a system mounted in an automobile.
Such a voice recognition system may also consist of an in-vehicle system mounted in the automobile and a portable device selectively connected to the in-vehicle system. The in-vehicle system is provided with the microphone, the first voice recognition means, the first functional unit, the priority command storage means, and the voice recognition operation control means; the portable device is provided with the second voice recognition means and the second functional unit. In the second voice recognition mode, the voice recognition operation control means transfers speech picked up by the microphone from the in-vehicle system to the portable device, and the second voice recognition means performs its voice recognition operation on the speech so transferred.
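In this split configuration the microphone feeds two consumers. A minimal sketch follows, assuming audio arrives as discrete frames; `AudioRouter` and its list attributes are hypothetical stand-ins for the first recognizer's input and the link to the portable device.

```python
from dataclasses import dataclass, field


@dataclass
class AudioRouter:
    """Hypothetical sketch: frames always reach the in-vehicle (first)
    recognizer, and are also forwarded to the portable device only while
    the second voice recognition mode is active."""
    second_mode: bool = False
    local_frames: list[bytes] = field(default_factory=list)      # first recognizer input
    forwarded_frames: list[bytes] = field(default_factory=list)  # transfer to portable device

    def on_microphone_frame(self, frame: bytes) -> None:
        self.local_frames.append(frame)          # continuous local recognition
        if self.second_mode:
            self.forwarded_frames.append(frame)  # transfer to the portable device
```

Keeping the local path unconditional is what lets a priority command still be recognized in the second mode.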

In each of the voice recognition systems above, the priority command storage means may register as priority commands at least those commands, among the commands recognized by the first voice recognition means, that instruct execution of a process related to ensuring the safety of the automobile.

In each of the voice recognition systems above, the priority command storage means may also register as priority commands at least those commands, among the commands recognized by the first voice recognition means, for which the second functional unit, at that point in time, defines no meaningful processing of a recognition result of speech representing the command.

With such a voice recognition system, while the first voice recognition means is inputting commands to the first functional unit in the first voice recognition mode, recognition by the second voice recognition means is stopped. While the second voice recognition means is providing voice input to the second functional unit in the second voice recognition mode, input of commands to the first functional unit based on the continuous recognition of the first voice recognition means is, in principle, stopped.

This prevents speech the user uttered to give a command to the first functional unit from being input to the second functional unit, and prevents speech the user uttered as voice input to the second functional unit from being misrecognized as a command and erroneously input to the first functional unit.

On the other hand, even while the second voice recognition means is providing voice input to the second functional unit, if the first voice recognition means recognizes a priority command registered in the priority command storage means, that command is input to the first functional unit.

Therefore, by registering in the priority command storage means, as priority commands, commands that instruct execution of urgent processes, such as processes related to ensuring the safety of the automobile, such commands can always be input to the first functional unit through the voice recognition of the first voice recognition means.

Likewise, by registering as priority commands those commands of the first functional unit for which the second functional unit, at that point in time, defines no meaningful processing of a recognition result of speech representing the command (that is, commands that do no harm even if speech representing them is also input to the second functional unit), these commands can always be input to the first functional unit through the voice recognition of the first voice recognition means.

As described above, according to the present invention, in a voice recognition system that uses a first voice recognition means, which always performs voice recognition without requiring a start operation by the user, and a second voice recognition means for input to different functions, urgent input through the voice recognition of the first voice recognition means can always be made while erroneous input to each function is suppressed.

FIG. 1 is a block diagram showing the configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the external voice recognition control process according to the embodiment.
FIG. 3 is a flowchart showing the voice recognition result filtering process according to the embodiment.
FIG. 4 is a sequence diagram showing an example of the operation of the information processing system according to the embodiment.

An embodiment of the present invention is described below, taking as an example its application to an information processing system used in an automobile.
FIG. 1 shows the configuration of the information processing system.
As illustrated, the information processing system includes an in-vehicle system 1 mounted in an automobile and a portable device 2 selectively connected to the in-vehicle system 1.
The portable device 2 is a device the user can carry, such as a smartphone or a tablet. It has a function of connecting to an external voice recognition server 3 via mobile communication, recognizing speech transferred from the in-vehicle system 1 using the voice recognition service of the voice recognition server 3, accepting the recognition result as voice input to the portable device 2, and operating in accordance with that voice input.

The in-vehicle system 1 includes a microphone 101, a voice recognition device 102, a voice recognition dictionary 103, a voice recognition result filter unit 104, a priority word table 105, an external voice recognition control unit 106, a talk switch 107, a communication interface 108 for communicating with the portable device 2, a data processing device 109, and various peripheral devices 110 such as a display, a camera that captures the vehicle's surroundings, an AV device, and an air conditioner.

The voice recognition dictionary 103 holds voice recognition data for words representing commands of the data processing device 109. Using the voice recognition dictionary 103, the voice recognition device 102 continuously, without being triggered by a start operation from the user, recognizes the user's speech input from the microphone 101 whenever it represents a command of the data processing device 109, and outputs that command to the voice recognition result filter unit 104 as the recognition result.
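The always-on, dictionary-restricted behavior of the voice recognition device 102 can be sketched as a loop with no start gate. This abstracts away acoustic recognition entirely: the hypothetical `utterances` iterable yields already-decoded text, the dictionary is a plain mapping from spoken word to command, and the command identifiers in the usage below are invented for illustration.

```python
from typing import Callable, Iterable


def continuous_command_recognition(
    utterances: Iterable[str],
    dictionary: dict[str, str],
    emit: Callable[[str], None],
) -> None:
    """Hypothetical sketch of the always-on loop of voice recognition device 102.

    `dictionary` stands in for voice recognition dictionary 103 (spoken word ->
    command of data processing device 109). Every utterance is checked without
    any start operation; only dictionary words produce a result.
    """
    for text in utterances:
        command = dictionary.get(text.strip().lower())
        if command is not None:
            emit(command)  # handed to voice recognition result filter unit 104
```

For instance, feeding `["back camera", "hello", " Air volume up "]` with a two-entry dictionary emits only the two dictionary commands and silently skips `"hello"`.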

The priority word table 105 registers, among the commands represented by words whose recognition data are in the voice recognition dictionary 103, commands that must be processed urgently and commands that never appear in voice input to the portable device 2. Using the priority word table 105, the voice recognition result filter unit 104 performs the voice recognition result filtering process described in detail later, and outputs to the data processing device 109 only those recognition results of the voice recognition device 102 that satisfy a predetermined condition.

A command that must be processed urgently is, for example, one instructing the data processing device 109 to execute a process related to ensuring the safety of the automobile. For instance, the command "back camera", which instructs the data processing device 109 to show on the display the images from a camera that films the area behind the vehicle, can be treated as a command that must be processed urgently.

When the portable device 2 is currently using voice input to receive its own commands, the commands that never appear in voice input to the portable device 2 can be, for example, those commands represented by words in the voice recognition dictionary 103 that the portable device 2 does not support.

Commands represented by words in the voice recognition dictionary 103 that, by common sense, are very unlikely to appear in voice input to the portable device 2 may also be treated as commands that never appear in voice input to the portable device 2.

An example of such a command is "air volume up", which instructs the data processing device 109 to increase the airflow of the air conditioner.
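The two registration criteria can be combined with set operations. The sketch below is an assumption about how the table might be populated, not the embodiment's stated method: `build_priority_table` is hypothetical, and "next track" is an invented command that the portable device is assumed to support. Because support depends on what the portable device accepts "at that time", such a table would presumably be recomputed when that set changes; manually added common-sense entries are not modeled.

```python
def build_priority_table(
    dictionary_commands: set[str],
    urgent_commands: set[str],
    portable_supported: set[str],
) -> set[str]:
    """Hypothetical sketch of populating priority word table 105.

    A dictionary command is registered if it is urgent (e.g. safety related),
    or if the portable device currently has no use for it, so that passing it
    to the in-vehicle side as well does no harm.
    """
    urgent = urgent_commands & dictionary_commands
    unused_by_portable = dictionary_commands - portable_supported
    return urgent | unused_by_portable
```

With the dictionary `{"back camera", "air volume up", "next track"}`, urgent set `{"back camera"}`, and portable-supported set `{"next track", "back camera"}`, the table is `{"back camera", "air volume up"}`: "back camera" is registered as urgent even though the portable device also handles it.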

The data processing device 109 performs processing according to the commands output from the voice recognition result filter unit 104.
The external voice recognition control unit 106 performs the external voice recognition control process described in detail later: it transfers the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 and has the portable device 2 perform recognition using the voice recognition service of the voice recognition server 3 described above.

The external voice recognition control process performed by the external voice recognition control unit 106 is described below.
FIG. 2 shows the procedure of the external voice recognition control process.
As illustrated, in the external voice recognition control process, the external voice recognition control unit 106 monitors for the user turning on the talk switch 107 (step 202) and, when it is turned on, sets the external voice recognition mode (step 204).

It then issues a voice recognition start command to the portable device 2 via the communication interface 108 (step 206) and starts transferring the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 (step 208).

On receiving the voice recognition start command from the in-vehicle system 1, the portable device 2 starts a voice recognition process that uses the voice recognition service of the voice recognition server 3 to recognize a continuous stretch of speech transferred from the in-vehicle system 1, whose end point is the start of a silent interval lasting at least a predetermined length of time.
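The end-point rule (the utterance ends where a silent interval of at least a predetermined length begins) can be illustrated with a simple frame-energy detector. Energy thresholding and the parameter names are assumptions made for this sketch; the description does not say how the portable device 2 detects silence. Silence before any speech is ignored so that leading quiet is not mistaken for an end point.

```python
from typing import Optional


def find_utterance_end(
    frame_energies: list[float],
    silence_threshold: float,
    min_silence_frames: int,
) -> Optional[int]:
    """Return the index of the first frame of the first silent run lasting at
    least `min_silence_frames` frames after speech has started, or None if no
    such run has been observed yet."""
    speech_seen = False
    run_start = None
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            speech_seen = True
            run_start = None          # speech resets any silent run
        elif speech_seen:
            if run_start is None:
                run_start = i         # a candidate end point
            if i - run_start + 1 >= min_silence_frames:
                return run_start
    return None
```

A short pause (two frames when three are required) does not end the utterance; only the first sufficiently long pause does.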

After starting the transfer of the user's speech to the portable device 2 via the communication interface 108 (step 208), the external voice recognition control unit 106 monitors for receipt of an external voice recognition stop command from the voice recognition result filter unit 104 (step 210) and for receipt of a voice recognition end notification from the portable device 2 via the communication interface 108 (step 212).

When the voice recognition process described above ends, the portable device 2 outputs a voice recognition end notification to the in-vehicle system 1.
When it receives either the external voice recognition stop command from the voice recognition result filter unit 104 (step 210) or the voice recognition end notification from the portable device 2 (step 212), the external voice recognition control unit 106 stops transferring the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 (step 214) and cancels the external voice recognition mode (step 216).

The process then returns to step 202.
This concludes the description of the external voice recognition control process performed by the external voice recognition control unit 106.
Next, the voice recognition result filtering process performed by the voice recognition result filter unit 104 is described.
FIG. 3 shows the procedure of the voice recognition result filtering process.
As illustrated, in the voice recognition result filtering process, the voice recognition result filter unit 104 waits for a recognition result from the voice recognition device 102 (step 302) and, when one is input, checks whether the external voice recognition mode has been set by the external voice recognition control unit 106 (step 304).

If the external voice recognition mode is not set (step 304), the filter unit outputs the input recognition result to the data processing device 109 (step 310) and returns to step 302.

If the external voice recognition mode is set (step 304), it checks whether the input recognition result is a command registered in the priority word table 105 (step 306).

If the input recognition result is not a command registered in the priority word table 105 (step 306), it discards the received recognition result and returns directly to step 302.

If the received recognition result is a command registered in the priority word table 105 (step 306), it sends an external voice recognition stop command to the external voice recognition control unit 106 (step 308), outputs the received recognition result to the data processing device 109 (step 310), and returns to step 302.
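The filtering process of FIG. 3 and its interaction with the process of FIG. 2 can be sketched together, since they communicate only through the external voice recognition mode flag and the stop command. This is a simplified, single-threaded illustration under stated assumptions: the class names are hypothetical, the start command and the actual audio transfer are omitted, and "next track" is an invented non-registered command.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalRecognitionControl:
    """Minimal stand-in for external voice recognition control unit 106 (FIG. 2)."""
    external_mode: bool = False
    forwarding: bool = False

    def on_talk_switch(self) -> None:  # steps 202-208 (start command omitted)
        self.external_mode = True
        self.forwarding = True

    def stop(self) -> None:            # stop command (210) or end notice (212)
        self.forwarding = False        # step 214
        self.external_mode = False     # step 216


@dataclass
class ResultFilter:
    """Sketch of voice recognition result filter unit 104 (FIG. 3)."""
    control: ExternalRecognitionControl
    priority_words: set[str]
    output: list[str] = field(default_factory=list)  # stands in for device 109

    def on_result(self, command: str) -> None:       # step 302
        if not self.control.external_mode:            # step 304
            self.output.append(command)               # step 310
            return
        if command not in self.priority_words:        # step 306
            return                                    # discard
        self.control.stop()                           # step 308
        self.output.append(command)                   # step 310
```

Replaying the FIG. 4 sequence, a non-registered word is forwarded before the talk switch is pressed, discarded afterwards, and "back camera" then both reaches the output and ends the external mode.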

This concludes the description of the voice recognition result filtering process performed by the voice recognition result filter unit 104.
FIG. 4 shows an example of voice recognition operation under the external voice recognition control process and the voice recognition result filtering process described above.
As illustrated, normally, speech uttered by the user and input from the microphone 101 is sent to the voice recognition device 102 (401) and recognized there, and the recognition result is sent to the voice recognition result filter unit 104 (402). The voice recognition result filter unit 104 then outputs the received recognition result to the data processing device 109 (403).

When the user turns on the talk switch 107 to give voice input to the portable device 2 (411), the external voice recognition control unit 106 sets the external voice recognition mode (412).

If the user then utters a word that does not represent a command registered in the priority word table 105, that speech (non-registered word speech) is sent from the microphone 101 to both the voice recognition device 102 and the external voice recognition control unit 106 (413).

The voice recognition device 102 recognizes the received speech and sends the recognition result to the voice recognition result filter unit 104 (414). Because the received recognition result is not a command registered in the priority word table 105, the voice recognition result filter unit 104 discards it without outputting it to the data processing device 109.

Meanwhile, the external voice recognition control unit 106 transfers the received speech to the portable device 2 (415), and the portable device 2 transmits the transferred speech to the voice recognition server 3 (416).

Then, if the user utters a word representing a command registered in the priority word table 105 before the portable device 2 receives, from the voice recognition server 3, the recognition result for the speech it transmitted (416), that speech (registered word speech) is sent from the microphone 101 to both the voice recognition device 102 and the external voice recognition control unit 106 (417).

The voice recognition device 102 recognizes the received speech and sends the recognition result to the voice recognition result filter unit 104 (418). Because the received recognition result is a command registered in the priority word table 105, the voice recognition result filter unit 104 issues an external voice recognition stop command to the external voice recognition control unit 106 (419) and outputs the recognition result to the data processing device 109 (420).
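Taken together, steps (403), (414), and (418) to (420) give the voice recognition result filter unit 104 three outcomes: pass every result through when the external mode is not set, discard non-registered results in the external mode, and, for a result found in the priority word table 105, both issue the external voice recognition stop command and output the command. The following is a minimal Python sketch, assuming recognition results are plain strings and that the filter reads the current mode from a flag; the callback names and the sample table entry are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class ResultFilter:
    """Hypothetical model of voice recognition result filter unit 104."""

    def __init__(self, priority_words: set[str],
                 issue_stop: Callable[[], None],
                 output: Callable[[str], None]) -> None:
        self.priority_words = priority_words  # priority word table 105
        self.issue_stop = issue_stop          # stop command to control unit 106
        self.output = output                  # command input to data processing device 109
        self.external_mode = False            # clearing it after (419) is unit 106's job (422)

    def on_result(self, command: str) -> None:
        if not self.external_mode:
            self.output(command)              # normal mode: pass through (403)
        elif command in self.priority_words:
            self.issue_stop()                 # (419)
            self.output(command)              # (420)
        # Any other result in external mode is discarded (414).


events: list[str] = []
result_filter = ResultFilter(
    {"emergency call"},
    issue_stop=lambda: events.append("stop -> 106"),
    output=lambda command: events.append(f"109 <- {command}"),
)
result_filter.on_result("next track")        # normal mode: output
result_filter.external_mode = True           # talk switch pressed
result_filter.on_result("play my playlist")  # not registered: discarded
result_filter.on_result("emergency call")    # registered: stop, then output
print(events)
# ['109 <- next track', 'stop -> 106', '109 <- emergency call']
```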

Meanwhile, the external voice recognition control unit 106 transfers the received speech to the portable device 2 (421). Here, the portable device 2 is assumed to be configured to ignore transferred speech while it is waiting for a recognition result from the voice recognition server 3, so the speech transferred to it (421) is discarded without being transmitted to the voice recognition server 3.
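The behavior assumed in (421), where the portable device 2 ignores transferred speech while awaiting a result from the voice recognition server 3, amounts to a busy flag. The following is a minimal Python sketch, assuming each transfer carries one complete utterance and that sending it to the server starts the wait; the class and method names are hypothetical, and no real phone or server API is implied.

```python
from __future__ import annotations


class PortableDevice:
    """Hypothetical model of portable device 2 in front of voice recognition server 3."""

    def __init__(self) -> None:
        self.awaiting_result = False
        self.sent_to_server: list[str] = []

    def on_transferred_voice(self, utterance: str) -> None:
        if self.awaiting_result:
            return                             # (421): discarded, never sent to server 3
        self.sent_to_server.append(utterance)  # (416)
        self.awaiting_result = True

    def on_server_result(self, text: str) -> str:
        self.awaiting_result = False           # result returned (423)
        return f"handled: {text}"              # stand-in for the device's own processing


phone = PortableDevice()
phone.on_transferred_voice("play my playlist")     # sent (416)
phone.on_transferred_voice("emergency call")       # ignored (421)
print(phone.sent_to_server)                        # ['play my playlist']
print(phone.on_server_result("play my playlist"))  # handled: play my playlist
```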

When the external voice recognition control unit 106 receives the external voice recognition stop command (419), it cancels the external voice recognition mode (422).
Meanwhile, when the voice recognition server 3 returns to the portable device 2 the recognition result for the speech the portable device 2 transmitted (416) (423), the portable device 2 processes that recognition result.

Finally, the voice recognition processing of the portable device 2 ends, and the portable device 2 notifies the external voice recognition control unit 106 that voice recognition has finished (424).
The above is an example of the voice recognition operation carried out by the external voice recognition control process and the voice recognition result filtering process.
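The mode handling of the external voice recognition control unit 106 in the FIG. 4 sequence reduces to a small transition table: the talk switch enters the external mode (411, 412), and either the stop command from the filter unit 104 (419, 422) or the completion notice from the portable device 2 (424) returns to the normal mode. The following is a minimal Python sketch; the event names are hypothetical labels for those steps.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "first voice recognition mode"
    EXTERNAL = "second voice recognition mode"


# (current mode, event) -> next mode; event names are hypothetical labels.
TRANSITIONS = {
    (Mode.NORMAL, "talk_switch_on"): Mode.EXTERNAL,        # (411) -> (412)
    (Mode.EXTERNAL, "stop_command"): Mode.NORMAL,          # (419) -> (422)
    (Mode.EXTERNAL, "recognition_finished"): Mode.NORMAL,  # (424)
}


def next_mode(mode: Mode, event: str) -> Mode:
    """Return the next mode; an event with no table entry leaves the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


mode = Mode.NORMAL
trace = []
for event in ["talk_switch_on", "stop_command", "recognition_finished"]:
    mode = next_mode(mode, event)
    trace.append((event, mode.name))
print(trace)
# [('talk_switch_on', 'EXTERNAL'), ('stop_command', 'NORMAL'), ('recognition_finished', 'NORMAL')]
```

In the FIG. 4 order the stop command (422) arrives first, so the later completion notice (424) finds the controller already in the normal mode and leaves it there.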
Unlike the example shown in FIG. 4, the user may utter a word representing a command registered in the priority word table 105 immediately after turning on the talk switch 107, or after turning on the talk switch 107 and partially uttering a word that does not represent a registered command. In these cases as well, the result of the voice recognition device 102 recognizing that word is output to the data processing device 109 via the voice recognition result filter unit 104.
In these cases, the speech of the word representing the registered command may also be transferred from the external voice recognition control unit 106 to the portable device 2, and the portable device 2 may recognize it using the voice recognition server 3, which is undesirable. However, if the user's speech represents a command that must be processed urgently, inputting that command to the data processing device 109 should take priority over this inconvenience. And if the user's speech represents a command that never appears in voice input to the portable device 2, the portable device 2 will not perform any unintended operation based on the recognition result.

The embodiments of the present invention have been described above.
According to this embodiment, normally, voice input of the user's speech through the portable device 2 using the voice recognition server 3 is stopped, while the voice recognition device 102 continuously recognizes the user's speech and inputs the recognized commands to the data processing device 109. When the user turns on the talk switch 107 to perform voice input to the portable device 2, voice input of the user's speech through the portable device 2 using the voice recognition server 3 starts.

While voice input of the user's speech through the portable device 2 using the voice recognition server 3 is in progress, input to the data processing device 109 of commands that the voice recognition device 102 recognizes from the user's speech is, in principle, stopped.

This suppresses two kinds of error: speech the user intended as a command for the data processing device 109 being fed as voice input to the portable device 2, and speech the user intended as voice input to the portable device 2 being misrecognized as a command and erroneously input to the data processing device 109.

On the other hand, even while voice input of the user's speech through the portable device 2 using the voice recognition server 3 is in progress, a command recognized by the voice recognition device 102 is input to the data processing device 109, but only when the user utters a word representing a command registered in the priority word table 105: either a command requesting execution of an urgent process, or a command that never appears in voice input to the portable device 2.

Therefore, according to this embodiment, commands requesting execution of urgent processes can be input to the data processing device 109 by voice at any time, while erroneous command input to the data processing device 109 and erroneous voice input to the portable device 2 are suppressed. Likewise, commands that never appear in voice input to the portable device 2 can be input to the data processing device 109 by voice at any time.
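The two kinds of commands that pass the filter during external recognition, urgent ones and ones that never appear in voice input to the portable device 2, can be combined with set operations when populating the priority word table 105. The following is a minimal Python sketch; the vocabularies are invented for illustration, since the patent lists no concrete commands, and "never appears in voice input to the portable device" is approximated as absence from an assumed portable-device vocabulary.

```python
from __future__ import annotations

# Hypothetical vocabularies for illustration only.
IN_VEHICLE_COMMANDS = {"emergency call", "hazard lights on", "volume up",
                       "next track", "play", "stop"}
URGENT_COMMANDS = {"emergency call", "hazard lights on"}
PORTABLE_VOCABULARY = {"next track", "play", "stop", "call home"}


def build_priority_table(recognizable: set[str], urgent: set[str],
                         portable_vocabulary: set[str]) -> set[str]:
    """Urgent commands plus commands the portable device never receives,
    limited to what the in-vehicle recognizer can actually recognize."""
    never_sent_to_portable = recognizable - portable_vocabulary
    return (urgent | never_sent_to_portable) & recognizable


table = build_priority_table(IN_VEHICLE_COMMANDS, URGENT_COMMANDS, PORTABLE_VOCABULARY)
print(sorted(table))  # ['emergency call', 'hazard lights on', 'volume up']
```

Commands shared with the portable device's vocabulary, such as "play" here, stay out of the table unless they are also urgent, which is what keeps them from being double-handled during external recognition.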

In the above embodiment, turning on the talk switch 107 serves as the trigger that causes the portable device 2 to perform voice recognition, but the trigger may be something other than turning on the talk switch 107. For example, the trigger may be voice input of a command instructing the start of voice input to the portable device 2. In that case, the occurrence of such a command is detected by the voice recognition device 102 recognizing the word that represents the command in the user's speech.
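When the trigger is a spoken command rather than the talk switch 107, the results of the voice recognition device 102 itself drive the mode change. The following is a minimal Python sketch, assuming the start command is a single fixed phrase; the phrase and all names are hypothetical.

```python
from __future__ import annotations

START_PHRASE = "ask my phone"  # hypothetical command that starts voice input to device 2


class TriggerDetector:
    """Enters external mode on the talk switch or on a recognized start command."""

    def __init__(self, start_phrase: str = START_PHRASE) -> None:
        self.start_phrase = start_phrase
        self.external_mode = False

    def on_talk_switch(self) -> None:
        self.external_mode = True

    def on_recognition_result(self, command: str) -> bool:
        """Receive a result from voice recognition device 102; return True
        when the result was consumed as the start trigger."""
        if not self.external_mode and command == self.start_phrase:
            self.external_mode = True
            return True
        return False


detector = TriggerDetector()
print(detector.on_recognition_result("next track"))    # False
print(detector.on_recognition_result("ask my phone"))  # True
print(detector.external_mode)                          # True
```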

Furthermore, the above embodiment applies in the same way when the system includes, instead of the portable device 2 that recognizes speech input from the microphone 101 using the voice recognition server 3, either a non-portable device that recognizes speech input from the microphone 101 using the voice recognition server 3, or any device that recognizes speech input from the microphone 101 with its own built-in voice recognition function without using the voice recognition server 3. In that case, the portable device 2 is simply replaced by that device.
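This substitution amounts to writing the external voice recognition control unit 106 against an interface rather than against the portable device 2 specifically. The following is a minimal Python sketch using `typing.Protocol`; the `recognize` method, both implementations, and the fake transport are hypothetical stand-ins, not a real recognition API.

```python
from __future__ import annotations

from typing import Callable, Protocol


class ExternalRecognizer(Protocol):
    """Anything that can take the place of portable device 2."""

    def recognize(self, audio: bytes) -> str: ...


class ServerBackedRecognizer:
    """Delegates recognition to a remote service through an injected transport."""

    def __init__(self, send: Callable[[bytes], str]) -> None:
        self.send = send

    def recognize(self, audio: bytes) -> str:
        return self.send(audio)


class OnDeviceRecognizer:
    """Uses a built-in recognizer; a lookup table stands in for a real model."""

    def __init__(self, table: dict[bytes, str]) -> None:
        self.table = table

    def recognize(self, audio: bytes) -> str:
        return self.table.get(audio, "")


def forward(audio: bytes, target: ExternalRecognizer) -> str:
    """The controller depends only on the interface, not on the device type."""
    return target.recognize(audio)


def fake_transport(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the server echoed a transcript


print(forward(b"play jazz", ServerBackedRecognizer(fake_transport)))           # play jazz
print(forward(b"play jazz", OnDeviceRecognizer({b"play jazz": "play_jazz"})))  # play_jazz
```

Structural typing means neither implementation has to inherit from `ExternalRecognizer`; a static checker accepts any object whose `recognize` signature matches.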

Reference Signs List: 1: in-vehicle system, 2: portable device, 3: voice recognition server, 101: microphone, 102: voice recognition device, 103: voice recognition dictionary, 104: voice recognition result filter unit, 105: priority word table, 106: external voice recognition control unit, 107: talk switch, 108: communication interface, 109: data processing device, 110: peripheral device.

Claims

A speech recognition system for speech recognition of speech uttered by a user, comprising:
A microphone,
First voice recognition means for performing continuous voice recognition of a command represented by voice collected by the microphone;
A first functional unit that executes the process whose execution is instructed by an input command;
Second voice recognition means for performing a voice recognition operation that recognizes voice collected by the microphone;
A second functional unit that processes the voice recognition result of the second voice recognition means;
Priority command storage means in which some of the commands recognized by the first voice recognition means are registered as priority commands;
And voice recognition operation control means, wherein
The voice recognition operation control means
Controls switching of the voice recognition mode between a first voice recognition mode and a second voice recognition mode;
When the voice recognition mode is the first voice recognition mode, stops the voice recognition operation of the second voice recognition means and inputs a command recognized by the first voice recognition means to the first functional unit; and
When the voice recognition mode is the second voice recognition mode, causes the second voice recognition means to execute the voice recognition operation and inputs a command recognized by the first voice recognition means to the first functional unit only when that command is a priority command registered in the priority command storage means.

The speech recognition system according to claim 1, wherein
The voice recognition operation control means switches the voice recognition mode to the second voice recognition mode when a predetermined input from the user occurs while in the first voice recognition mode, and switches the voice recognition mode back to the first voice recognition mode when the voice recognition operation of the second voice recognition means is completed.

The speech recognition system according to claim 1 or 2, wherein
The second voice recognition means, as the voice recognition operation, connects via communication to an external voice recognition server that provides a voice recognition service, and uses the voice recognition service of the connected server to recognize voice collected by the microphone.

The speech recognition system according to claim 1, 2 or 3, wherein
The speech recognition system is used for voice input to a system mounted on an automobile.

The speech recognition system according to claim 1, 2 or 3, wherein
The speech recognition system comprises an in-vehicle system mounted on a vehicle and a portable device selectively connected to the in-vehicle system;
The in-vehicle system includes the microphone, the first voice recognition means, the first functional unit, the priority command storage means, and the voice recognition operation control means, and the portable device includes the second voice recognition means and the second functional unit;
The voice recognition operation control means transfers voice collected by the microphone from the in-vehicle system to the portable device when the voice recognition mode is the second voice recognition mode; and
The second voice recognition means performs, as the voice recognition operation, recognition of the voice transferred from the in-vehicle system to the portable device.

The speech recognition system according to claim 1, 2, 3, 4 or 5, wherein
The priority command storage means registers as priority commands at least those commands, among the commands recognized by the first voice recognition means, whose instructed process relates to ensuring the safety of the vehicle.

The speech recognition system according to claim 1, 2, 3, 4, 5 or 6, wherein
The priority command storage means registers as priority commands at least those commands, among the commands recognized by the first voice recognition means, for which the second functional unit defines no meaningful processing of the recognition result obtained by recognizing the voice representing the command.