JP7035979B2

JP7035979B2 - Speech recognition device

Info

Publication number: JP7035979B2
Application number: JP2018216852A
Authority: JP
Inventors: 大樹山下
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2022-03-15
Anticipated expiration: 2038-11-19
Also published as: JP2020086006A

Description

The present invention relates to a voice recognition device.

Patent Document 1 discloses an automatic answering and recording device that plays opening guidance and utterance-prompting guidance to the user, and determines the end of speech from the user's utterance state after the utterance-prompting guidance has been played.

Patent Document 1: Japanese Unexamined Patent Publication No. H5-110690

The device described in Patent Document 1 makes the end-of-speech determination when a silence period, in which no voice is input, has continued for a predetermined period (for example, 1 second), and starts the voice recognition process only after that determination. The silence period needed to determine the end of speech therefore becomes user waiting time as it is. Consequently, no matter how much the voice recognition process itself is accelerated, the device of Patent Document 1 cannot reduce the user's waiting time below the end-of-speech determination period (= the silence period).

The present invention has been made in view of the above, and its object is to provide a voice recognition device that can shorten the user's waiting time in the voice recognition process and improve the responsiveness of that process.

To solve the above problem and achieve the object, a voice recognition device according to the present invention includes a control unit that processes input voice. The control unit makes an end-of-speech determination when a silence period, which is a period during which no voice is input, has lasted for a second threshold period, and starts the voice recognition process when the silence period has lasted for a first threshold period that is shorter than the second threshold period.

Because the first threshold period, which starts the voice recognition process, is shorter than the second threshold period, which triggers the end-of-speech determination, the voice recognition device starts the voice recognition process without waiting for the end-of-speech determination.
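The relationship between the two threshold periods can be illustrated with a minimal Python sketch. The function name `actions_for_silence`, the action labels, and the millisecond values are hypothetical and chosen only for illustration; the specification does not prescribe concrete values or whether the comparison is inclusive.

```python
def actions_for_silence(silence_ms: int, first_ms: int, second_ms: int) -> list[str]:
    """Return the actions triggered once the silence has lasted `silence_ms`.

    Assumption: a threshold counts as "elapsed" when the silence duration
    is greater than or equal to it.
    """
    if not 0 < first_ms < second_ms:
        raise ValueError("first threshold must be positive and shorter than the second")
    actions = []
    if silence_ms >= first_ms:
        # Shorter threshold: recognition may start already.
        actions.append("start_recognition")
    if silence_ms >= second_ms:
        # Longer threshold: the user is judged to have finished speaking.
        actions.append("end_of_speech")
    return actions


# Illustrative values only: first threshold 500 ms, second threshold 1500 ms.
print(actions_for_silence(300, 500, 1500))   # []
print(actions_for_silence(700, 500, 1500))   # ['start_recognition']
print(actions_for_silence(1600, 500, 1500))  # ['start_recognition', 'end_of_speech']
```

With any valid pair of thresholds, `start_recognition` always appears before `end_of_speech`, which is the ordering the device relies on.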

In the voice recognition device according to the present invention, the control unit may interrupt the voice recognition process when the silence period ends after the first threshold period has elapsed and before the second threshold period has elapsed.

Thus, even after the voice recognition process has been started, the voice recognition device interrupts it if the user speaks before the second threshold period has elapsed.

According to the voice recognition device of the present invention, the voice recognition process starts without waiting for the end-of-speech determination, so the user's waiting time in the voice recognition process can be shortened and the responsiveness of the process can be improved.

FIG. 1 is a functional block diagram showing the configuration of a voice recognition device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the timing at which the recognition process starts in a conventional voice recognition device.
FIG. 3 is a diagram showing the timing at which the recognition process starts in the voice recognition device according to the embodiment of the present invention.
FIG. 4 is a time chart showing the flow of the recognition process in the voice recognition device according to the embodiment when the user utters a single sentence in one go, without pausing.
FIG. 5 is a time chart showing the flow of the recognition process in the voice recognition device according to the embodiment when the user utters multiple sentences and words with pauses between them.

A voice recognition device according to an embodiment of the present invention will be described with reference to the drawings. The present invention is not limited to the following embodiment. The components of the embodiment include those that a person skilled in the art could easily substitute, and those that are substantially the same.

[Voice recognition device]
The voice recognition device 1 according to the present embodiment is mounted, for example, in a vehicle as an on-board unit and, as shown in FIG. 1, includes a microphone 10, a control unit 20, and a voice buffer 30. The functions of the voice recognition device 1 may be realized by a single device or by a plurality of devices.

The microphone 10 picks up the voice uttered by the user and outputs the resulting voice signal to the voice acquisition unit 21 of the control unit 20. The control unit (processor) 20 is specifically an arithmetic processing unit such as a CPU (Central Processing Unit), and processes the voice (voice signal) input through the microphone 10. The control unit 20 includes a voice acquisition unit 21, a second end-of-speech determination unit 22, a voice recognition unit 23, and a first end-of-speech determination unit 24.

The voice acquisition unit 21 generates voice data by digitizing the time-series voice signal input from the microphone 10, and accumulates the generated voice data in the voice buffer 30. As needed, the voice acquisition unit 21 also outputs the voice data accumulated in the voice buffer 30 to the second end-of-speech determination unit 22 and the voice recognition unit 23.

The second end-of-speech determination unit 22 makes the second end-of-speech determination. Specifically, it determines whether the silence period, which is a period during which no voice is input from the user, has lasted for a preset second threshold period, and makes the second-stage end-of-speech determination (second end-of-speech determination) when it has. The "second threshold period" is the period used as the threshold for determining whether the user has completely finished speaking. The second threshold period is longer than the first threshold period; for example, it is set to the first threshold period plus the interval between the first and second end-of-speech determinations (for example, 1 second) (see FIG. 3).

The voice recognition unit 23 is a speech recognition engine that performs automatic speech recognition (ASR) processing (hereinafter, "recognition process"). The voice recognition unit 23 starts the recognition process when the silence period has lasted for the first threshold period, that is, when the first end-of-speech determination unit 24 has made the first end-of-speech determination.

In the conventional voice recognition method, as shown in FIG. 2, the end-of-speech determination is made when the silence period has lasted for a predetermined threshold period (for example, 1 second), and the recognition process starts from the moment that determination is made. In other words, the conventional method handles the end-of-speech determination and the recognition process sequentially, which lengthens the user's waiting time.

In contrast, as shown in FIG. 3, the voice recognition device 1 according to the present embodiment makes the first end-of-speech determination when the silence period has lasted for a first threshold period that is shorter than the conventional threshold period, and starts the recognition process from the moment the first end-of-speech determination is made. That is, the voice recognition device 1 starts the recognition process earlier than the conventional method and handles the end-of-speech determination and the recognition process in parallel, thereby shortening the user's waiting time.
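The difference between sequential and parallel handling can be worked through with a small Python sketch. It rests on an assumption not spelled out in the specification: in the parallel case, the result is delivered once both the second end-of-speech determination has been made and the recognition process has finished. The function names and the millisecond values are hypothetical.

```python
def sequential_wait_ms(silence_threshold_ms: int, asr_ms: int) -> int:
    """Conventional flow: recognition starts only after the end-of-speech determination."""
    return silence_threshold_ms + asr_ms


def parallel_wait_ms(first_ms: int, second_ms: int, asr_ms: int) -> int:
    """Two-stage flow: recognition starts at the first threshold and runs
    while the device keeps waiting for the second threshold.

    Assumption: the result is available when both the second determination
    and the recognition process are complete.
    """
    if not 0 < first_ms < second_ms:
        raise ValueError("first threshold must be positive and shorter than the second")
    return max(second_ms, first_ms + asr_ms)


# Illustrative numbers: 1000 ms conventional/second threshold,
# 300 ms first threshold, 800 ms of recognition time.
print(sequential_wait_ms(1000, 800))     # 1800
print(parallel_wait_ms(300, 1000, 800))  # 1100
```

Under these assumed numbers the wait after the user stops speaking drops from 1800 ms to 1100 ms. When recognition is fast enough to finish before the second threshold, the wait is bounded by the second threshold alone.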

The voice recognition unit 23 interrupts the recognition process when the silence period ends after the first threshold period has elapsed and before the second threshold period has elapsed, for example when the silence period ends at time point A in FIG. 3. That is, even after the recognition process has been started, the voice recognition device 1 interrupts it if the user speaks before the second threshold period has elapsed.
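The start-then-interrupt behavior can be sketched as a small state machine that consumes one voiced/unvoiced flag per audio frame. This is only an illustration under stated assumptions: the class `TwoStageEndpointer`, its state names, the action labels, and the 100 ms frame length are hypothetical, and how a frame is classified as voiced is outside the scope of the sketch.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()    # no speech heard yet
    SPEAKING = auto()   # speech, or silence shorter than the first threshold
    TENTATIVE = auto()  # first threshold passed, recognition running
    FINISHED = auto()   # second threshold passed, speech judged complete


class TwoStageEndpointer:
    def __init__(self, first_ms: int, second_ms: int, frame_ms: int = 100):
        if not 0 < first_ms < second_ms:
            raise ValueError("first threshold must be positive and shorter than the second")
        self.first_ms, self.second_ms, self.frame_ms = first_ms, second_ms, frame_ms
        self.state = State.WAITING
        self.silence_ms = 0

    def feed(self, voiced: bool) -> list[str]:
        """Process one frame and return the actions it triggers."""
        actions: list[str] = []
        if self.state is State.FINISHED:
            return actions
        if voiced:
            if self.state is State.TENTATIVE:
                # Speech resumed between the two thresholds: drop the early attempt.
                actions.append("interrupt_recognition")
            self.state = State.SPEAKING
            self.silence_ms = 0
            return actions
        if self.state is State.WAITING:
            return actions  # silence before any speech is ignored
        self.silence_ms += self.frame_ms
        if self.state is State.SPEAKING and self.silence_ms >= self.first_ms:
            self.state = State.TENTATIVE
            actions.append("start_recognition")
        if self.state is State.TENTATIVE and self.silence_ms >= self.second_ms:
            self.state = State.FINISHED
            actions.append("end_of_speech")
        return actions


# Speech, a 400 ms pause, more speech, then a long silence.
frames = [True, True] + [False] * 4 + [True] + [False] * 10
ep = TwoStageEndpointer(first_ms=300, second_ms=1000)
trace = [action for voiced in frames for action in ep.feed(voiced)]
print(trace)
# ['start_recognition', 'interrupt_recognition', 'start_recognition', 'end_of_speech']
```

Durations are kept as integer milliseconds so that threshold comparisons are exact.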

The first end-of-speech determination unit 24 makes the first end-of-speech determination. Specifically, it determines whether the silence period, which is a period during which no voice is input from the user, has lasted for a preset first threshold period, and makes the first-stage end-of-speech determination (first end-of-speech determination) when it has. The "first threshold period" is the period used as the threshold for determining whether the user has tentatively finished speaking, ahead of the determination of whether the user has completely finished speaking (the second end-of-speech determination). The first threshold period is shorter than the second threshold period; for example, it is set to the second threshold period minus the interval between the first and second end-of-speech determinations (for example, 1 second) (see FIG. 3).

The voice buffer 30 is composed of, for example, an HDD (Hard Disk Drive), a ROM (Read Only Memory), or a RAM (Random Access Memory). The voice buffer 30 temporarily stores the voice data generated by the voice acquisition unit 21.

[Voice recognition method]
Specific examples of the voice recognition method performed by the voice recognition device 1 according to the present embodiment are described below with reference to FIGS. 4 and 5. In the following description, the processing in the microphone 10, the voice acquisition unit 21, the second end-of-speech determination unit 22, and the voice buffer 30 is treated as client-side processing, and the processing in the voice recognition unit 23 and the first end-of-speech determination unit 24 is treated as recognition-engine-side processing. The "user operation" column on the left of FIGS. 4 and 5 shows the user's actions.

(First Example)
FIG. 4 shows the flow of the recognition process when the user utters a single sentence ("Call Mr. Sato") to the voice recognition device 1 in one go, without pausing. First, when the user performs PTT (Push to Talk) (step S1), the voice recognition unit 23 on the recognition engine side enters an utterance waiting state (step S2).

Next, when the user starts uttering "Call Mr. Sato" (step S3), the voice acquisition unit 21 on the client side starts accumulating voice data in the voice buffer 30 (step S4). At the same time, the voice recognition unit 23 on the recognition engine side detects the utterance (step S5).

Next, when the user's utterance ends (step S6) and the first threshold period elapses, the first end-of-speech determination unit 24 on the recognition engine side detects the end of speech (first end-of-speech determination) (step S7). In response, the voice recognition unit 23 reads the voice data from the voice buffer 30 and starts the recognition process (step S8). After detecting the end of speech (first end-of-speech determination), the first end-of-speech determination unit 24 also sends the detection result to the second end-of-speech determination unit 22 on the client side (step S9). In response, the second end-of-speech determination unit 22 begins waiting for the complete end of speech (step S10).

Next, when the second threshold period has elapsed since the end of the user's utterance, the second end-of-speech determination unit 22 on the client side detects the complete end of speech (second end-of-speech determination) (step S11). After that, when the recognition process in the voice recognition unit 23 finishes, the voice recognition unit 23 sends the recognition result to the client side (step S12).

(Second Example)
FIG. 5 is a time chart showing the flow of the recognition process when the user utters multiple phrases to the voice recognition device 1, a sentence ("search nearby") and a word ("convenience store"), with a pause between them. First, when the user performs PTT (Push to Talk) (step S21), the voice recognition unit 23 on the recognition engine side enters an utterance waiting state (step S22).

Next, when the user starts uttering "search nearby" (step S23), the voice acquisition unit 21 on the client side starts accumulating voice data in the voice buffer 30 (step S24). At the same time, the voice recognition unit 23 on the recognition engine side detects the utterance (step S25).

Next, when the user's utterance pauses (step S26) and the first threshold period elapses, the first end-of-speech determination unit 24 on the recognition engine side detects the end of speech (first end-of-speech determination) (step S27). In response, the voice recognition unit 23 reads the voice data from the voice buffer 30 and starts the recognition process (step S28). After detecting the end of speech (first end-of-speech determination), the first end-of-speech determination unit 24 also sends the detection result to the second end-of-speech determination unit 22 on the client side (step S29). In response, the second end-of-speech determination unit 22 begins waiting for the complete end of speech (step S30).

Next, when the user resumes speaking and starts uttering "convenience store" (step S31), the voice acquisition unit 21 on the client side detects the utterance again (step S32). The voice acquisition unit 21 then sends an instruction to interrupt the recognition process to the voice recognition unit 23 on the recognition engine side (step S33), and the voice recognition unit 23 interrupts the recognition process in response. The voice acquisition unit 21 also sends the voice data accumulated in the voice buffer 30 to the voice recognition unit 23 (step S34).
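One way to picture steps S28 and S31 to S34 in software is task cancellation. The following Python sketch uses `asyncio` purely as an illustration and is not the implementation described in this specification: the `recognize` coroutine is a hypothetical stand-in for the recognition engine, UTF-8 text stands in for audio data, the sleep durations are arbitrary, and the assumption that the restarted recognition receives the entire buffered audio goes beyond what the text states.

```python
import asyncio


async def recognize(audio: bytes, seconds: float = 0.05) -> str:
    """Hypothetical stand-in for the recognition engine: waits, then returns text."""
    await asyncio.sleep(seconds)
    return audio.decode("utf-8")


async def run_session() -> tuple[str, list[str]]:
    log: list[str] = []
    buffer = bytearray()  # plays the role of the voice buffer

    # First phrase, then the first threshold elapses: recognition starts early.
    buffer += b"search nearby"
    task = asyncio.create_task(recognize(bytes(buffer)))
    log.append("recognition started")

    # The user resumes speaking before recognition has finished.
    await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("recognition interrupted")

    # Second phrase is appended; recognition restarts on the buffered audio
    # (assumption: the whole buffer is passed to the engine again).
    buffer += b" convenience store"
    task = asyncio.create_task(recognize(bytes(buffer)))
    log.append("recognition restarted")
    return await task, log


result, log = asyncio.run(run_session())
print(log)
print(result)
```

Cancelling the pending task discards the partial work on the first phrase, so the final result reflects the complete utterance rather than its first fragment.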

Next, when the user's utterance ends (step S35) and the first threshold period elapses, the first end-of-speech determination unit 24 on the recognition engine side detects the end of speech (first end-of-speech determination) (step S36). In response, the voice recognition unit 23 reads the voice data from the voice buffer 30 and starts the recognition process (step S37). After detecting the end of speech (first end-of-speech determination), the first end-of-speech determination unit 24 also sends the detection result to the second end-of-speech determination unit 22 on the client side (step S38). In response, the second end-of-speech determination unit 22 begins waiting for the complete end of speech (step S39).

Next, when the second threshold period has elapsed since the end of the user's utterance, the second end-of-speech determination unit 22 on the client side detects the complete end of speech (second end-of-speech determination) (step S40). After that, when the recognition process in the voice recognition unit 23 finishes, the voice recognition unit 23 sends the recognition result to the client side (step S41).

With the voice recognition device 1 described above, the end-of-speech determination in the recognition process is split into two stages, and a short end-of-speech determination (the first end-of-speech determination) is made ahead of the end-of-speech determination of normal length (the second end-of-speech determination), so the recognition process can start early.

That is, because the first threshold period, which starts the recognition process, is shorter than the second threshold period, which triggers the end-of-speech determination, the voice recognition device 1 starts the recognition process without waiting for the end-of-speech determination. The user's waiting time in the voice recognition process can therefore be shortened, and the responsiveness of the process can be improved.

[Voice recognition program]
The voice recognition program according to the present embodiment causes a computer to function as each unit (each means) of the control unit 20 described above. The program may be distributed stored on a computer-readable recording medium such as a hard disk, a flexible disk, or a CD-ROM, or may be distributed via a network.

The voice recognition device according to the present invention has been described above specifically by way of a mode for carrying out the invention. The gist of the present invention is not limited to this description, however, and must be interpreted broadly on the basis of the claims. It goes without saying that various changes and modifications based on this description are also included in the gist of the present invention.

1 Voice recognition device
10 Microphone
20 Control unit
21 Voice acquisition unit
22 Second end-of-speech determination unit
23 Voice recognition unit
24 First end-of-speech determination unit
30 Voice buffer

Claims

A voice recognition device comprising a control unit that processes input voice, wherein
the control unit
makes an end-of-speech determination when a silence period, which is a period during which no voice is input, has lasted for a second threshold period, and
starts a voice recognition process when the silence period has lasted for a first threshold period shorter than the second threshold period.

The voice recognition device according to claim 1, wherein the control unit interrupts the voice recognition process when the silence period ends after the first threshold period has elapsed and before the second threshold period has elapsed.