WO2017154358A1 - Speech recognition device and speech recognition program - Google Patents
- Publication number
- WO2017154358A1 (application PCT/JP2017/001556)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- speech recognition
- processing unit
- speech
- likelihood
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- This disclosure relates to a speech recognition apparatus and a speech recognition program.
- Patent Document 1 discloses a technique that determines whether the reliability of a user's voice, that is, its likelihood, is equal to or higher than a predetermined threshold, and performs speech recognition when it is. Patent Document 1 also discloses a technique for changing the threshold according to the number of user utterances and the dialogue time.
- In that conventional technique, the threshold may be changed even though no change is needed.
- An object of the present disclosure is to provide a configuration that can appropriately change the threshold used as the judgment criterion in a speech recognition apparatus and a speech recognition program that judge the validity of speech recognition from the magnitude relationship between a predetermined threshold and the reliability, that is, the likelihood, of speech uttered by a user.
- The speech recognition apparatus includes a voice input unit, a voice storage unit, a speech recognition processing unit, a likelihood calculation processing unit, a validity determination processing unit, an identical determination processing unit, and a threshold adjustment processing unit.
- The user's voice is input to the voice input unit.
- The voice storage unit stores the voice input to the voice input unit.
- The speech recognition processing unit recognizes the voice input to the voice input unit.
- The likelihood calculation processing unit calculates the likelihood of the recognition result produced by the speech recognition processing unit.
- The validity determination processing unit determines that the recognition result produced by the speech recognition processing unit is valid when the likelihood calculated by the likelihood calculation processing unit is equal to or greater than a predetermined threshold.
- The identical determination processing unit determines whether the previous voice and the current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice.
- The threshold adjustment processing unit lowers the threshold by a predetermined adjustment value when the identical determination processing unit determines that the previous voice and the current voice are the same.
- A speech recognition program causes a speech recognition apparatus, which includes a voice input unit that receives the user's voice and a voice storage unit that stores the voice input to the voice input unit, to execute a speech recognition process, a likelihood calculation process, a validity determination process, an identical determination process, and a threshold adjustment process.
- The speech recognition process recognizes the voice input to the voice input unit.
- The likelihood calculation process calculates the likelihood of the recognition result produced by the speech recognition process.
- The validity determination process determines that the recognition result produced by the speech recognition process is valid when the likelihood calculated by the likelihood calculation process is equal to or greater than a predetermined threshold.
- The identical determination process determines whether the previous voice and the current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice.
- The threshold adjustment process lowers the threshold by a predetermined adjustment value when the identical determination process determines that the previous voice and the current voice are the same.
- The threshold that serves as the criterion for judging a speech recognition result valid or invalid can be changed to an appropriate value only when the user repeatedly inputs the same word.
- FIG. 1 is a diagram schematically illustrating a configuration example of a speech recognition apparatus according to the present embodiment.
- FIG. 2 is a diagram showing an example of a voice recognition screen.
- FIG. 3 is a flowchart illustrating an operation example of the speech recognition apparatus.
- The speech recognition apparatus 10 illustrated in FIG. 1 is mounted on a vehicle, for example, and includes a control unit 11, a voice input unit 12, a voice output unit 13, a display output unit 14, an operation input unit 15, a storage unit 16, and the like.
- The control unit 11 is built mainly around a microcomputer (not shown) and controls the overall operation of the speech recognition apparatus 10.
- The voice input unit 12 includes a microphone (not shown) and the like, and receives the user's voice.
- The voice input unit 12 converts the input voice into voice data and outputs the voice data to the control unit 11.
- The control unit 11 stores the voice data input from the voice input unit 12 in the storage unit 16.
- The storage unit 16 is an example of the voice storage unit and consists of a storage medium such as a hard disk drive.
- In addition to the voice data of the voice input from the voice input unit 12, the storage unit 16 stores various data needed for speech recognition processing, such as a dictionary database for speech recognition.
- The voice output unit 13 includes a speaker (not shown) and the like, and outputs various information such as a speech recognition result by voice, based on a voice output signal input from the control unit 11.
- The display output unit 14 includes a display panel (not shown) and the like, and displays various screens such as the speech recognition screen G shown in FIG. 2, based on a display output signal input from the control unit 11.
- The speech recognition screen G shows, for example, commands that can be recognized, and the user can refer to them when uttering various voices, that is, commands for the speech recognition apparatus 10. The commands shown on the speech recognition screen G are only examples, and the speech recognition apparatus 10 can also recognize voices that are not shown on the screen.
- The operation input unit 15 includes an operation switch (not shown) and the like. When the user operates the operation switch, the operation content is input to the control unit 11.
- By executing a speech recognition control program, which is an example of the speech recognition program, the control unit 11 virtually realizes in software a speech recognition processing unit 21, a likelihood calculation processing unit 22, a validity determination processing unit 23, an identical determination processing unit 24, and a threshold adjustment processing unit 25. These processing units may instead be realized by hardware, or by a combination of software and hardware.
- The speech recognition processing unit 21 compares the voice data of the voice input to the voice input unit 12 with the voice data stored in the dictionary database for speech recognition, and thereby determines the degree of coincidence between the two sets of voice data.
- When the two sets of voice data match completely or substantially, the speech recognition processing unit 21 recognizes the input voice as the voice of the matched dictionary entry.
- The two sets of voice data match substantially when they agree closely enough to be regarded as a complete match, for example when they agree by 80% to 90% or more.
- The likelihood calculation processing unit 22 calculates the likelihood of the recognition result produced by the speech recognition processing unit 21. That is, based on the degree of coincidence between the two sets of voice data during the speech recognition process, it calculates a higher likelihood as the degree of coincidence is higher and a lower likelihood as the degree of coincidence is lower.
- The validity determination processing unit 23 determines that the recognition result produced by the speech recognition processing unit 21 is valid when the likelihood calculated by the likelihood calculation processing unit 22 is equal to or greater than a predetermined threshold T, and determines that it is invalid when the likelihood is smaller than the threshold T.
- When a recognition result is determined to be invalid, invalidity determination reason information is output as the reason for the determination, for example information indicating that the user's voice is too quiet or too loud, or that the user is speaking too fast or too slowly.
- The invalidity determination reason information may be output as voice by the voice output unit 13, as a screen display by the display output unit 14, or as both auditory output from the voice output unit 13 and visual output from the display output unit 14. The loudness and speed of the voice can be determined, for example, by comparing the signal level of the voice data output from the voice input unit 12 with a threshold for judging loudness and a threshold for judging speed.
- The identical determination processing unit 24 determines whether the previously input voice and the currently input voice are the same. That is, it determines the degree of coincidence between the voice data of the previously input voice and the voice data of the currently input voice, and determines that the two voices are the same when the two sets of voice data match completely or substantially. The two sets of voice data match substantially when they agree closely enough to be regarded as a complete match, for example when they agree by 80% to 90% or more.
- The threshold adjustment processing unit 25 lowers the threshold T by a predetermined adjustment value when the identical determination processing unit 24 determines that the previous voice and the current voice are the same.
- The threshold adjustment processing unit 25 changes the adjustment value according to the likelihood calculated by the likelihood calculation processing unit 22. In this case, it makes the adjustment value larger as the calculated likelihood is higher.
- The threshold adjustment processing unit 25 also changes the adjustment value according to the number of times the likelihood calculated by the likelihood calculation processing unit 22 was equal to or greater than the threshold T in a plurality of past speech recognition processes. In this case, it makes the adjustment value larger as that number of times is larger.
- The speech recognition apparatus 10 monitors whether voice is input (S1). When voice is input (S1: YES), the speech recognition apparatus 10 stores the voice data (S2). It then monitors whether voice is input again while the same speech recognition screen G as in step S1 is displayed (S3). When voice is input again (S3: YES), the speech recognition apparatus 10 stores the voice data (S4) and proceeds to step S5. When no voice is input again within a predetermined waiting time (S3: NO), the speech recognition apparatus 10 returns to step S1.
- In step S5, the speech recognition apparatus 10 performs the speech recognition process by the speech recognition processing unit 21. It then performs the likelihood calculation process by the likelihood calculation processing unit 22 (S6) and the identical determination process by the identical determination processing unit 24 (S7). When the identical determination process finds that the previously input voice and the currently input voice are the same (S7: YES), the speech recognition apparatus 10 lowers the threshold T (S8). Step S8 is an example of the threshold adjustment process. The speech recognition apparatus 10 further outputs a notice that the threshold T has been lowered, that is, that the criterion for judging the recognition result valid or invalid has been adapted to the user's manner of speaking (S9), and proceeds to step S10.
- The notice that the criterion has been adapted may be output as voice by the voice output unit 13, as a screen display by the display output unit 14, or as both auditory output from the voice output unit 13 and visual output from the display output unit 14.
- When the identical determination process finds that the previously input voice and the currently input voice are not the same (S7: NO), the speech recognition apparatus 10 skips steps S8 and S9 and proceeds to step S10.
- In step S10, the speech recognition apparatus 10 determines whether the likelihood calculated in step S6 is equal to or greater than the threshold T.
- When the likelihood is equal to or greater than the threshold T (S10: YES), the speech recognition apparatus 10 determines that the recognition result of step S5 is valid (S11) and ends this control. When the likelihood is smaller than the threshold T (S10: NO), the speech recognition apparatus 10 determines that the recognition result of step S5 is invalid (S12), outputs the invalidity determination reason information (S13), and ends this control.
- Steps S10, S11, and S12 are an example of the validity determination process.
- With the speech recognition apparatus 10, when the user repeatedly inputs voice while the same speech recognition screen G is displayed, the previous voice and the current voice can be determined to be the same.
- When they are determined to be the same, the threshold T, which is the criterion for judging the recognition result valid or invalid, is lowered by a predetermined adjustment value.
- The criterion for judging the recognition result valid or invalid can thus be changed to an appropriate value only when the user repeatedly inputs the same word.
- The adjustment value of the threshold T is changed according to the calculated likelihood.
- The speech recognition apparatus 10 makes the adjustment value larger as the calculated likelihood is higher. That is, by setting the threshold T to a smaller value as the likelihood of the recognition result is higher, it raises the probability that a recognition result with a high likelihood is determined to be valid.
- The adjustment value of the threshold T is also changed according to the number of times the calculated likelihood was equal to or greater than the threshold T in a plurality of past speech recognition processes.
- The speech recognition apparatus 10 makes the adjustment value larger as that number of times is larger. That is, the more often the recognition result was determined to be valid in past speech recognition processes, the smaller the threshold T is set, which raises the probability that the recognition result is determined to be valid.
- When the recognition result is determined to be invalid, the reason for the determination is also output. The user can then improve the manner of speaking based on the output reason, which raises the probability that the recognition result is determined to be valid.
- In addition to lowering the threshold T, the speech recognition apparatus 10 may reduce the processing speed of the speech recognition process or lengthen the time spent on it. Slowing the process or giving it more time allows speech recognition to be performed more accurately. Such measures are effective when a correct recognition result cannot be obtained and the user has to input the same word again and again.
- The speech recognition apparatus 10 only needs to store the voice data of at least the previously input voice. It may therefore erase past voice data in order from the oldest, which makes effective use of the limited capacity of the storage unit 16. The speech recognition apparatus 10 may calculate the likelihood in a plurality of stages, for example the three stages "high", "medium", and "low", or as a continuous numerical value such as a percentage.
- The speech recognition apparatus 10 may instead make the adjustment value of the threshold T smaller as the calculated likelihood is higher. It may also make the adjustment value smaller as the number of times the calculated likelihood was equal to or greater than the threshold T in a plurality of past speech recognition processes is larger. The speech recognition program according to the present embodiment can be stored in, for example, a computer-readable storage medium.
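The variable adjustment value described in the items above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the names `adjustment_value` and `RepeatAwareJudge`, the linear formula, the numeric defaults, the floor of 0.0, and the choice to keep the lowered threshold across calls are not taken from the source, which only states that the adjustment value grows with the calculated likelihood and with the number of past valid results.

```python
def adjustment_value(base: float, likelihood: float, past_valid_count: int,
                     per_valid_bonus: float = 0.01) -> float:
    """Hypothetical rule: larger adjustment for higher likelihood and more past valid results."""
    return base * (1.0 + likelihood) + per_valid_bonus * past_valid_count


class RepeatAwareJudge:
    """Illustrative combination of steps S7/S8 (threshold adjustment) and S10 to S12 (validity)."""

    def __init__(self, threshold: float = 0.70, base_adjustment: float = 0.05):
        self.threshold = threshold            # threshold T (value assumed)
        self.base_adjustment = base_adjustment
        self.past_valid_count = 0             # times likelihood >= T so far

    def judge(self, likelihood: float, same_as_previous: bool) -> bool:
        if same_as_previous:
            # Lower T only when the utterance repeats the previous one.
            delta = adjustment_value(self.base_adjustment, likelihood,
                                     self.past_valid_count)
            self.threshold = max(0.0, self.threshold - delta)
        valid = likelihood >= self.threshold  # valid when likelihood >= T
        if valid:
            self.past_valid_count += 1
        return valid
```

In this sketch an utterance with likelihood 0.66 is rejected against the initial threshold of 0.70, but accepted when it is judged to repeat the previous utterance, because the threshold is lowered first. The alternative mentioned above, a smaller adjustment for a higher likelihood, would only require a different `adjustment_value` formula.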
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Navigation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
This speech recognition device 10 comprises: a speech input unit 12 to which a user's speech is input; a speech storage unit 16 for storing an input speech; a speech recognition processing unit 21 for recognizing the input speech; a likelihood calculation processing unit 22 for calculating the likelihood of the recognition results of a speech; a validity determination processing unit 23 for determining that the recognition results of the speech are valid if the calculated likelihood is greater than or equal to a prescribed threshold; a similarity determination processing unit 24 for determining whether or not the stored previous speech and the present speech are identical on the basis of the degree of coincidence between the previous speech and the present speech; and a threshold adjustment processing unit 25 for decreasing the threshold by a prescribed adjustment value if the similarity determination processing unit 24 determines that the previous speech and the present speech are identical.
Description
This application is based on Japanese Patent Application No. 2016-043348 filed on March 7, 2016, the contents of which are incorporated herein by reference.
This disclosure relates to a speech recognition apparatus and a speech recognition program.
For example, a speech recognition apparatus mounted on a vehicle may fail to obtain a correct speech recognition result because of ambient noise or a slip of the tongue by the user. For this reason, speech recognition apparatuses that prompt the user to speak again when a correct result cannot be obtained have been considered. For example, Patent Document 1 discloses a technique that determines whether the reliability of a user's voice, that is, its likelihood, is equal to or higher than a predetermined threshold, and performs speech recognition when it is. Patent Document 1 also discloses a technique for changing the threshold according to the number of user utterances and the dialogue time.
According to the conventional technique described in Patent Document 1, speech recognition can be performed based on a threshold that depends on the number of user utterances and the dialogue time. However, because this technique changes the threshold according to the number of utterances and the duration of the dialogue, the threshold is changed whenever those quantities satisfy a predetermined condition, even if the user uttered different words the previous time and this time. When a correct recognition result cannot be obtained and the user is prompted to speak again, the user is normally expected to repeat the same words as before. From the viewpoint of obtaining a correct result from the repeated utterance, it is therefore sufficient to change the threshold when the user repeats the same words. In the conventional technique, which also changes the threshold when the user utters different words, the threshold may be changed even though no change is needed.
Therefore, an object of the present disclosure is to provide a configuration that can appropriately change the threshold used as the judgment criterion in a speech recognition apparatus and a speech recognition program that judge the validity of speech recognition from the magnitude relationship between a predetermined threshold and the reliability, that is, the likelihood, of speech uttered by a user.
In one aspect of the present disclosure, the speech recognition apparatus includes a voice input unit, a voice storage unit, a speech recognition processing unit, a likelihood calculation processing unit, a validity determination processing unit, an identical determination processing unit, and a threshold adjustment processing unit. The user's voice is input to the voice input unit. The voice storage unit stores the voice input to the voice input unit. The speech recognition processing unit recognizes the voice input to the voice input unit. The likelihood calculation processing unit calculates the likelihood of the recognition result produced by the speech recognition processing unit. The validity determination processing unit determines that the recognition result is valid when the calculated likelihood is equal to or greater than a predetermined threshold. The identical determination processing unit determines whether the previous voice and the current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice. The threshold adjustment processing unit lowers the threshold by a predetermined adjustment value when the identical determination processing unit determines that the previous voice and the current voice are the same.
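The interplay between the validity determination and the threshold adjustment in this aspect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ThresholdState`, `lower_threshold_if_same`, and `is_valid`, the numeric values, and the lower bound `floor` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class ThresholdState:
    threshold: float = 0.70   # hypothetical initial threshold
    adjustment: float = 0.05  # hypothetical fixed adjustment value
    floor: float = 0.0        # assumed lower bound, not stated in the source


def lower_threshold_if_same(state: ThresholdState, same_as_previous: bool) -> float:
    """Threshold adjustment: lower the threshold only for a repeated utterance."""
    if same_as_previous:
        state.threshold = max(state.floor, state.threshold - state.adjustment)
    return state.threshold


def is_valid(likelihood: float, state: ThresholdState) -> bool:
    """Validity determination: valid when likelihood >= threshold."""
    return likelihood >= state.threshold
```

With these example values, a recognition result with likelihood 0.68 is invalid against the initial threshold, and becomes valid once the same utterance is detected and the threshold has been lowered by the adjustment value. An utterance that differs from the previous one leaves the threshold untouched.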
In one aspect of the present disclosure, a speech recognition program causes a speech recognition apparatus, which includes a voice input unit that receives the user's voice and a voice storage unit that stores the voice input to the voice input unit, to execute a speech recognition process, a likelihood calculation process, a validity determination process, an identical determination process, and a threshold adjustment process. The speech recognition process recognizes the voice input to the voice input unit. The likelihood calculation process calculates the likelihood of the recognition result produced by the speech recognition process. The validity determination process determines that the recognition result is valid when the calculated likelihood is equal to or greater than a predetermined threshold. The identical determination process determines whether the previous voice and the current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice. The threshold adjustment process lowers the threshold by a predetermined adjustment value when the identical determination process determines that the previous voice and the current voice are the same.
According to the speech recognition apparatus and the speech recognition program of the present disclosure, the threshold that serves as the criterion for judging a recognition result valid or invalid can be changed to an appropriate value only when the user repeatedly inputs the same word.
The above and other objects, features, and advantages of the present disclosure will become clearer from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a diagram schematically illustrating a configuration example of a speech recognition apparatus according to the present embodiment; FIG. 2 is a diagram showing an example of a speech recognition screen; and FIG. 3 is a flowchart illustrating an operation example of the speech recognition apparatus.
Hereinafter, an embodiment of a speech recognition apparatus will be described with reference to the drawings. The speech recognition apparatus 10 illustrated in FIG. 1 is mounted on a vehicle, for example, and includes a control unit 11, a voice input unit 12, a voice output unit 13, a display output unit 14, an operation input unit 15, a storage unit 16, and the like. The control unit 11 is built mainly around a microcomputer (not shown) and controls the overall operation of the speech recognition apparatus 10.
The voice input unit 12 includes a microphone (not shown) and the like, and receives the user's voice. The voice input unit 12 converts the input voice into voice data and outputs the voice data to the control unit 11. The control unit 11 stores the voice data input from the voice input unit 12 in the storage unit 16. The storage unit 16 is an example of the voice storage unit and consists of a storage medium such as a hard disk drive. In addition to the voice data of the voice input from the voice input unit 12, the storage unit 16 stores various data needed for speech recognition processing, such as a dictionary database for speech recognition.
The voice output unit 13 includes a speaker (not shown) and the like, and outputs various information such as a speech recognition result by voice, based on a voice output signal input from the control unit 11. The display output unit 14 includes a display panel (not shown) and the like, and displays various screens such as the speech recognition screen G shown in FIG. 2, based on a display output signal input from the control unit 11. The speech recognition screen G shows, for example, commands that can be recognized, and the user can refer to them when uttering various voices, that is, commands for the speech recognition apparatus 10. The commands shown on the speech recognition screen G are only examples, and the speech recognition apparatus 10 can also recognize voices that are not shown on the screen. The operation input unit 15 includes an operation switch (not shown) and the like. When the user operates the operation switch, the operation content is input to the control unit 11.
By executing a control program for speech recognition, which is an example of a speech recognition program, the control unit 11 virtually realizes, in software, a speech recognition processing unit 21, a likelihood calculation processing unit 22, a validity determination processing unit 23, an identical determination processing unit 24, and a threshold adjustment processing unit 25. These processing units may instead be realized by hardware, or by a combination of software and hardware.
The speech recognition processing unit 21 compares the voice data of the voice input to the voice input unit 12 with the voice data stored in the dictionary database for speech recognition, and thereby specifies the degree of coincidence between the two sets of voice data. When the two sets of voice data match completely or substantially, the speech recognition processing unit 21 recognizes the input voice as the voice of the dictionary voice data it was compared against. Here, "match substantially" means that the two sets of voice data match to an extent that can be regarded as equivalent to a complete match, for example, when they match by 80% to 90% or more.
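The matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the disclosure: voice data is assumed to be already reduced to equal-rate sequences of integer feature codes, and the names `matching_degree`, `recognize`, and `MATCH_THRESHOLD` are hypothetical. The 0.8 cut-off simply takes the lower end of the 80% to 90% range mentioned above.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumption: "substantially match" is taken as a matching degree of 0.8 or more.
MATCH_THRESHOLD = 0.8


def matching_degree(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two feature sequences agree."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def recognize(
    voice: Sequence[int], dictionary: Dict[str, Sequence[int]]
) -> Tuple[Optional[str], float]:
    """Return the best-matching dictionary entry and its matching degree.

    The entry is returned only if the degree reaches MATCH_THRESHOLD;
    otherwise the word is None.
    """
    best_word: Optional[str] = None
    best_degree = 0.0
    for word, template in dictionary.items():
        degree = matching_degree(voice, template)
        if degree > best_degree:
            best_word, best_degree = word, degree
    if best_degree >= MATCH_THRESHOLD:
        return best_word, best_degree
    return None, best_degree
```

A real recognizer would compare acoustic features with a time-alignment or statistical model rather than position-by-position equality; the sketch only shows where the degree of coincidence enters the decision.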
The likelihood calculation processing unit 22 calculates the likelihood of the speech recognition result produced by the speech recognition processing unit 21. Specifically, based on the degree of coincidence between the two sets of voice data obtained during the speech recognition processing by the speech recognition processing unit 21, the likelihood calculation processing unit 22 calculates a higher likelihood as the degree of coincidence is higher, and a lower likelihood as the degree of coincidence is lower.
The validity determination processing unit 23 determines that the speech recognition result produced by the speech recognition processing unit 21 is valid when the likelihood calculated by the likelihood calculation processing unit 22 is equal to or greater than a predetermined threshold T, and determines that the result is invalid when the likelihood is smaller than the threshold T. When it determines that the speech recognition result is invalid, the validity determination processing unit 23 outputs invalidity determination reason information that indicates the reason for the determination, for example, information indicating that the user's voice is too quiet or too loud, or that the user's speech is too fast or too slow.
The invalidity determination reason information may be output, for example, as sound by the voice output unit 13, as a screen display by the display output unit 14, or as both auditory output by the voice output unit 13 and visual output by the display output unit 14. The loudness and speed of the voice can be identified, for example, by comparing the signal level of the voice data output by the voice input unit 12 with a threshold for loudness determination and a threshold for speed determination.
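The derivation of the invalidity determination reason information can be sketched as below. This is only an illustration: the function name `invalid_reasons`, the normalized signal level, the separate `speech_rate` input (for example, syllables per second), and all numeric thresholds are assumptions introduced here and do not appear in the disclosure.

```python
from typing import List


def invalid_reasons(
    signal_level: float,
    speech_rate: float,
    min_level: float = 0.2,   # hypothetical lower loudness threshold
    max_level: float = 0.9,   # hypothetical upper loudness threshold
    min_rate: float = 2.0,    # hypothetical lower speed threshold
    max_rate: float = 8.0,    # hypothetical upper speed threshold
) -> List[str]:
    """Compare loudness and speed against thresholds and list the reasons."""
    reasons: List[str] = []
    if signal_level < min_level:
        reasons.append("voice too quiet")
    elif signal_level > max_level:
        reasons.append("voice too loud")
    if speech_rate < min_rate:
        reasons.append("speech too slow")
    elif speech_rate > max_rate:
        reasons.append("speech too fast")
    return reasons
```

The returned strings could then be passed to whichever output path is in use, whether sound, screen display, or both.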
The identical determination processing unit 24 determines whether the previously input voice and the currently input voice are the same, based on the degree of coincidence between the voice data of the previous input voice stored in the storage unit 16 and the voice data of the current input voice. Specifically, the identical determination processing unit 24 specifies the degree of coincidence between the voice data of the previous input voice and the voice data of the current input voice. When the two sets of voice data match completely or substantially, the identical determination processing unit 24 determines that the previously input voice and the currently input voice are the same. Here, "match substantially" means that the two sets of voice data match to an extent that can be regarded as equivalent to a complete match, for example, when they match by 80% to 90% or more.
The threshold adjustment processing unit 25 lowers the threshold T by a predetermined adjustment value when the identical determination processing unit 24 determines that the previous voice and the current voice are the same. The threshold adjustment processing unit 25 is configured to change the adjustment value according to the likelihood calculated by the likelihood calculation processing unit 22. In this case, the threshold adjustment processing unit 25 increases the adjustment value as the calculated likelihood is higher. The threshold adjustment processing unit 25 is also configured to change the adjustment value according to the number of times, over a plurality of past speech recognition processes, that the likelihood calculated by the likelihood calculation processing unit 22 was equal to or greater than the threshold T. In this case, the threshold adjustment processing unit 25 increases the adjustment value as that number of times is larger.
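One way to picture this adjustment rule is the sketch below. The disclosure does not give a formula, so the linear form, the function names `adjustment_value` and `adjust_threshold`, and every constant are assumptions chosen only to reproduce the stated behavior: the adjustment value grows with the likelihood and with the count of past valid results, and the threshold is lowered only when the two voices are judged to be the same.

```python
def adjustment_value(
    likelihood: float,
    past_valid_count: int,
    base: float = 0.05,             # hypothetical minimum adjustment
    likelihood_gain: float = 0.05,  # hypothetical weight on the likelihood
    count_gain: float = 0.01,       # hypothetical weight per past valid result
    max_adjust: float = 0.2,        # hypothetical cap on the adjustment
) -> float:
    """Adjustment that increases with likelihood and with past valid results."""
    raw = base + likelihood_gain * likelihood + count_gain * past_valid_count
    return min(raw, max_adjust)


def adjust_threshold(
    threshold: float,
    same_as_previous: bool,
    likelihood: float,
    past_valid_count: int,
    floor: float = 0.0,
) -> float:
    """Lower the threshold only when the previous and current voices match."""
    if not same_as_previous:
        return threshold
    lowered = threshold - adjustment_value(likelihood, past_valid_count)
    return max(floor, lowered)
```

The cap and the floor are design choices added here so that repeated utterances cannot drive the threshold to a value at which any input would be accepted.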
Next, an operation example of the speech recognition apparatus 10 will be described. As illustrated in FIG. 3, when the speech recognition apparatus 10 displays the speech recognition screen G, it monitors whether a voice has been input (S1). When a voice is input (S1: YES), the speech recognition apparatus 10 stores the voice data (S2). Then, while displaying the same speech recognition screen G as in step S1, the speech recognition apparatus 10 again monitors whether a voice has been input (S3). When a voice is input again (S3: YES), the speech recognition apparatus 10 stores the voice data (S4) and proceeds to step S5. When no voice is input again within a predetermined waiting time (S3: NO), the speech recognition apparatus 10 returns to step S1.
On proceeding to step S5, the speech recognition apparatus 10 executes the speech recognition processing by the speech recognition processing unit 21. The speech recognition apparatus 10 then executes the likelihood calculation processing by the likelihood calculation processing unit 22 (S6), followed by the identical determination processing by the identical determination processing unit 24 (S7). When the identical determination processing determines that the previously input voice and the currently input voice are the same (S7: YES), the speech recognition apparatus 10 lowers the threshold T (S8). Step S8 constitutes an example of the threshold adjustment processing. The speech recognition apparatus 10 then outputs a notification that the threshold T has been lowered, that is, that the reference value used to determine whether a speech recognition result is valid or invalid has been adapted to the user's manner of speaking (S9), and proceeds to step S10.
The notification that the reference value has been adapted may be output, for example, as sound by the voice output unit 13, as a screen display by the display output unit 14, or as both auditory output by the voice output unit 13 and visual output by the display output unit 14. When the identical determination processing determines that the previously input voice and the currently input voice are not the same (S7: NO), the speech recognition apparatus 10 proceeds to step S10 without executing steps S8 and S9.
On proceeding to step S10, the speech recognition apparatus 10 determines whether the likelihood calculated in step S6 is equal to or greater than the threshold T. Step S10 is an example of the validity determination processing. When the likelihood is equal to or greater than the threshold T (S10: YES), the speech recognition apparatus 10 determines that the speech recognition result of step S5 is valid (S11) and ends this control. When the likelihood is smaller than the threshold T (S10: NO), the speech recognition apparatus 10 determines that the speech recognition result of step S5 is invalid (S12), outputs the invalidity determination reason information (S13), and ends this control. Steps S10, S11, and S12 together constitute an example of the validity determination processing.
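The decision part of this flow, steps S7 through S12, can be condensed into the following sketch. It assumes that the likelihood (S6) and the result of the identical determination (S7) are already available, and it uses a fixed adjustment value for brevity; the names `Outcome` and `run_steps` and the value 0.1 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    valid: bool               # result of S10 (S11 if True, S12 if False)
    threshold: float          # threshold T actually used in S10
    threshold_lowered: bool   # True when S8 ran, so S9 would notify the user


def run_steps(
    likelihood: float,
    same_as_previous: bool,
    threshold: float,
    adjustment: float = 0.1,  # hypothetical fixed adjustment value
) -> Outcome:
    """Apply S7 to S12: optionally lower T, then compare the likelihood with T."""
    lowered = False
    if same_as_previous:                       # S7: YES
        threshold = threshold - adjustment     # S8: lower the threshold T
        lowered = True                         # S9: notification would be output
    valid = likelihood >= threshold            # S10
    return Outcome(valid=valid, threshold=threshold, threshold_lowered=lowered)
```

For example, with T = 0.8 a likelihood of 0.75 is rejected on a first utterance, but accepted when the same utterance is repeated and T has been lowered by 0.1.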
According to the speech recognition apparatus 10, when the user inputs a voice repeatedly while the same speech recognition screen G is displayed, and the previous voice and the current voice can be determined to be the same, the threshold T, which is the reference value for determining whether a speech recognition result is valid or invalid, is lowered by a predetermined adjustment value. This makes it easier for the speech recognition result to be determined valid, so a situation in which the user must keep inputting the same voice over and over can be avoided. In this way, the speech recognition apparatus 10 can change the reference value for determining whether a speech recognition result is valid or invalid to an appropriate value, and does so only when the user repeats the same words.
In addition, the speech recognition apparatus 10 changes the adjustment value of the threshold T according to the calculated likelihood. In this case, the speech recognition apparatus 10 increases the adjustment value as the calculated likelihood is higher. That is, by making the threshold T smaller as the speech recognition result is more likely to be correct, the probability that a highly likely speech recognition result is determined to be valid can be increased.
The speech recognition apparatus 10 also changes the adjustment value of the threshold T according to the number of times the calculated likelihood was equal to or greater than the threshold T in a plurality of past speech recognition processes. In this case, the speech recognition apparatus 10 increases the adjustment value as that number of times is larger. That is, the more often speech recognition results were determined to be valid in past speech recognition processes, the smaller the threshold T is made. As a result, for example, for the voice of a user whose speech tends to be determined valid, the probability that the speech recognition result is determined to be valid can be increased.
Furthermore, when the speech recognition apparatus 10 determines that a speech recognition result is invalid, it also outputs the reason for the determination. The user can therefore improve his or her manner of speaking based on the output reason, which increases the probability that the speech recognition result is determined to be valid.
Note that the present disclosure is not limited to the embodiment described above and can be applied to various embodiments without departing from its gist.
For example, in step S8, the speech recognition apparatus 10 may not only lower the threshold T but also slow down the speech recognition processing or lengthen the time spent on it. Slowing down the speech recognition processing or lengthening the processing time allows the speech recognition processing to be performed more accurately. When a correct speech recognition result cannot be obtained and the user inputs the same words over and over, taking such measures to perform more accurate speech recognition processing is effective.
The speech recognition apparatus 10 only needs to store at least the voice data of the previously input voice. Accordingly, the speech recognition apparatus 10 may be configured to erase past voice data in order from the oldest, in which case the limited storage capacity of the storage unit 16 can be used effectively. The speech recognition apparatus 10 may also be configured to calculate the likelihood in a plurality of levels, for example, the three levels "high", "medium", and "low", or to calculate it as a continuous numerical value, for example, a percentage.
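Both variations can be sketched briefly. In this illustration, a bounded `collections.deque` stands in for the storage unit 16 and discards the oldest voice data automatically, and a continuous likelihood is mapped onto three levels. The class name `VoiceStore`, the function `likelihood_level`, the capacity of 2, and the level boundaries are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Optional, Sequence


class VoiceStore:
    """Bounded store that keeps only the most recent voice data."""

    def __init__(self, capacity: int = 2) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._items: Deque[Sequence[int]] = deque(maxlen=capacity)

    def add(self, voice_data: Sequence[int]) -> None:
        self._items.append(voice_data)

    def previous(self) -> Optional[Sequence[int]]:
        """Voice data stored just before the most recent one, if any."""
        return self._items[-2] if len(self._items) >= 2 else None

    def __len__(self) -> int:
        return len(self._items)


def likelihood_level(likelihood: float, low: float = 0.5, high: float = 0.8) -> str:
    """Map a continuous likelihood in [0, 1] onto "high", "medium", or "low"."""
    if likelihood >= high:
        return "high"
    if likelihood >= low:
        return "medium"
    return "low"
```

A capacity of 2 is the smallest that still allows the previous and current voices to be compared.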
Alternatively, the speech recognition apparatus 10 may be configured to decrease the adjustment value of the threshold T as the calculated likelihood is higher. The speech recognition apparatus 10 may also be configured to decrease the adjustment value of the threshold T as the number of times the calculated likelihood was equal to or greater than the threshold T in a plurality of past speech recognition processes is larger.
The speech recognition program according to the present embodiment can be stored in, for example, a computer-readable storage medium.
Although the present disclosure has been described with reference to the embodiments, it is understood that the present disclosure is not limited to those embodiments or structures. The present disclosure also encompasses various modifications and variations within an equivalent range. In addition, various combinations and forms, as well as other combinations and forms that include only one element, more elements, or fewer elements, fall within the scope and spirit of the present disclosure.
Claims (5)
- A speech recognition apparatus comprising:
a voice input unit (12) to which a user's voice is input;
a voice storage unit (16) that stores the voice input to the voice input unit;
a speech recognition processing unit (21) that recognizes the voice input to the voice input unit;
a likelihood calculation processing unit (22) that calculates the likelihood of the speech recognition result produced by the speech recognition processing unit;
a validity determination processing unit (23) that determines that the speech recognition result produced by the speech recognition processing unit is valid when the likelihood calculated by the likelihood calculation processing unit is equal to or greater than a predetermined threshold;
an identical determination processing unit (24) that determines whether a previous voice and a current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice; and
a threshold adjustment processing unit (25) that lowers the threshold by a predetermined adjustment value when the identical determination processing unit determines that the previous voice and the current voice are the same.
- The speech recognition apparatus according to claim 1, wherein the threshold adjustment processing unit changes the adjustment value according to the likelihood calculated by the likelihood calculation processing unit.
- The speech recognition apparatus according to claim 1 or 2, wherein the threshold adjustment processing unit changes the adjustment value according to the number of times the likelihood calculated by the likelihood calculation processing unit was equal to or greater than the threshold.
- The speech recognition apparatus according to any one of claims 1 to 3, wherein the validity determination processing unit determines that the speech recognition result produced by the speech recognition processing unit is invalid when the likelihood calculated by the likelihood calculation processing unit is smaller than the predetermined threshold, and outputs the reason for that determination.
- A speech recognition program that causes a speech recognition apparatus (10), which comprises a voice input unit (12) to which a user's voice is input and a voice storage unit (16) that stores the voice input to the voice input unit, to execute:
a speech recognition process of recognizing the voice input to the voice input unit;
a likelihood calculation process of calculating the likelihood of the speech recognition result produced by the speech recognition process;
a validity determination process of determining that the speech recognition result produced by the speech recognition process is valid when the likelihood calculated by the likelihood calculation process is equal to or greater than a predetermined threshold;
an identical determination process of determining whether a previous voice and a current voice are the same, based on the degree of coincidence between the previous voice stored in the voice storage unit and the current voice; and
a threshold adjustment process of lowering the threshold by a predetermined adjustment value when the identical determination process determines that the previous voice and the current voice are the same.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016-043348 | 2016-03-07 | ||
JP2016043348A JP6716968B2 (en) | 2016-03-07 | 2016-03-07 | Speech recognition device, speech recognition program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017154358A1 true WO2017154358A1 (en) | 2017-09-14 |
Family
ID=59790301
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/001556 WO2017154358A1 (en) | 2016-03-07 | 2017-01-18 | Speech recognition device and speech recognition program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6716968B2 (en) |
WO (1) | WO2017154358A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI622029B (en) * | 2017-09-15 | 2018-04-21 | 驊鉅數位科技有限公司 | Interactive language learning system with pronunciation recognition |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108922520B (en) * | 2018-07-12 | 2021-06-01 | Oppo广东移动通信有限公司 | Voice recognition method, voice recognition device, storage medium and electronic equipment |
JP2023154894A (en) * | 2022-04-08 | 2023-10-20 | キヤノン株式会社 | Information conversion system, information processing device, information processing method and program |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11149294A (en) * | 1997-11-17 | 1999-06-02 | Toyota Motor Corp | Voice recognition device and voice recognition method |
JP2003091299A (en) * | 2001-07-13 | 2003-03-28 | Honda Motor Co Ltd | On-vehicle voice recognition device |
JP2004325635A (en) * | 2003-04-23 | 2004-11-18 | Sharp Corp | Apparatus, method, and program for speech processing, and program recording medium |
JP2006030915A (en) * | 2004-07-22 | 2006-02-02 | Iwatsu Electric Co Ltd | Method and device for speech recognition |
JP2007041319A (en) * | 2005-08-03 | 2007-02-15 | Matsushita Electric Ind Co Ltd | Speech recognition device and speech recognition method |
WO2009008115A1 (en) * | 2007-07-09 | 2009-01-15 | Mitsubishi Electric Corporation | Voice recognizing apparatus and navigation system |
-
2016
- 2016-03-07 JP JP2016043348A patent/JP6716968B2/en active Active
-
2017
- 2017-01-18 WO PCT/JP2017/001556 patent/WO2017154358A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP6716968B2 (en) | 2020-07-01 |
JP2017161581A (en) | 2017-09-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17762704; Country of ref document: EP; Kind code of ref document: A1 |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 17762704; Country of ref document: EP; Kind code of ref document: A1 |