KR100677711B1

KR100677711B1 - Voice recognition apparatus, storage medium and navigation apparatus

Info

Publication number: KR100677711B1
Application number: KR1020040110491A
Authority: KR
Inventors: 스즈키류이치; 요코이구니오; 아카호리이치로; 사카이마코토; 스즈키사토시; 다테이시마사히코
Original assignee: 가부시키가이샤 덴소
Priority date: 2004-01-30
Filing date: 2004-12-22
Publication date: 2007-02-02
Also published as: JP4453377B2; KR20050078195A; JP2005215474A

Abstract

To provide a speech recognition apparatus and the like that is as convenient as possible for the speaker to use.

When speech is input (S120), the speech is analyzed to select a plurality of candidate single syllables (S125), and the first candidate single syllable is notified (S135). Processing then branches depending on whether the confirmation switch is operated or further speech is input (S140). When the confirmation switch is operated, the notified candidate single syllable is taken as the confirmed single syllable (S145); when further speech is input, the speech is analyzed again to select candidate single syllables (S125). As a result, when recognition is not performed correctly, the speaker need only utter the syllable again, without operating a switch or the like to give a re-input instruction.

Single syllable, speech recognition, GPS, navigation

Description

VOICE RECOGNITION APPARATUS, STORAGE MEDIUM AND NAVIGATION APPARATUS

FIG. 1 is a schematic configuration diagram of a navigation apparatus.

FIG. 2 is a flowchart for explaining speech recognition process 1.

FIG. 3 is a flowchart for explaining speech recognition process 2.

FIG. 4 is a flowchart for explaining speech recognition process 3.

FIG. 5 is a flowchart for explaining speech recognition process 4.

FIG. 6 is a screen image.

FIG. 7 is a screen image.

FIG. 8 is a screen image.

FIG. 9 is a graph showing the change in recognition rate with the number of inputs.

* Explanation of reference numerals for the main parts of the drawings *

20: navigation apparatus
21: position detector
21a: GPS receiver
21b: gyroscope
21c: distance sensor
21d: geomagnetic sensor
22: operation switch group
23a: remote control
23b: remote control sensor
25: map data input device
26: display unit
27: audio output unit
28: microphone
29: control unit
30: speech recognition related data input/output device
31: in-vehicle LAN communication unit

The present invention relates to a speech recognition apparatus and the like that determines the single syllable intended by a speaker on the basis of speech input by the speaker.

Speech recognition apparatuses that determine, one syllable at a time, the single syllables intended by a speaker on the basis of speech input by the speaker are widely known. Unlike a speech recognition apparatus that performs recognition in units of words (a word being composed of a plurality of single syllables), this type of apparatus does not need to be provided in advance with a word dictionary covering every word to be recognized. It therefore has the advantage that virtually anything can be recognized as the final set of recognition results (for example, a sentence).

However, when single-syllable speech is recognized, the recognition rate is generally low because there are fewer recognition cues than in word-level speech recognition. For this reason, various measures have been devised for apparatuses that recognize single-syllable speech in order to improve recognition accuracy. For example, the speaker may adapt the way the input is uttered so as to improve recognition accuracy, or the apparatus may output the recognized single syllable as speech (talk-back) so that the speaker can check it, thereby improving the final recognition accuracy.

The former method is described here. In the speech recognition apparatus described in Patent Document 1, the speaker inputs, for example, 「あいうえおのあ」 ("a as in a-i-u-e-o"), and the apparatus recognizes the single syllable 「あ」 (a). By having the speaker input a specific word for single-syllable recognition that is longer than the single syllable itself, the recognition accuracy of the apparatus can be improved compared with the case where the single syllable is simply input on its own.

(Patent Document 1) Japanese Unexamined Patent Application Publication No. H11-184495

However, even in a speech recognition apparatus using such a recognition method, it is in fact difficult to prevent misrecognition completely, owing to the speaker's manner of speaking (so-called habits), the noise environment at the time of utterance, and so on. In addition, with an apparatus that recognizes single-syllable speech, the speaker has to correct or confirm each syllable one by one, and when a misrecognition occurs the speaker is put to further trouble.

The present invention has been made in view of these problems, and its object is to provide a speech recognition apparatus and the like that is as convenient as possible for the speaker to use.

A speech recognition apparatus according to one aspect of the present invention, made to solve the above problem, includes speech input means, speech recognition means, notification means, receiving means, and control means. The speech input means inputs speech uttered by the speaker; the speech recognition means analyzes the speech input by the speech input means and specifies candidate single syllables; the notification means notifies designated information; and the receiving means accepts operations by the speaker. The control means performs notification processing that causes the notification means to notify the most likely candidate single syllable among those specified by the speech recognition means. When the receiving means accepts an operation from the speaker signifying confirmation, the control means performs confirmation processing that confirms the candidate single syllable notified in the immediately preceding notification processing as the single syllable intended by the speaker. When new speech from the speaker is input to the speech input means and the speech recognition means specifies candidate single syllables, the control means returns to the notification processing and causes the notification means to notify the most likely of those candidates. The term "candidate single syllable" used here literally means a candidate for a single syllable; the speech recognition means may specify one candidate single syllable or several.

With the speech recognition apparatus according to this aspect, the speaker performs an operation to confirm the single syllable only when the uttered single syllable has been recognized correctly. When it has not, the speaker can simply keep uttering the single syllable, with no operation, until it is recognized correctly. Consequently, when recognition fails, the speaker does not have to give a re-input instruction each time and need only utter the syllable again. In short, the apparatus is convenient to use.
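The notify, confirm, or re-utter cycle of this aspect can be sketched as a small event loop. This is only an illustrative sketch: the event tuples, the function name `confirm_loop`, and the likelihood scores are assumptions made for the example and are not taken from the patent.

```python
from typing import List, Optional, Sequence, Tuple

# (syllable, likelihood) pairs as they might come from a recognizer (assumed shape).
Candidates = List[Tuple[str, float]]


def confirm_loop(events: Sequence[tuple]) -> Optional[str]:
    """Process ("voice", candidates) and ("confirm",) events in order.

    Each voice event replaces the notified candidate with the most likely
    new candidate; a confirm event fixes the last notified candidate.
    """
    notified: Optional[str] = None
    for event in events:
        if event[0] == "voice":
            candidates: Candidates = event[1]
            # Notification processing: pick the most likely candidate.
            notified = max(candidates, key=lambda c: c[1])[0]
        elif event[0] == "confirm" and notified is not None:
            # Confirmation processing: the last notified candidate is final.
            return notified
    return None  # the speaker never confirmed anything
```

In this sketch a misrecognition costs the speaker nothing but another utterance: the second voice event simply overwrites the first notification.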

A speech recognition apparatus according to another aspect of the present invention includes speech input means, speech recognition means, notification means, and control means. The speech input means inputs speech uttered by the speaker; the speech recognition means analyzes the speech input by the speech input means, specifies candidate single syllables, and also recognizes a confirmation word signifying confirmation; and the notification means notifies designated information. The control means performs notification processing that causes the notification means to notify the most likely candidate single syllable among those specified by the speech recognition means. When new speech from the speaker is input to the speech input means and the speech recognition means recognizes the confirmation word, the control means performs confirmation processing that confirms the candidate single syllable notified in the immediately preceding notification processing as the single syllable intended by the speaker. When new speech from the speaker is input to the speech input means and the speech recognition means specifies candidate single syllables, the control means returns to the notification processing and causes the notification means to notify the most likely of those candidates. The term "candidate single syllable" used here literally means a candidate for a single syllable; the speech recognition means may specify one candidate single syllable or several.

With the speech recognition apparatus according to this other aspect, the speaker utters a confirmation word (for example, 「次」 (tsugi, "next"), 「次へ」 (tsugi e, "to the next"), or 「次は」 (tsugi wa, "next is")) to confirm the single syllable only when the uttered single syllable has been recognized correctly. When it has not, the speaker can keep uttering the single syllable to be recognized until it is recognized correctly, without any special operation or utterance. Consequently, when recognition fails, the speaker does not have to give a re-input instruction each time and need only utter the syllable again. In short, the apparatus is convenient to use.

However, when the recognized single syllable is incorrect, the same inappropriate candidate single syllable may be notified again even after the speaker utters the syllable again. To prevent this, as in yet another aspect of the present invention, when the control means performs the notification processing two or more times in succession without performing the confirmation processing, it may exclude candidate single syllables already notified by earlier notification processing and notify the most likely of the remaining candidates.

With this arrangement, the same inappropriate candidate single syllable is not notified again on re-utterance, which improves convenience for the speaker.
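As a minimal sketch of this exclusion, the function below picks the most likely candidate that has not yet been notified. The name `pick_candidate` and the fallback to the full list when every candidate has already been notified are assumptions of this example; the text does not say what happens in that case.

```python
from typing import List, Set, Tuple


def pick_candidate(candidates: List[Tuple[str, float]],
                   already_notified: Set[str]) -> str:
    """Return the most likely syllable not in `already_notified`."""
    remaining = [c for c in candidates if c[0] not in already_notified]
    # Assumption: if everything was already notified, fall back to all candidates.
    pool = remaining or candidates
    return max(pool, key=lambda c: c[1])[0]
```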

However, the speaker may also utter the syllable again by mistake even though the correct candidate single syllable was notified. After such a mistake, the correct candidate single syllable could never be notified again. To prevent this, according to yet another aspect of the present invention, the exclusion of a candidate single syllable may be released once a predetermined number of re-utterances have been made. That is, with respect to the exclusion, among the notification processing performed repeatedly without confirmation processing, the control means need not exclude a candidate single syllable notified by notification processing (other than the immediately preceding one) that was performed the predetermined number of times or more earlier.

According to yet another aspect of the present invention, the optimum value of this predetermined number is three. The basis for this figure is an experiment carried out by the present inventors (described in detail in the embodiment section): the probability that the correct candidate single syllable is notified by the fourth utterance is 98%, and if the speaker continues beyond that, the correct candidate is hardly ever notified thereafter. In other words, in most cases the correct single syllable has been notified at least once by the third re-utterance, so when the number of re-utterances reaches three it is highly likely that the speaker has excluded the correct candidate single syllable by mistake.

Accordingly, according to yet another aspect of the present invention, if candidate single syllables notified by notification processing performed three or more times earlier are not excluded, the above problem of the correct candidate single syllable never being notified again can be prevented.
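One way to picture the release of the exclusion is a sliding window over recent notifications. The sketch below assumes that, with a predetermined number of N, only candidates notified fewer than N notifications ago stay excluded, so a bounded queue of length N - 1 is used. The class name `WindowedNotifier`, this reading of the count, and the fallback when all candidates are excluded are assumptions of the example rather than details confirmed by the text.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowedNotifier:
    """Excludes recently notified candidates and releases old exclusions."""

    def __init__(self, release_after: int = 3) -> None:
        if release_after < 1:
            raise ValueError("release_after must be at least 1")
        # Candidates notified `release_after` or more notifications ago
        # drop out of the window and become eligible again.
        self._recent: Deque[str] = deque(maxlen=release_after - 1)

    def notify(self, candidates: List[Tuple[str, float]]) -> str:
        remaining = [c for c in candidates if c[0] not in self._recent]
        pool = remaining or candidates  # assumed fallback
        best = max(pool, key=lambda c: c[1])[0]
        self._recent.append(best)
        return best

    def confirm(self) -> None:
        # A confirmed syllable ends the run of repeated notifications.
        self._recent.clear()
```

With the default of three, a candidate skipped by an accidental re-utterance becomes eligible again on the fourth utterance. Making `release_after` a constructor parameter mirrors the idea that the count can be adjusted to the environment or the speaker.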

As described above, three is experimentally the optimum value for this predetermined number, but in rare cases it may be better to change it, depending on factors such as the environment in which the speech recognition apparatus is used or the speaker's manner of speaking (habits). For this reason, according to yet another aspect of the present invention, the control means may change the predetermined number on the basis of an operation by the speaker accepted by the receiving means. With this arrangement, the speaker can change the predetermined number to suit the environment in which the apparatus is used, the speaker's manner of speaking, and so on.

A speech recognition apparatus according to yet another aspect of the present invention includes speech input means, receiving means, storage means, and speech recognition means. The speech input means inputs speech uttered by the speaker, and the receiving means accepts operations by the speaker. The storage means stores a plurality of dictionaries, sorted by type of specific word, each composed of multi-syllable specific words for single-syllable recognition that are associated with individual single syllables. The speech recognition means selects one of the dictionaries stored in the storage means on the basis of the speaker's operation accepted by the receiving means, selects from that dictionary the specific word for single-syllable recognition corresponding to the speech input by the speech input means, and determines the single syllable corresponding to the selected specific word as the single syllable intended by the speaker.

Conventionally, there have been speech recognition apparatuses that recognize the single syllable 「あ」 (a) from the speaker's utterance of a multi-syllable specific word for single-syllable recognition such as 「あいうえおのあ」 ("a as in a-i-u-e-o") or 「あさひのあ」 ("a as in Asahi"). With this input method, however, the speaker has to memorize all of the specific words, and it is difficult for the speaker to memorize specific words in which he or she has no interest.

If the speech recognition apparatus according to this aspect stores in advance dictionaries of specific words for single-syllable recognition divided into various genres and the like, the speaker can switch, by an operation, to the dictionary whose specific words he or she finds easiest to remember. If a dictionary matching the speaker's preferences can be used in this way, the speaker can memorize the specific words quickly and become proficient with the apparatus.

According to yet another aspect of the present invention, the specific words for single-syllable recognition in such a dictionary may be registered in advance by the speaker by operating the receiving means. With this arrangement, specific words that reflect the speaker's preferences even more closely can be used, so that the speaker can become proficient with the apparatus.
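The selectable, user-extensible dictionaries can be sketched as a mapping from genre to a mapping from specific word to single syllable. Everything named below (the class `SpecificWordDictionaries`, its methods, and the genre labels used in the usage note) is hypothetical and chosen only for illustration; the two specific words are the examples quoted in the text.

```python
from typing import Dict, Optional


class SpecificWordDictionaries:
    """Holds several dictionaries of specific words, one active at a time."""

    def __init__(self) -> None:
        self._dicts: Dict[str, Dict[str, str]] = {}
        self._active: Optional[str] = None

    def register(self, genre: str, word: str, syllable: str) -> None:
        # The speaker may add a specific word to a dictionary in advance.
        self._dicts.setdefault(genre, {})[word] = syllable

    def select(self, genre: str) -> None:
        # The speaker chooses the dictionary by an operation.
        if genre not in self._dicts:
            raise KeyError(genre)
        self._active = genre

    def decide(self, recognized_word: str) -> Optional[str]:
        # Only the active dictionary is consulted.
        if self._active is None:
            return None
        return self._dicts[self._active].get(recognized_word)
```

For example, after registering 「あいうえおのあ」 under a hypothetical genre `"kana_order"` and 「あさひのあ」 under `"names"`, selecting `"names"` makes only the second word resolve to 「あ」.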

A speech recognition apparatus according to yet another aspect of the present invention determines the single syllable intended by a speaker on the basis of speech input by the speaker, and includes speech input means for inputting speech uttered by the speaker, and speech recognition means for dividing repeated speech consisting of the same single syllable, input by the speech input means, into speech for each single syllable and determining the single syllable intended by the speaker on the basis of each of those pieces of speech.

With this arrangement, when the speaker utters, for example, 「あああ」 (a-a-a), the single syllable 「あ」 (a) is recognized. Compared with the case where the speaker simply utters 「あ」, there are more recognition cues, so the recognition rate improves, and because there is no need to memorize specific words for single-syllable recognition, the burden on the speaker is also reduced.
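The text does not state how the per-syllable results of a repeated utterance are combined. The sketch below assumes a simple majority vote over the segments, purely as one plausible illustration; the function name `decide_from_repeats` is hypothetical, and the segmentation and per-segment recognition are taken as already done.

```python
from collections import Counter
from typing import List, Optional


def decide_from_repeats(segment_results: List[str]) -> Optional[str]:
    """Combine per-segment recognition results by majority vote (assumed rule)."""
    if not segment_results:
        return None
    # most_common(1) returns the syllable recognized in the most segments.
    return Counter(segment_results).most_common(1)[0][0]
```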

A speech recognition apparatus according to yet another aspect of the present invention determines the single syllable intended by a speaker on the basis of speech input by the speaker, and includes speech input means for inputting speech uttered by the speaker, and speech recognition means. When the single-syllable speech input by the speech input means is any of a voiced sound (dakuon), a contracted sound (yōon), a geminate sound (sokuon), or a semi-voiced sound (handakuon), the speech recognition means determines the clear sound (seion) corresponding to that sound as the single syllable intended by the speaker. When the speech input by the speech input means is a predetermined specific word signifying a voiced sound, the speech recognition means changes the single syllable determined immediately before to the corresponding voiced single syllable. Likewise, when the input speech is a predetermined specific word signifying a contracted sound, a geminate sound, or a semi-voiced sound, the speech recognition means changes the single syllable determined immediately before to the corresponding contracted, geminate, or semi-voiced single syllable, respectively. The term "clear sound" used here means the group of (normally) 45 basic single syllables, excluding voiced, contracted, geminate, and semi-voiced sounds.

In general, distinguishing the voiced form of a given single syllable from its unvoiced form is harder than distinguishing between different single syllables. Therefore, if the voiced and unvoiced forms are recognized as one group and the result is changed to the voiced form afterwards, the recognition rate improves. The later change can be made, for example, by changing the single syllable input immediately before to its voiced form when the speaker inputs the speech 「てんてん」 (tenten). The same applies to contracted, geminate, and semi-voiced sounds.
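The "decide the clear sound first, modify it afterwards" idea can be sketched with Unicode composition: appending the combining voiced mark (U+3099) or semi-voiced mark (U+309A) to a hiragana character and normalizing to NFC yields the precomposed character when one exists. Only 「てんてん」 comes from the text; the word 「まる」 for the semi-voiced mark and the function name `apply_utterance` are assumptions of this example, and contracted and geminate sounds are not covered.

```python
import unicodedata

# Specific word -> combining mark. 「まる」 is a hypothetical choice.
MODIFIER_WORDS = {
    "てんてん": "\u3099",  # combining voiced sound mark (dakuten)
    "まる": "\u309A",      # combining semi-voiced sound mark (handakuten)
}


def apply_utterance(text: str, utterance: str) -> str:
    """Append a decided syllable, or modify the previously decided one."""
    mark = MODIFIER_WORDS.get(utterance)
    if mark is None:
        return text + utterance  # an ordinary, already decided clear sound
    if not text:
        return text  # nothing to modify yet
    combined = unicodedata.normalize("NFC", text[-1] + mark)
    if len(combined) != 1:
        return text  # no voiced/semi-voiced form exists for this syllable
    return text[:-1] + combined
```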

A speech recognition apparatus according to yet another aspect of the present invention determines the single syllable intended by a speaker on the basis of speech input by the speaker, and includes: speech input means for inputting speech uttered by the speaker; storage means for storing a dictionary in which single syllables are associated with combinations of multi-syllable specific words for single-syllable recognition; and speech recognition means for analyzing the speech input by the speech input means to specify a combination of specific words and determining the single syllable corresponding to the specified combination on the basis of the dictionary stored in the storage means. As a concrete example, when the speaker utters 「ケイ」 (kei, the letter K) followed by 「エイ」 (ei, the letter A), the apparatus recognizes 「か」 (ka), and when the speaker utters 「ケイ」 (K) followed by 「アイ」 (ai, the letter I), it recognizes 「き」 (ki). Alternatively, in correspondence with the row and column numbers of the 50-sound table (the Japanese gojūon table), the apparatus may recognize 「あ」 (a) when the speaker utters 「イチ」 (ichi, 1) followed by 「イチ」 (1).

With such a speech recognition apparatus, the recognition rate improves because the length of the speech to be recognized and the number of sounds in it increase. In addition, because it is not necessary to prepare a specific word for every single syllable (as in the example above, 「ケイ」 (kei) can be used to recognize every single syllable in the か (ka) row), the size of the dictionary is reduced, and the speaker has fewer specific words to memorize, which improves convenience.
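A minimal sketch of such a combination dictionary follows. It contains only the three specific words quoted in the text (「ケイ」, 「エイ」, 「アイ」); the table layout and the names `ROW_WORDS`, `VOWEL_WORDS`, and `decide_from_pair` are assumptions made for illustration.

```python
from typing import Optional

# First word selects a row of the kana table; one word serves the whole row.
ROW_WORDS = {"ケイ": "かきくけこ"}
# Second word selects the position within the row.
VOWEL_WORDS = {"エイ": 0, "アイ": 1}


def decide_from_pair(first: str, second: str) -> Optional[str]:
    """Map a recognized pair of specific words to a single syllable."""
    row = ROW_WORDS.get(first)
    column = VOWEL_WORDS.get(second)
    if row is None or column is None:
        return None  # the pair is not in the dictionary
    return row[column]
```

Because rows and positions are stored separately, the number of specific words grows with rows plus columns rather than with their product, which is the capacity saving the passage describes.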

A speech recognition apparatus according to yet another aspect of the present invention includes speech input means, speech recognition means, display means, and control means. The speech input means inputs speech uttered by the speaker, and the speech recognition means analyzes the speech input by the speech input means and specifies candidate single syllables. The display means has a display area for displaying designated information and a sensor that detects an operation by the speaker on the surface of the display area together with its position within the display area. When the speech recognition means has specified a plurality of candidate single syllables, the control means displays the objects corresponding to those candidates side by side in the display area of the display means, as an object group that is as large as the display area allows, and determines the candidate single syllable corresponding to the object displayed at the position detected by the sensor as the single syllable intended by the speaker.

With this arrangement, the speaker can check the candidate single syllables visually, and because they are displayed side by side as an object group that is as large as the display area allows, the speaker can check them at a glance. As a result, the speaker can confirm the single syllable smoothly.

The number of objects corresponding to candidate single syllables displayed on the display means may be any number, but according to yet another aspect of the present invention, it is preferable to select only the three objects whose candidate single syllables have the highest likelihood and display them on the display means. Increasing the number of displayed objects raises the probability that the correct single syllable is determined with a single utterance, but displaying too many objects makes the display harder to take in at a glance and makes it harder for the speaker to select an object. Balancing these two considerations, and taking into account the probability that the single syllable intended by the speaker is among the displayed objects, three is the most suitable number of objects to display.
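Selecting the objects to display amounts to sorting the candidates by likelihood in descending order and keeping the first three. The sketch below assumes candidates arrive as (syllable, likelihood) pairs; the function name `top_candidates` and the scores in the usage are hypothetical.

```python
from typing import List, Tuple


def top_candidates(candidates: List[Tuple[str, float]],
                   limit: int = 3) -> List[str]:
    """Return up to `limit` syllables, most likely first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [syllable for syllable, _ in ranked[:limit]]
```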

The objects displayed in the display area of the display means may all have the same size, color, and shape, but according to yet another aspect of the present invention, each object may be displayed with visual features that vary according to the likelihood of the corresponding candidate single syllable. Specifically, the visual features that may be varied include size, color, shape, blinking state, animation, and the like.

With this arrangement, when three candidate single syllables are displayed, for example, the speaker can see at a glance which candidate is the most likely and can easily select a candidate. Furthermore, when such a speech recognition apparatus is mounted in a vehicle, for example, the time the driver spends looking at the display means can be shortened.

According to another aspect of the present invention, when making this determination, the control means preferably treats the specific range used to identify each object from the position detected by the sensor as wider than the display range that the object occupies in the display area of the display means.

With this configuration, the speaker does not need to touch exactly the position where an object is displayed. Thus, for example, when the voice recognition apparatus is mounted in a vehicle, a driver who has checked the display means is more likely to select the desired candidate single syllable even when estimating its location, without looking at the display means, during the actual operation.
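A minimal sketch of such an enlarged touch range is shown below. The `Rect` type, the 20-pixel margin, the button layout, and the nearest-center tie-break for overlapping ranges are assumptions introduced for illustration; the text above only requires that the range used for identification be wider than the displayed range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def expanded(self, margin):
        """Return a rectangle grown by `margin` on every side."""
        return Rect(self.x - margin, self.y - margin,
                    self.w + 2 * margin, self.h + 2 * margin)

    def contains(self, px, py):
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)

    def center_distance_sq(self, px, py):
        cx = self.x + self.w / 2
        cy = self.y + self.h / 2
        return (px - cx) ** 2 + (py - cy) ** 2


def hit_test(objects, px, py, margin=20):
    """Return the syllable whose enlarged range contains the touch, or None.

    If enlarged ranges overlap, the object whose center is nearest wins
    (an assumed tie-break, not taken from the text).
    """
    hits = [(rect.center_distance_sq(px, py), syllable)
            for syllable, rect in objects.items()
            if rect.expanded(margin).contains(px, py)]
    return min(hits)[1] if hits else None


# Hypothetical layout: three candidate objects in a row.
buttons = {
    "ka": Rect(0, 100, 60, 40),
    "ta": Rect(100, 100, 60, 40),
    "a": Rect(200, 100, 60, 40),
}
print(hit_test(buttons, 70, 95))            # just outside the drawn "ka" object
print(hit_test(buttons, 70, 95, margin=0))  # same touch with no enlargement
```

With the margin applied, a touch slightly outside the drawn object still selects it; with `margin=0` the same touch selects nothing.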

A voice recognition apparatus according to another aspect of the present invention includes voice input means, voice recognition means, display means, and control means. The voice input means inputs the voice uttered by the speaker. The voice recognition means analyzes the voice input by the voice input means and specifies candidate single syllables. The display means has a display area for displaying designated information, and also has a sensor that detects the speaker's operation on the surface of the display area together with its position within the display area. The control means displays, in the display area of the display means, objects representing single syllables arranged in correspondence with the 50-sound table (the Japanese syllabary table). When the voice recognition means specifies a plurality of candidate single syllables, the control means displays the objects in the display area corresponding to those candidates with visual characteristics different from those of the other objects. It then determines, as the single syllable intended by the speaker, the single syllable represented by the object displayed at the position that the sensor detects through the speaker's operation, whether or not that object is one whose visual characteristics were changed.

With such a voice recognition apparatus, the speaker can settle on a single syllable from among the candidates recognized by the apparatus and, if the intended single syllable is not among the candidates, can specify it directly from the 50-sound table, which makes the apparatus convenient to use.
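The behavior of this aspect can be sketched as follows. The romanized excerpt of the table (only five regular rows), the bracket notation for highlighting, and the function names are assumptions for illustration; a real display would draw kana objects and vary their color or size.

```python
VOWELS = "aiueo"
# Excerpt of the 50-sound table: only rows whose romanization is regular.
ROW_CONSONANTS = ["", "k", "n", "m", "r"]

TABLE = [[consonant + vowel for vowel in VOWELS]
         for consonant in ROW_CONSONANTS]


def render(table, candidates):
    """Return text rows in which candidate syllables are bracketed."""
    return [
        " ".join(f"[{s}]" if s in candidates else f" {s} " for s in row)
        for row in table
    ]


def resolve_touch(table, row, col):
    """Any cell can be selected, highlighted or not."""
    return table[row][col]


# Hypothetical recognizer output: two candidates for one utterance.
for line in render(TABLE, {"ka", "a"}):
    print(line)

# The speaker may still pick a syllable that was not a candidate.
print(resolve_touch(TABLE, 2, 3))
```

Highlighting only guides the eye; selection is resolved purely from the touched cell, which is what lets the speaker correct a misrecognition directly.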

A voice recognition apparatus according to another aspect of the present invention includes voice input means, voice recognition means, display means, vehicle state acquisition means, the control means of a voice recognition apparatus according to one of the aspects of the present invention, the control means of a voice recognition apparatus according to another of those aspects, and main control means. The voice input means inputs the voice uttered by the speaker. The voice recognition means analyzes the voice input by the voice input means and specifies candidate single syllables. The display means has a display area for displaying designated information, and also has a sensor that detects the speaker's operation on the display area together with its position within the display area. The vehicle state acquisition means acquires information on whether the vehicle is traveling. Based on the information acquired by the vehicle state acquisition means, the main control means activates one of the two control means when it determines that the vehicle is traveling, and activates the other when it determines that the vehicle is stopped.

With this configuration, while the speaker (driver) is driving, a single syllable can be confirmed by a method that requires relatively little attention to the display means. When the speaker is not driving, a single syllable can be entered and confirmed by a method that requires looking at the display means but allows smoother input.
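The switching performed by the main control means can be sketched as below. Which control means runs in which state is an assumption here, inferred from the remark about how much attention each method demands: the short candidate list is used while traveling and the full syllable table while stopped. The enum and function names are hypothetical.

```python
from enum import Enum


class ConfirmMode(Enum):
    # Assumed to need little attention: a few large candidate objects.
    CANDIDATE_LIST = "candidate list"
    # Assumed to need more attention: the full 50-sound table.
    SYLLABLE_TABLE = "syllable table"


def select_confirm_mode(vehicle_traveling: bool) -> ConfirmMode:
    """Choose which control means to activate from the vehicle state."""
    if vehicle_traveling:
        return ConfirmMode.CANDIDATE_LIST
    return ConfirmMode.SYLLABLE_TABLE


print(select_confirm_mode(True).value)
print(select_confirm_mode(False).value)
```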

A voice recognition apparatus according to an aspect of the present invention includes voice input means, voice recognition means, notification means, storage means, and control means. The voice input means inputs the voice uttered by the speaker. The voice recognition means analyzes the voice input by the voice input means and specifies candidate single syllables. The notification means announces designated information. The storage means stores a dictionary composed of confirmation words, each consisting of a plurality of syllables and associated with a single syllable. The control means performs notification processing that causes the notification means to announce the most likely of the candidate single syllables specified by the voice recognition means, using the corresponding word stored in the storage means. When the receiving means accepts an operation from the speaker signifying a decision, the control means confirms the announced candidate single syllable as the single syllable intended by the speaker.

With this configuration, the speaker can accurately grasp even a single syllable that is difficult to recognize.

When a confirmation word is announced in place of a single syllable in this way, it is preferable that confirmation words suited to the speaker's preference can be used. That is, as in a voice recognition apparatus according to an aspect of the present invention, the storage means preferably stores a plurality of such dictionaries, divided according to the type of word, and the control means selects one of the stored dictionaries based on the speaker's operation accepted by the receiving means and performs the notification processing using the selected dictionary.

With this configuration, the speaker can check a single syllable by means of a preferred confirmation word.
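The idea of selectable confirmation-word dictionaries can be sketched as follows. The dictionary names, every word entry, and the prompt wording are invented for this example; the text above only states that dictionaries are divided by word type and that one is selected by the speaker's operation.

```python
# Hypothetical dictionaries, one per word type. Each maps a single
# syllable to a multi-syllable confirmation word beginning with it.
DICTIONARIES = {
    "place_names": {"a": "Aichi", "ka": "Kanagawa", "sa": "Saitama"},
    "everyday_words": {"a": "asahi", "ka": "kaban", "sa": "sakura"},
}


def confirmation_prompt(syllable, dictionary_name, dictionaries=DICTIONARIES):
    """Build the announcement for the most likely candidate syllable."""
    word = dictionaries[dictionary_name].get(syllable)
    if word is None:
        # No confirmation word registered: fall back to the bare syllable.
        return f"'{syllable}'?"
    return f"'{syllable}' as in '{word}'?"


print(confirmation_prompt("ka", "place_names"))
print(confirmation_prompt("ka", "everyday_words"))
```

Registering the speaker's own words, as a later aspect describes, would amount to adding entries to one of these mappings.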

According to another aspect of the present invention, the confirmation words can preferably be registered in advance by the speaker operating the receiving means. With this configuration, confirmation words that reflect the speaker's preferences even more closely can be announced.

When candidate single syllables are shown on the display means, the confirmation word corresponding to each candidate may be displayed in place of the candidate single syllable itself. That is, according to another aspect of the present invention, the voice recognition apparatus preferably further includes display means that has a display area for displaying designated information and a sensor that detects the speaker's operation on the surface of the display area together with its position within the display area. When the voice recognition means specifies a plurality of candidate single syllables, the control means lists the confirmation words corresponding to those candidates as objects on the display means, and determines, as the single syllable intended by the speaker, the candidate single syllable corresponding to the object displayed at the position that the sensor detects through the speaker's operation.

It is further preferable that a confirmation word displayed in this way can be confirmed by the speaker uttering it. That is, according to another aspect of the present invention, when the voice recognition means specifies a plurality of candidate single syllables, the control means lists the confirmation words corresponding to those candidates as objects on the display means and then determines, as the single syllable intended by the speaker, the candidate single syllable corresponding to the confirmation word specified by the voice recognition means.

With this configuration, the speaker can confirm a candidate single syllable by speech, and can do so with a reasonable degree of certainty.

According to another aspect of the present invention, a program that functions as at least one of the voice recognition means and the control means of the voice recognition apparatus may be executed by a computer built into the voice recognition apparatus. In this case, the program is recorded on a computer-readable recording medium such as a flexible disk, a magneto-optical disk, a CD-ROM, a hard disk, a ROM, or a RAM, and is loaded into the computer and started as needed, so that the computer functions as at least one of the voice recognition means and the control means of the voice recognition apparatus. Because the program can also be distributed over a network or the like, the functions of the voice recognition apparatus are easy to upgrade.

According to another aspect of the present invention, the same applies to a program for functioning as the voice confirmation means of the voice recognition apparatus. According to an aspect of the present invention, the same also applies to a program for functioning as the main control means of the voice recognition apparatus. Such programs may, of course, be stored in a storage medium according to an aspect of the present invention.

According to an aspect of the present invention, the voice recognition apparatus is preferably linked with a navigation apparatus so that the navigation apparatus uses the group of single syllables acquired by the voice recognition apparatus when performing navigation processing. Navigation processing here means, for example, processing that displays a map and marks the current location on it, or route guidance processing that provides guidance along a set route.

With this configuration, the various operations that the user performs during navigation processing can be carried out by voice, which makes the navigation processing more convenient to use.

Embodiments to which the present invention is applied are described below with reference to the drawings. The embodiments of the present invention are not limited to those described below and may take various forms as long as they fall within the technical scope of the present invention.

(First embodiment)

FIG. 1 is a block diagram showing the configuration of a navigation apparatus 20 having a voice recognition function. The navigation apparatus 20 is mounted in a vehicle and includes: a position detector 21 that detects the current position of the vehicle; an operation switch group 22 for inputting various instructions from the user; a remote control terminal (hereinafter "remote control") 23a, separate from the navigation apparatus 20, through which the same instructions can be input as through the operation switch group 22; a remote control sensor 23b that receives signals from the remote control 23a; a map data input unit 25 that inputs map data and the like from a map storage medium on which map data and various information are recorded; a display unit 26 for displaying maps and various information; an audio output unit 27 for outputting various guidance voices and the like; a microphone 28 that receives voice and outputs voice information; a voice recognition data input/output unit 30 that inputs and outputs data related to voice recognition; an in-vehicle LAN communication unit 31 that communicates over the in-vehicle LAN; and a control unit 29. The control unit 29 performs various kinds of processing according to inputs from the position detector 21, the operation switch group 22, the remote control sensor 23b, the map data input unit 25, the microphone 28, the voice recognition data input/output unit 30, and the in-vehicle LAN communication unit 31, and controls the display unit 26, the audio output unit 27, the voice recognition data input/output unit 30, and the in-vehicle LAN communication unit 31.

The position detector 21 includes a GPS receiver 21a that receives radio waves transmitted from GPS (Global Positioning System) satellites through a GPS antenna and detects the position, heading, speed, and the like of the vehicle; a gyroscope 21b that detects the magnitude of rotational motion applied to the vehicle; a distance sensor 21c that detects the distance traveled from the longitudinal acceleration of the vehicle and the like; and a geomagnetic sensor 21d that detects the direction of travel from the earth's magnetic field. Because each of these sensors 21a to 21d has errors of a different nature, they are used so as to complement one another.

The operation switch group 22 consists of a touch panel formed integrally with the display surface of the display unit 26, mechanical key switches provided around the display unit 26, and the like. The touch panel and the display unit 26 are laminated into a single unit. Touch panels come in various types, such as pressure-sensitive, electromagnetic induction, capacitive, or a combination of these, and any of them may be used.

The map data input unit 25 is a device for inputting various data stored in a map storage medium (not shown). The map storage medium stores map data (road data, terrain data, mark data, intersection data, facility data, and the like), guidance voice data, voice recognition data, and so on. A CD-ROM or DVD is typically used as the map storage medium, but a magnetic storage device such as a hard disk, or a medium such as a memory card, may also be used.

The display unit 26 is a color display device. Liquid crystal displays, organic EL displays, CRTs, and the like are available, and any of them may be used. The display screen of the display unit 26 can show, superimposed on one another, a mark indicating the current location, identified from the current vehicle position detected by the position detector 21 and the map data input from the map data input unit 25, together with additional data such as the guidance route to the destination, names, landmarks, and marks for various facilities. Facility guides and the like can also be displayed.

The audio output unit 27 can output facility guides input from the map data input unit 25 and various guidance voices.

When the user inputs voice (speaks), the microphone 28 outputs an electric signal (voice signal) based on that voice to the control unit 29. The user can operate the navigation apparatus 20 by speaking various commands into the microphone 28.

The voice recognition data input/output unit 30 is a device for inputting and outputting various data stored in a voice recognition data storage medium (not shown). The voice recognition data storage medium stores feature parameters for recognizing single syllables, a dictionary composed of specific words for single-syllable recognition, each consisting of a plurality of syllables and associated with a single syllable, a dictionary composed of confirmation words, each consisting of a plurality of syllables and associated with a single syllable, and the like. The storage medium that holds such data may be a magnetic storage device such as a hard disk, or a medium such as a memory card.

The in-vehicle LAN communication unit 31 is connected to the in-vehicle LAN and can communicate with the various ECUs connected to that LAN. The in-vehicle LAN is assumed to be, for example, a CAN (Controller Area Network), and the ECUs are assumed to include, for example, an engine ECU, an AT-ECU, and a body ECU.

The control unit 29 is built around a well-known microcomputer consisting of a CPU, ROM, RAM, I/O, and a bus line connecting these components, and performs various kinds of processing based on programs stored in the ROM and RAM. For example, it performs display processing, in which it calculates the current position of the vehicle as a set of coordinates and a direction of travel from the detection signals of the position detector 21 and shows on the display unit 26 a map of the area around the current position read through the map data input unit 25. It also performs route guidance processing, in which it calculates the optimal route from the current position to the destination, based on the point data stored in the map data input unit 25 and the destination set by operating the operation switch group 22, the remote control 23a, or the like, and provides guidance along the calculated route. The control unit 29 can also perform the voice recognition processing described later.

The schematic configuration of the navigation apparatus 20 has been described above. The correspondence between its parts and the terms used in the claims is as follows. The microphone 28 corresponds to the voice input means, the audio output unit 27 corresponds to the notification means, and the display unit 26 corresponds to the notification means and the display means. The operation switch group 22 and the remote control 23a correspond to the receiving means, the control unit 29 corresponds to the voice recognition means, the control means, and the main control means, the voice recognition data storage medium corresponds to the storage means, and the in-vehicle LAN communication unit 31 corresponds to the vehicle state acquisition means.

Next, among the processes performed by the control unit 29, voice recognition process 1, which is executed for example when the name of a destination is entered prior to route guidance processing, is described with reference to the flowchart of FIG. 2. Voice recognition process 1 starts when the user specifically instructs the navigation apparatus 20 to accept voice input while information is being entered.

When the control unit 29 starts the process, it first branches according to whether the talk switch provided on the operation switch group 22 and the remote control 23a has been pressed by the user (S110). If the talk switch has been pressed, the process proceeds to the next step; otherwise it remains at this step.

In the subsequent S115, the audio output unit 27 is made to output a confirmation sound (for example, an electronic beep or a guidance voice saying "Please speak").

In the subsequent S120, the user's voice is input through the microphone 28.

In the subsequent S125, the voice input in S120 is analyzed (feature parameters and the like are extracted) and compared with the single-syllable feature parameters and the like acquired through the voice recognition data input/output unit 30, and a plurality of candidate single syllables are selected and ranked.

In the subsequent S130, any candidate single syllables that are in the exclusion buffer are removed from the candidates selected in S125. The exclusion buffer resides in the control unit 29 and can hold three candidate single syllables designated for exclusion. The exclusion buffer is initialized when voice recognition process 1 starts.

In the subsequent S135, the highest-ranked of the remaining candidate single syllables is notified to the user by displaying it on the display unit 26 or by outputting it as voice from the audio output unit 27.

Next, the process branches according to whether the confirm switch provided on the operation switch group 22 and the remote control 23a (which may be shared with the talk switch described above) has been pressed by the user, or whether the user has input further voice (S140). If the confirm switch has been pressed, the process proceeds to S145. If the confirm switch has not been operated and the user has input further voice, the process proceeds to S150.

In S145, the candidate single syllable notified in S135 is confirmed as a confirmed single syllable and appended to the end of the group of single syllables already confirmed. The exclusion buffer is then initialized (S153). After the exclusion buffer has been initialized, the process branches according to whether the end switch provided on the operation switch group 22 or the remote control 23a has been operated by the user (S155). If the end switch has been operated, this process (voice recognition process 1) ends; if not, the process returns to S115 described above.

In S150, on the other hand, the candidate single syllable notified in S135 is placed in the exclusion buffer. If the exclusion buffer already holds three candidate single syllables, the one that was placed in the buffer earliest is deleted, and the candidate single syllable notified in S135 is then added.
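The handling of the exclusion buffer in S130, S145, S150, and S153 can be sketched as follows. This is an illustrative model, not the code of the embodiment: the class and method names are hypothetical, the recognizer is replaced by a fixed ranked list, and the talk switch, end switch, and timeout handling are omitted.

```python
from collections import deque

EXCLUSION_CAPACITY = 3  # the exclusion buffer holds three candidates


class SyllableEntry:
    def __init__(self):
        # deque(maxlen=...) drops the oldest item when a new one is
        # appended to a full buffer, matching the eviction in S150.
        self.excluded = deque(maxlen=EXCLUSION_CAPACITY)
        self.confirmed = []   # group of confirmed single syllables
        self.notified = None  # candidate currently notified to the user

    def on_utterance(self, ranked_candidates):
        """Handle one utterance; return the candidate to notify (S135)."""
        if self.notified is not None:
            # Speaking again without confirming rejects the last
            # notification (S140 -> S150).
            self.excluded.append(self.notified)
        # S130: drop candidates that are in the exclusion buffer.
        remaining = [c for c in ranked_candidates if c not in self.excluded]
        self.notified = remaining[0] if remaining else None
        return self.notified

    def on_confirm(self):
        """Confirm switch pressed: S145 followed by S153."""
        if self.notified is not None:
            self.confirmed.append(self.notified)
        self.excluded.clear()
        self.notified = None


entry = SyllableEntry()
ranking = ["a", "ka", "ta", "ha", "pa"]  # hypothetical, same every time
history = [entry.on_utterance(ranking) for _ in range(5)]
print(history)
entry.on_confirm()
print(entry.confirmed)
```

In this run the fifth utterance pushes the first rejected candidate out of the three-slot buffer, so that candidate becomes available for notification again.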

For convenience (to simplify the description), voice recognition process 1 has been described as ending through operation of the end switch only at S155, the step that determines whether the end switch has been operated. In practice, voice recognition process 1 ends immediately whenever the end switch is operated, at any step. In addition, voice recognition process 1 also ends if, in the voice input steps S120 and S140 or the step S140 that waits for a user operation, no voice is input and no user operation occurs for a predetermined time (for example, 30 seconds).

Voice recognition process 1 has been described above. The group of confirmed single syllables obtained in this way is used, for example, as the name of a destination or the name of a facility in route guidance processing.

With this navigation apparatus 20, the user performs an operation to confirm a single syllable only when the uttered single syllable has been recognized correctly. When it has not been recognized correctly, the user can simply keep uttering the single syllable, without any operation, until it is recognized correctly. The user therefore does not have to give a re-input instruction each time recognition fails, and only has to speak again. In other words, the apparatus is convenient to use.

Furthermore, because the candidate single syllables stored in the exclusion buffer are removed from the candidates newly selected after a re-utterance, the same incorrect candidate as before is not notified again on re-utterance, which is convenient for the user.

The reason the exclusion buffer described above is configured to hold only three candidate single syllables is explained next.

The present inventors carried out the following experiment. In the cabin of a stationary car, two men and two women from each age group from their twenties to their sixties (20 people in total) each performed, one at a time, three sets of 10 repeated utterances. Based on the results, FIG. 9 shows a graph in which the horizontal axis is the number of inputs by utterance and the vertical axis is the probability that the correct single syllable had been recognized by that number of inputs. As the graph shows, the recognition rate is almost constant from the third input onward (96% at the third input, 98% at the fourth, and 98% at the fifth) and hardly changes after that. In other words, even if the utterance is repeated four or more times, the correct candidate single syllable is rarely notified thereafter. This means that in most cases the correct single syllable has been notified at least once by the third re-utterance, and that when the number of re-utterances reaches three, it is highly likely that the speaker has excluded the correct candidate single syllable by mistake. For this reason, when the number of re-utterances reaches three, it is preferable that the candidate recognized first can again be notified as a recognition candidate.

With this configuration, even if the user mistakenly re-utters after the correct candidate single syllable has been notified, the excluded candidate returns at an appropriate time to a state in which it can be notified again. This prevents the problem of the correct candidate single syllable never being notified again.

(Second Embodiment)

Next, a second embodiment is described. The navigation device with a speech recognition function of the second embodiment has the same configuration as the navigation device 20 of the first embodiment described above, so only the differences are described. The main difference lies in the speech recognition process executed by the control unit 29. Speech recognition process 2, executed by the control unit 29, is described below with reference to the flowchart of FIG. 3.

Speech recognition process 2 starts when the user specifically instructs it while voice input is available for entering information into the navigation device 20.

When the control unit 29 starts execution, it first branches on whether the user has pressed the talk switch provided on the operation switch group 22 or the remote controller 23a (S210). If the user has pressed the talk switch, processing proceeds to the next step; otherwise it remains at this step.

In the following S215, a confirmation sound (for example, an electronic beep or the voice prompt "Please input your voice") is output from the audio output unit 27.

In the following S220, the user's speech is input through the microphone 28.

In the following S225, the speech input in S220 is analyzed (feature parameters and the like are extracted) and compared with the single-syllable feature parameters and the like obtained through the speech-recognition data input/output unit 30, and a plurality of candidate single syllables are selected, each with a candidate rank. If the speech input in S220 is not a single syllable, it is determined whether the speech is a confirmation word meaning confirmation (such as "next", "to next", or "next is").

In the following S230, processing branches on whether the speech input in S220 is a confirmation word. If it is a confirmation word, processing proceeds to S250; if not, processing proceeds to S235.

In S235, the candidate single syllables held in the exclusion buffer are removed from the candidate single syllables selected in S225. The exclusion buffer resides in the control unit 29 and can hold three candidate single syllables designated for exclusion. The exclusion buffer is initialized when speech recognition process 2 starts.

Then, in S240, the highest-ranked of the remaining candidate single syllables is notified by displaying it on the display unit 26 or outputting it as speech from the audio output unit 27.

Then, in S245, the candidate single syllable notified in S240 is entered into the exclusion buffer. If the exclusion buffer already holds three candidate single syllables, the one entered earliest is deleted and the candidate single syllable newly notified in S240 is entered.
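The exclusion steps S235 to S245 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class and function names are hypothetical, candidates are assumed to arrive as a list ordered by rank, and returning `None` when every candidate is excluded is an assumption, since the description does not cover that case.

```python
from collections import deque


class ExclusionBuffer:
    """Holds the most recently notified candidates (three by default).

    When full, adding a new entry silently drops the oldest one,
    which matches the replacement rule described for S245.
    """

    def __init__(self, capacity=3):
        self._items = deque(maxlen=capacity)

    def reset(self):
        # Corresponds to initializing the buffer at process start.
        self._items.clear()

    def add(self, syllable):
        self._items.append(syllable)

    def __contains__(self, syllable):
        return syllable in self._items


def notify_next(ranked_candidates, buffer):
    """Pick the best candidate not in the buffer (S235, S240) and record it (S245)."""
    for syllable in ranked_candidates:
        if syllable not in buffer:
            buffer.add(syllable)
            return syllable
    return None  # assumption: nothing is notified if all candidates are excluded
```

With four ranked candidates and repeated re-utterance, the first candidate becomes eligible again on the fifth notification, after the three later ones have pushed it out of the buffer.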

Meanwhile, in S250, which is reached when S230 finds that the speech input in S220 is a confirmation word, the candidate single syllable notified last time is confirmed as a confirmed single syllable and appended to the end of the group of confirmed single syllables already confirmed. The exclusion buffer is then initialized (S253). After the exclusion buffer is initialized, processing branches on whether the user has operated the end switch provided on the operation switch group 22 and the remote controller 23a (S255). If the user has operated the end switch, this process (speech recognition process 2) ends; if not, processing returns to S215 described above.

For convenience (to simplify the description), the process is shown as ending through operation of the end switch only at step S255, where that operation is checked; in fact, speech recognition process 2 ends immediately if the end switch is operated at any step. Speech recognition process 2 also ends if no speech is input for a predetermined time (for example, 30 seconds) at the voice input step S220.

Speech recognition process 2 has been described above. The group of confirmed single syllables obtained in this way is used as the name of a destination in route guidance processing or as the name of a facility.

With this navigation device 20, the user utters the confirmation word ("to next") to confirm a single syllable only when the uttered single syllable has been recognized correctly. When it has not, the user can simply keep uttering the intended single syllable until it is recognized correctly, without any special operation or utterance. Thus, when recognition fails, the user need not issue a re-input instruction each time and only has to re-utter. In other words, the device is convenient to use.
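The overall flow of speech recognition process 2 (S225 to S253) can be sketched as a small loop. This is a simplified illustration under stated assumptions: recognition is assumed to have already happened, so each event is either a string (a recognized word) or a ranked list of candidate single syllables; the function name `run_process2` and the single confirmation word "next" are hypothetical; the talk switch, end switch, and timeout are omitted.

```python
from collections import deque

CONFIRM_WORDS = {"next"}  # hypothetical stand-in for the confirmation words


def run_process2(events, capacity=3):
    """Return the confirmed single syllables for a sequence of recognized events."""
    confirmed = []
    excluded = deque(maxlen=capacity)  # exclusion buffer, oldest entry dropped first
    last_notified = None
    for event in events:
        if isinstance(event, str):
            # S230 -> S250: a confirmation word confirms the last notification.
            if event in CONFIRM_WORDS and last_notified is not None:
                confirmed.append(last_notified)
                excluded.clear()  # S253
                last_notified = None
            continue
        # S235: drop candidates that were notified recently.
        remaining = [s for s in event if s not in excluded]
        if remaining:
            last_notified = remaining[0]    # S240: notify the best remaining one
            excluded.append(last_notified)  # S245
    return confirmed
```

In this sketch a speaker whose 「あ」 is first misrecognized as 「は」 only has to say it again: the second attempt skips 「は」 and notifies 「あ」, which the confirmation word then commits.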

(Third Embodiment)

Next, a third embodiment is described. The navigation device with a speech recognition function of the third embodiment has the same configuration as the navigation device 20 of the first embodiment described above, so only the differences are described. The main difference lies in the speech recognition process executed by the control unit 29. Speech recognition process 3, executed by the control unit 29, is described below with reference to the flowchart of FIG. 4.

When the control unit 29 starts execution, it first branches on whether the user has pressed the talk switch provided on the operation switch group 22 and the remote controller 23a (S310). If the user has pressed the talk switch, processing proceeds to the next step; otherwise it remains at this step.

In the following S315, a confirmation sound (for example, an electronic beep or the voice prompt "Please input your voice") is output from the audio output unit 27.

In the following S320, the user's speech is input through the microphone 28.

In the following S325, the speech input in S320 is analyzed (feature parameters and the like are extracted) and compared with the single-syllable feature parameters and the like obtained through the speech-recognition data input/output unit 30, and three candidate single syllables are selected.

In the following S330, vehicle speed information is obtained from an engine ECU (not shown) through the in-vehicle LAN communication unit 31, and processing branches on whether the vehicle is moving. If the vehicle is moving, processing proceeds to S335; if not, processing proceeds to S340.

In S335, the candidate single syllables selected in S325 are displayed side by side on the display unit 26 as a group of objects that are as large as the display area allows. FIG. 6 shows an example of this display. As FIG. 6 shows, the candidate single syllable objects 101 to 103 are displayed side by side on the screen 100 so as to occupy most of the display area. In addition, an operation-specifying range 104, indicated by a dotted line (not actually displayed; the same applies below), is set over an area wider than the candidate single syllable object 101. When the user touches inside the operation-specifying range 104, the control unit 29 recognizes that the user has selected the candidate single syllable object 101. Likewise, an operation-specifying range 105 is set for the candidate single syllable object 102, and an operation-specifying range 106 is set for the candidate single syllable object 103.
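The relationship between a displayed object and its wider operation-specifying range can be sketched as a hit test on an inflated rectangle. This is an illustration only: the `Rect` type, the pixel coordinates, and the 20-pixel margin are assumptions, and the description does not say how much wider the range is than the object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def inflate(self, margin):
        """Return a rectangle grown by `margin` on every side."""
        return Rect(self.x - margin, self.y - margin,
                    self.w + 2 * margin, self.h + 2 * margin)

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def hit_test(objects, px, py, margin=20):
    """Return the label whose operation-specifying range contains the touch, if any.

    `objects` is a list of (label, display_rect) pairs; the range used for
    selection is the display rectangle inflated by `margin`.
    """
    for label, rect in objects:
        if rect.inflate(margin).contains(px, py):
            return label
    return None
```

A touch that lands slightly outside the drawn object still selects it, which is the property the description relies on for use while driving. The margin should be smaller than half the gap between objects so that ranges do not overlap.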

Returning to FIG. 4, in S340 the 50-sound table is displayed on the display unit 26, and the borders of the objects for the candidate single syllables selected in S325 are changed. FIG. 7 shows an example of this display. As FIG. 7 shows, the single syllables are arranged as objects on the screen 111 in the form of a 50-sound table, and only the candidate objects 112 to 114 for 「あ」 (a), 「は」 (ha), and 「ま」 (ma) have borders whose thickness and color differ from those of the other single syllable objects.
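The branch at S330 between the two display forms (S335 and S340) can be sketched as below. The function name, the returned dictionary layout, and the rule that any speed above zero counts as moving are all assumptions; the description only says that the branch depends on whether the vehicle is moving.

```python
def choose_display(speed_kmh, candidates):
    """Pick the display form for the recognized candidates.

    Moving: show only the candidates as large objects (S335).
    Stopped: show the full 50-sound table with the candidates highlighted (S340).
    """
    if speed_kmh > 0:  # assumption: any non-zero speed means the vehicle is moving
        return {"mode": "large_candidates", "objects": list(candidates)}
    return {"mode": "table", "highlight": set(candidates)}
```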

Returning to FIG. 4, in the following S345 processing branches on whether the user has selected any object, based on the signal output from the touch panel integrated with the surface of the display unit 26. If the user has selected an object, processing proceeds to S350; if the user has selected no object (for example, for 30 seconds), processing returns to S320 described above.

In S350, reached when the user has selected an object, the single syllable corresponding to the selected object is confirmed as a confirmed single syllable and appended to the end of the group of confirmed single syllables already confirmed. When the display described for S340 (see FIG. 7) has been shown, the "selected object" here is not limited to the candidate single syllable objects; any single syllable object selected by the user qualifies.

In the following S355, processing branches on whether the user has operated the end switch provided on the operation switch group 22 or the remote controller 23a. If the user has operated the end switch, this process (speech recognition process 3) ends; if not, processing returns to S315 described above.

For convenience (to simplify the description), the process is shown as ending through operation of the end switch only at step S355, where that operation is checked; in fact, speech recognition process 3 ends immediately if the end switch is operated at any step. Speech recognition process 3 also ends if no speech is input for a predetermined time (for example, 30 seconds) at the voice input step (S320).

Speech recognition process 3 has been described above. The group of confirmed single syllables obtained in this way is used as the name of a destination in route guidance processing or as the name of a facility.

With this navigation device 20, while the vehicle is moving the candidate single syllables are displayed side by side as a group of objects that are as large as the display area of the display unit 26 allows, so the user can check the single syllables at a glance and, as a result, confirm a single syllable smoothly. In that case, the range used to identify each object from the position detected by the sensor is treated as wider than the display range that the object occupies in the display area of the display unit 26, so the user does not need to touch the exact position where the object is displayed. The user can therefore select the desired candidate single syllable easily even while driving.

When the vehicle is stopped, on the other hand, the user can also select single syllables other than the candidates and can therefore confirm a single syllable more quickly.

(Fourth Embodiment)

Next, a fourth embodiment is described. The navigation device with a speech recognition function of the fourth embodiment has the same configuration as the navigation device 20 of the first embodiment described above, so only the differences are described. The main difference lies in the speech recognition process executed by the control unit 29. Speech recognition process 4, executed by the control unit 29, is described below with reference to the flowchart of FIG. 5.

When the control unit 29 starts execution, it first branches on whether the user has pressed the talk switch provided on the operation switch group 22 and the remote controller 23a (S410). If the user has pressed the talk switch, processing proceeds to the next step; otherwise it remains at this step.

In the following S415, a confirmation sound (for example, an electronic beep or the voice prompt "Please input your voice") is output from the audio output unit 27.

In the following S420, the user's speech is input through the microphone 28.

In the following S425, the speech input in S420 is analyzed (feature parameters and the like are extracted) and compared with the single-syllable feature parameters and the like obtained through the speech-recognition data input/output unit 30, and three candidate single syllables are selected.

In S435, the verification words corresponding to the candidate single syllables selected in S425 are displayed side by side as a group of objects in the display area of the display unit 26 and are also notified in order as speech through the audio output unit 27. A verification word here is a word, obtainable through the speech-recognition data input/output unit 30, that is associated with a single syllable and begins with that single syllable. Concrete examples are the verification word 「あさひ」 (asahi) for the single syllable 「あ」 (a), 「はがき」 (hagaki) for 「は」 (ha), and 「まつり」 (matsuri) for 「ま」 (ma). FIG. 8 shows an example of this display. As FIG. 8 shows, the verification word objects 122, 123, and 124 are displayed side by side on the screen 121 so as to occupy most of the display area. When the user touches any of the verification word objects 122 to 124, the control unit 29 can recognize which one was touched.

Returning to FIG. 5, in S440 the user's speech is input through the microphone 28. The speech input in S440 is then analyzed (feature parameters and the like are extracted), and an attempt is made to identify which of the verification words displayed on the display unit 26 in S435 it is (S445).

In the following S450, if the speech was identified as one of the verification words displayed on the display unit 26 in S435, processing proceeds to S455; if it could not be identified, processing returns to S420.

In S455, the candidate single syllable corresponding to the identified verification word is taken as a confirmed single syllable and appended to the end of the group of confirmed single syllables already confirmed.
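The mapping between candidate single syllables and the words displayed in S435, and the reverse lookup used in S445 to S455, can be sketched as follows. The dictionary holds only the three examples given in the description; the function names are hypothetical, and the spoken word is assumed to have been recognized as text already.

```python
# Word shown for each single syllable (the three examples from the description).
WORD_FOR_SYLLABLE = {"あ": "あさひ", "は": "はがき", "ま": "まつり"}


def words_to_show(candidates):
    """S435: the words to display and read out for the candidate single syllables."""
    return [WORD_FOR_SYLLABLE[s] for s in candidates]


def syllable_for_word(spoken_word, candidates):
    """S445 to S455: map a recognized word back to its candidate single syllable.

    Returns None when the word matches none of the displayed words,
    in which case the flow goes back to voice input.
    """
    for syllable in candidates:
        if WORD_FOR_SYLLABLE.get(syllable) == spoken_word:
            return syllable
    return None
```

Restricting the lookup to the displayed candidates keeps the second recognition pass small: it only has to tell a handful of multi-syllable words apart.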

In the following S460, processing branches on whether the user has operated the end switch provided on the operation switch group 22 and the remote controller 23a. If the user has operated the end switch, this process (speech recognition process 4) ends; if not, processing returns to S435 described above.

For convenience (to simplify the description), the process is shown as ending through operation of the end switch only at step S460, where that operation is checked; in fact, speech recognition process 4 ends immediately if the end switch is operated at any step. Speech recognition process 4 also ends if no speech is input for a predetermined time (for example, 30 seconds) at the voice input steps (S420, S440).

Speech recognition process 4 has been described above. The group of confirmed single syllables obtained in this way is used as the name of a destination in route guidance processing or as the name of a facility.

With this navigation device 20, candidate single syllables are notified to the user by means of verification words, which the user can grasp more easily than bare single syllables. Moreover, when a candidate is selected by voice, the selection can also be made with the verification word, so the recognition rate remains high even for selection by voice.

Other embodiments are described below.

(1) In the embodiments above, the user basically performs voice input in single syllables, but input may instead use single-syllable recognition words, each consisting of multiple syllables and associated with one single syllable. In that case, the navigation device 20 need only identify the single syllable corresponding to the input single-syllable recognition word on the basis of the speech-recognition data input through the speech-recognition data input/output unit 30. Furthermore, if dictionaries of single-syllable recognition words, divided in advance by genre and the like, are stored on the speech-recognition data storage medium and the user can choose among them, the user can pick a dictionary to taste and therefore learn and use the single-syllable recognition words quickly. Likewise, if the user can register single-syllable recognition words, the user can learn and use them quickly.

(2) As a method of analyzing speech, the navigation device 20 may split input speech consisting of the same single syllable repeated into one segment per syllable and determine the one single syllable the user intends on the basis of those segments. That is, when the user utters a single syllable several times in succession (for example, 「あああ」 (a-a-a)), the single syllable 「あ」 (a) is recognized. Compared with the user uttering just 「あ」, this provides more cues for recognition, so the recognition rate improves.
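One way to combine the per-segment results into a single decision is a majority vote, sketched below. The description does not specify how the segments are combined, so the voting rule is an assumption, as is the function name; the segments are assumed to have been recognized individually beforehand.

```python
from collections import Counter


def decide_from_repeats(segment_results):
    """Return the single syllable recognized most often across the segments.

    `segment_results` holds the best recognition result for each segment
    of a repeated utterance such as 「あああ」.
    """
    if not segment_results:
        return None
    return Counter(segment_results).most_common(1)[0][0]
```

With three segments, one misrecognized segment is outvoted by the other two. A real recognizer might combine per-segment scores rather than hard decisions.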

(3) As a method of analyzing speech, when the input single-syllable speech is a voiced sound (dakuon), a contracted sound (yōon), a geminate sound (sokuon), or a semi-voiced sound (handakuon), the navigation device 20 may determine the corresponding plain sound (seion) as the single syllable the user intends. In that case, if further input speech is, for example, a predetermined word denoting a voiced sound, the single syllable determined immediately before is changed to the corresponding voiced single syllable. Likewise, if it is a predetermined word denoting a contracted sound, the single syllable determined immediately before is changed to the corresponding contracted single syllable. The same applies to geminate and semi-voiced sounds. Here, "plain sounds" means the group of (usually) 45 basic single syllables that excludes voiced, contracted, geminate, and semi-voiced sounds.

In general, distinguishing the voiced and unvoiced versions of a single syllable is harder than distinguishing different single syllables. Recognizing the voiced and unvoiced versions as one group and changing the result to the voiced or contracted form afterwards therefore improves the recognition rate. The later change can be made, for example, by changing the single syllable input immediately before to its voiced form when the user inputs the speech 「てんてん」 (tenten). The same applies to contracted, geminate, and semi-voiced sounds.
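The after-the-fact change to a voiced or semi-voiced syllable can be sketched with Unicode normalization, which composes a kana with a combining voiced or semi-voiced mark into the single precomposed character when one exists. This is an illustration only: the word 「てんてん」 comes from the description, while 「まる」 as the word for the semi-voiced mark and the function name are assumptions. Contracted and geminate sounds are not covered.

```python
import unicodedata

# Spoken modifier word -> combining mark appended to the previous syllable.
MODIFIER_MARKS = {
    "てんてん": "\u3099",  # combining voiced sound mark (from the description)
    "まる": "\u309A",      # combining semi-voiced sound mark (assumed word)
}


def apply_modifier(confirmed, spoken_word):
    """Return the syllable list with its last entry voiced or semi-voiced.

    The list is returned unchanged if the word is not a modifier, the list
    is empty, or the last syllable has no such form (for example 「あ」).
    """
    mark = MODIFIER_MARKS.get(spoken_word)
    if mark is None or not confirmed:
        return confirmed
    composed = unicodedata.normalize("NFC", confirmed[-1] + mark)
    if len(composed) != 1:  # no precomposed character exists for this pair
        return confirmed
    return confirmed[:-1] + [composed]
```

A lookup table from plain to voiced kana would work equally well; normalization simply avoids writing the table out by hand.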

(4) As a method of analyzing speech, the navigation device 20 may determine the single syllable the user intends from a combination of single-syllable recognition words corresponding to spoken Roman letters. As a concrete example, the navigation device recognizes 「か」 (ka) when the user inputs 「ケイ」 (K) and 「エイ」 (A), and recognizes 「き」 (ki) when the user utters 「ケイ」 (K) and 「アイ」 (I). Alternatively, using the row and column numbers of the 50-sound table, it may recognize 「あ」 (a) when the speaker utters 「イチ」 (1) and 「イチ」 (1).

Such a speech recognition device improves the recognition rate because the speech to be recognized becomes longer and contains more sounds. Moreover, since a single-syllable recognition word need not be prepared for every single syllable (as in the example above, 「ケイ」 (K) serves for recognizing every single syllable in the カ (ka) row), the dictionary size is reduced, and the user has fewer single-syllable recognition words to remember, which further improves ease of use.
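Both lookups in variation (4) reduce to indexing into the 50-sound table, as the sketch below shows. Only the letter names that appear in the description (「ケイ」, 「エイ」, 「アイ」) and the first three rows of the table are included; the function names are hypothetical, the order of the two spoken numbers is an assumption, and the letter names are assumed to have been recognized already.

```python
# Spoken letter name -> Roman letter (only the examples from the description).
LETTER_NAMES = {"ケイ": "k", "エイ": "a", "アイ": "i"}

VOWELS = "aiueo"
GOJUON_ROWS = ["あいうえお", "かきくけこ", "さしすせそ"]  # first rows only
ROW_OF_CONSONANT = {"": 0, "k": 1, "s": 2}


def from_letters(letter_names):
    """Map spoken Roman letters, e.g. ["ケイ", "エイ"], to a single syllable."""
    letters = "".join(LETTER_NAMES[name] for name in letter_names)
    consonant, vowel = letters[:-1], letters[-1]
    return GOJUON_ROWS[ROW_OF_CONSONANT[consonant]][VOWELS.index(vowel)]


def from_index(first, second):
    """Map two spoken 1-based table numbers to a single syllable.

    Assumption: the first number picks a row of GOJUON_ROWS and the second
    the position within it; the description does not fix the order.
    """
    return GOJUON_ROWS[first - 1][second - 1]
```

One entry for 「ケイ」 covers the whole カ row, which is why this scheme needs far fewer dictionary entries than one recognition word per syllable.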

(5) When input speech is a word meaning the end of the speech recognition process (for example, "end" or "done"), the navigation device 20 may end the speech recognition process. The user can then end the process by utterance as well, which improves ease of use.

(6) In the fourth embodiment above, a verification word is notified in place of the candidate single syllable itself. If dictionaries of verification words, divided in advance by genre and the like, are stored on the speech-recognition data storage medium and the user can choose among them, the user can pick a dictionary to taste and thus use preferred verification words. If, in addition, the user can register verification words, the user can use preferred verification words all the more.

According to the present invention, a speech recognition device and the like that is as convenient as possible for the speaker to use can be provided.

Claims

A speech recognition device for determining the single syllable intended by a speaker based on voice input by the speaker, the device comprising:

voice input means for inputting voice uttered by the speaker;

speech recognition means for specifying candidate single syllables by analyzing the voice input by the voice input means;

notification means for notifying designated information;

reception means for receiving an operation by the speaker; and

control means that executes a notification process causing the notification means to notify the most probable candidate single syllable among the candidate single syllables specified by the speech recognition means; that, when the reception means receives an operation signifying confirmation from the speaker, executes a confirmation process determining the candidate single syllable notified in the immediately preceding notification process as the single syllable intended by the speaker; and that, when a new voice is input from the speaker to the voice input means and the speech recognition means specifies candidate single syllables, returns to the notification process and causes the notification means to notify the most probable candidate single syllable among those candidate single syllables.
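The control flow recited in the claim above can be illustrated with a minimal Python sketch. The event encoding, the `CONFIRM` marker, and the function name `run_session` are hypothetical and do not appear in the patent; candidate scores are assumed to come from a recognizer that is outside the sketch.

```python
from typing import Callable, Iterable, List, Optional, Tuple, Union

# Hypothetical event encoding (not from the patent): a list of
# (syllable, score) pairs stands for one analysed utterance, and the string
# "CONFIRM" stands for an operation of the confirmation switch.
Candidates = List[Tuple[str, float]]
Event = Union[Candidates, str]
CONFIRM = "CONFIRM"


def run_session(events: Iterable[Event],
                notify: Callable[[str], None] = print) -> Optional[str]:
    """Return the confirmed syllable, or None if nothing was confirmed."""
    notified: Optional[str] = None
    for event in events:
        if isinstance(event, str):
            if event == CONFIRM and notified is not None:
                return notified  # confirmation process
            continue
        if not event:
            continue  # the recognizer produced no candidates
        # Notification process: announce the most probable candidate.
        notified = max(event, key=lambda c: c[1])[0]
        notify(notified)
    return None
```

In this sketch a misrecognized syllable is corrected simply by speaking again: each new utterance re-enters the notification step, and only the confirmation event ends the loop.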

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker;

speech recognition means for analyzing the voice input by the voice input means to specify candidate single syllables and to recognize a confirmation word signifying confirmation;

notification means for notifying designated information; and

control means that executes a notification process causing the notification means to notify the most probable candidate single syllable among the candidate single syllables specified by the speech recognition means; that, when a new voice is input from the speaker to the voice input means and the speech recognition means recognizes the confirmation word, executes a confirmation process determining the candidate single syllable notified in the immediately preceding notification process as the single syllable intended by the speaker; and that, when a new voice is input from the speaker to the voice input means and the speech recognition means specifies candidate single syllables, returns to the notification process and causes the notification means to notify the most probable candidate single syllable among those candidate single syllables.

The speech recognition device according to claim 1 or 2,

wherein, when the control means executes the notification process two or more times in succession without executing the confirmation process, the control means excludes the candidate single syllables already notified by past notification processes from the candidates for notification, and causes the most probable of the remaining candidate single syllables to be notified.

The speech recognition device according to claim 3,

wherein, with respect to the exclusion, the control means does not exclude candidate single syllables that were notified by a notification process, other than the immediately preceding one, executed a predetermined number of times earlier among the notification processes executed repeatedly without the confirmation process being executed.

The speech recognition device according to claim 3,

wherein, with respect to the exclusion, the control means does not exclude candidate single syllables that were notified by the notification process executed three times earlier among the notification processes executed repeatedly without the confirmation process being executed.

The speech recognition device according to claim 4,

wherein the control means changes the predetermined number of times based on an operation by the speaker received by the reception means.
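The exclusion of already-notified candidates described in the dependent claims above might be sketched as follows. This is only an illustration under stated assumptions: `pick_candidate` and its `window` parameter are hypothetical names, and `window` is assumed to mean "how many of the most recent notifications remain excluded", which is one possible reading of the "predetermined number of times".

```python
from typing import List, Optional, Sequence, Tuple


def pick_candidate(candidates: Sequence[Tuple[str, float]],
                   history: List[str],
                   window: Optional[int] = None) -> Optional[str]:
    """Pick the most probable candidate that was not notified recently.

    history holds the syllables notified so far without confirmation.
    window=None excludes every past notification; window=n (assumed
    meaning) excludes only the n most recent ones, so older candidates
    become eligible again.
    """
    if window is None:
        excluded = set(history)
    elif window <= 0:
        excluded = set()
    else:
        excluded = set(history[-window:])
    remaining = [c for c in candidates if c[0] not in excluded]
    if not remaining:
        return None
    best = max(remaining, key=lambda c: c[1])[0]
    history.append(best)  # record this notification for later exclusions
    return best
```

With a finite window the device avoids repeating its most recent wrong guesses while still allowing an older candidate to reappear, and letting the speaker adjust the window corresponds to changing the predetermined number by operation.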

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker;

reception means for receiving an operation by the speaker;

storage means for storing dictionaries, each composed of specific words for single-syllable recognition, each specific word consisting of a plurality of syllables and corresponding to a single syllable, the dictionaries being classified by type of specific word; and

speech recognition means that selects a dictionary stored in the storage means based on the operation by the speaker received by the reception means, selects from the selected dictionary the specific word for single-syllable recognition corresponding to the voice input by the voice input means, and determines the single syllable corresponding to the selected specific word as the single syllable intended by the speaker.

The speech recognition device according to claim 7,

wherein the specific words for single-syllable recognition in the dictionaries stored in the storage means can be registered in advance by the speaker operating the reception means.
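As a rough illustration of the dictionary selection and registration described in the two claims above, the following Python sketch keeps one word-to-syllable table per genre. The class name, the genre labels, and the sample words are hypothetical and are not taken from the patent.

```python
from typing import Dict, Optional


class WordDictionaries:
    """Word-to-syllable dictionaries grouped by genre (illustrative only)."""

    def __init__(self) -> None:
        self._by_genre: Dict[str, Dict[str, str]] = {}
        self._selected: Optional[str] = None

    def register(self, genre: str, word: str, syllable: str) -> None:
        # Registration in advance by the speaker.
        self._by_genre.setdefault(genre, {})[word] = syllable

    def select(self, genre: str) -> None:
        # Dictionary selection based on the speaker's operation.
        if genre not in self._by_genre:
            raise KeyError(genre)
        self._selected = genre

    def syllable_for(self, word: str) -> Optional[str]:
        # Only the currently selected dictionary is searched.
        if self._selected is None:
            return None
        return self._by_genre[self._selected].get(word)
```

Restricting the search to the selected dictionary keeps the recognition vocabulary small, which is presumably part of why selection by the speaker is useful.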

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker; and

speech recognition means for dividing a repeated voice, composed of the same single syllable uttered repeatedly and input by the voice input means, into a voice for each single syllable, and determining the single syllable intended by the speaker based on the respective voices.
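The claim above does not state how the per-syllable results are combined. One simple possibility, shown here purely as an assumption, is a majority vote over the recognition result of each segment; the function name `decide_from_repeats` is hypothetical, and the segmentation and per-segment recognition are taken as given.

```python
from collections import Counter
from typing import Optional, Sequence


def decide_from_repeats(segment_results: Sequence[str]) -> Optional[str]:
    """Majority vote over per-segment recognition results (assumed rule)."""
    if not segment_results:
        return None
    counts = Counter(segment_results)
    # most_common(1) returns the single most frequent result.
    return counts.most_common(1)[0][0]
```

For example, if a speaker repeats one syllable three times and a single segment is misrecognized, the vote still returns the syllable recognized in the other two segments.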

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker; and

speech recognition means that, when the single-syllable voice input by the voice input means is any one of a voiced sound, a contracted sound, a geminate sound, and a semi-voiced sound, determines the single syllable corresponding to that sound as the single syllable intended by the speaker; that, when the voice input by the voice input means is a predetermined specific word signifying a voiced sound, changes the single syllable determined immediately before to the corresponding voiced single syllable; that, when the voice input by the voice input means is a predetermined specific word signifying a contracted sound, changes the single syllable determined immediately before to the corresponding contracted single syllable; that, when the voice input by the voice input means is a predetermined specific word signifying a geminate sound, changes the single syllable determined immediately before to the corresponding geminate single syllable; and that, when the voice input by the voice input means is a predetermined specific word signifying a semi-voiced sound, changes the single syllable determined immediately before to the corresponding semi-voiced single syllable.
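The modification of the immediately preceding syllable by a spoken specific word can be sketched for hiragana as below. The modifier names (`"dakuten"`, `"handakuten"`, `"yoon"`, `"sokuon"`) and the function `apply_modifier` are hypothetical stand-ins for the patent's specific words; the sketch relies on Unicode NFC composition of the combining voiced and semi-voiced sound marks (U+3099, U+309A) and on a small table of small-kana forms.

```python
import unicodedata

DAKUTEN = "\u3099"     # combining voiced sound mark
HANDAKUTEN = "\u309A"  # combining semi-voiced sound mark
# Small-kana forms used for contracted (yoon) and geminate (sokuon) sounds.
SMALL_YOON = {"や": "ゃ", "ゆ": "ゅ", "よ": "ょ"}
SMALL_SOKUON = {"つ": "っ"}


def apply_modifier(syllable: str, modifier: str) -> str:
    """Return the modified kana, or the input unchanged if not applicable."""
    if modifier in ("dakuten", "handakuten"):
        mark = DAKUTEN if modifier == "dakuten" else HANDAKUTEN
        combined = unicodedata.normalize("NFC", syllable + mark)
        # A precomposed kana exists only if NFC yields a single character.
        return combined if len(combined) == 1 else syllable
    if modifier == "yoon":
        return SMALL_YOON.get(syllable, syllable)
    if modifier == "sokuon":
        return SMALL_SOKUON.get(syllable, syllable)
    return syllable
```

A speaker who utters the plain syllable and then the modifier word would, under these assumptions, obtain the voiced, semi-voiced, contracted, or geminate form without the recognizer having to distinguish those sounds acoustically.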

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker;

storage means for storing a dictionary in which a single syllable is associated with a combination of specific words for single-syllable recognition, each specific word consisting of a plurality of syllables; and

speech recognition means for analyzing the voice input by the voice input means to specify a combination of the specific words for single-syllable recognition, and determining the single syllable corresponding to the specified combination based on the dictionary stored in the storage means.

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker;

speech recognition means for specifying candidate single syllables by analyzing the voice input by the voice input means;

display means having a display area for displaying designated information, and having a sensor for detecting an operation by the speaker on the surface of the display area together with its position in the display area; and

control means that, when the speech recognition means specifies a plurality of candidate single syllables, displays objects corresponding to the respective candidate single syllables in the display area of the display means as the largest object group that fits in the display area, and determines the candidate single syllable corresponding to the object displayed at the position operated by the speaker and detected by the sensor as the single syllable intended by the speaker.

The speech recognition device according to claim 12,

wherein the control means selects, from the objects to be displayed on the display means, only the top three in descending order of likelihood of the corresponding candidate single syllables, and displays them on the display means.

The speech recognition device according to claim 12,

wherein the control means displays each object on the display means with a visual characteristic varied according to the likelihood of the candidate single syllable corresponding to that object.

The speech recognition device according to claim 12,

wherein, at the time of the determination, the control means treats the range within which a position detected by the sensor designates each object as larger than the display range that the object occupies in the display area of the display means.
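The display behaviour of claim 12 and its dependent claims (a few large objects, limited to the top three, with a hit range wider than the drawn range) can be sketched in one dimension as follows. The names `Button`, `layout`, and `hit`, the equal-width slot layout, and the `padding` value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Button:
    syllable: str
    slot: Tuple[float, float]   # range accepted as a touch on this object
    drawn: Tuple[float, float]  # range actually drawn (smaller than slot)


def layout(candidates: Sequence[Tuple[str, float]], width: float,
           max_objects: int = 3, padding: float = 10.0) -> List[Button]:
    """Lay out the most likely candidates as equal-width objects."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:max_objects]
    if not top:
        return []
    slot_w = width / len(top)  # objects share the whole display width
    buttons = []
    for i, (syllable, _score) in enumerate(top):
        x0, x1 = i * slot_w, (i + 1) * slot_w
        buttons.append(Button(syllable, (x0, x1), (x0 + padding, x1 - padding)))
    return buttons


def hit(buttons: Sequence[Button], x: float) -> Optional[str]:
    """Resolve a touch position using the slot, not the drawn range."""
    for b in buttons:
        if b.slot[0] <= x < b.slot[1]:
            return b.syllable
    return None
```

Because the touch is resolved against the slot rather than the drawn rectangle, a touch that lands slightly outside an object is still attributed to it, which suits operation while the vehicle is moving.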

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker; and

control means that displays, in the display area of the display means, objects representing single syllables arranged in correspondence with the Japanese 50-sound table (gojūon); that, when the speech recognition means specifies a plurality of candidate single syllables, displays the objects in the display area corresponding to the respective candidate single syllables with a visual characteristic changed from that of the other objects; and that determines, as the single syllable intended by the speaker, the single syllable represented by the object displayed at the position operated by the speaker and detected by the sensor, without being limited to the objects whose visual characteristic has been changed.

A speech recognition device mounted on a vehicle for determining the single syllable intended by a speaker based on voice input by the speaker, the device comprising:

voice input means for inputting voice uttered by the speaker;

display means having a display area for displaying designated information, and having a sensor for detecting an operation by the speaker on the surface of the display area together with its position in the display area;

vehicle state acquisition means for acquiring information on whether the vehicle is traveling;

the control means of the speech recognition device according to claim 12;

the control means of the speech recognition device according to claim 16; and

main control means that, based on the information acquired by the vehicle state acquisition means, causes the control means of the speech recognition device according to claim 12 to function when the vehicle is determined to be traveling, and causes the control means of the speech recognition device according to claim 16 to function when the vehicle is determined to be stopped.

A speech recognition device comprising:

voice input means for inputting voice uttered by the speaker;

notification means for notifying designated information;

reception means for receiving an operation by the speaker;

storage means for storing a dictionary composed of confirmation words, each consisting of a plurality of syllables and corresponding to a single syllable; and

control means that executes a notification process causing the notification means to notify, by means of the confirmation word stored in the storage means, the most probable candidate single syllable among the candidate single syllables specified by the speech recognition means, and that, when the reception means receives an operation signifying confirmation from the speaker, determines the candidate single syllable notified in that notification process as the single syllable intended by the speaker.

The speech recognition device according to claim 18,

wherein the storage means stores a plurality of the dictionaries, divided by type of confirmation word, and

the control means selects a dictionary stored in the storage means based on an operation by the speaker received by the reception means, and executes the notification process using the selected dictionary.

The speech recognition device according to claim 18,

wherein the confirmation words in the dictionary stored in the storage means can be registered in advance by the speaker operating the reception means.

The speech recognition device according to claim 18,

further comprising display means having a display area for displaying designated information, and having a sensor for detecting an operation by the speaker on the surface of the display area together with its position in the display area,

wherein the control means, when the speech recognition means specifies a plurality of candidate single syllables, displays the confirmation words corresponding to those candidate single syllables as objects on the display means, and determines the candidate single syllable corresponding to the object displayed at the position operated by the speaker and detected by the sensor as the single syllable intended by the speaker.

The speech recognition device according to claim 21,

wherein the control means, when the speech recognition means specifies a plurality of candidate single syllables, displays the confirmation words corresponding to those candidate single syllables as objects on the display means, and thereafter determines the candidate single syllable corresponding to the confirmation word specified by the speech recognition means as the single syllable intended by the speaker.

A computer-readable storage medium storing a program for causing a computer to function as at least one of the speech recognition means and the control means of the speech recognition device according to any one of claims 1, 12 and 18.

A computer-readable storage medium storing a program for causing a computer to function as the speech recognition means of the speech recognition device according to claim 6.

A computer-readable storage medium storing a program for causing a computer to function as the main control means of the speech recognition device according to claim 17.

(Deleted)

A navigation device for performing predetermined navigation processing, comprising:

the speech recognition device according to any one of claims 1 to 22,

wherein a group of single syllables intended by the speaker and acquired by the speech recognition device is used for the navigation processing.