JP2010054896A

JP2010054896A - Voice recognition device and voice recognition program

Info

Publication number: JP2010054896A
Application number: JP2008220931A
Authority: JP
Inventors: Toshiyuki Hatada; 敏行幡田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2008-08-29
Filing date: 2008-08-29
Publication date: 2010-03-11

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device and a voice recognition program that recognize voice with high accuracy by using a word dictionary.

SOLUTION: When a visitor is detected, a reception device outputs a spoken question asking for the visitor's name and the name of the person in charge (S42). The visitor's spoken answer is recognized by referring to a general-purpose dictionary, which shows the correspondence between words belonging to a plurality of categories and information on their pronunciation, and to a division dictionary, which is created from the general-purpose dictionary, based on the categories, for each person in charge. A final result is determined from the recognition result obtained with the general-purpose dictionary and the recognition result obtained with the division dictionary (S44). When the final result indicates recognition failure (S45: YES), the visitor is asked to try again (S47). When recognition succeeds (S45: NO), the name of the person in charge included in the final result is identified (S52), his or her contact information is identified (S53), and a notification is sent (S54).

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a speech recognition device and a speech recognition program, and more specifically to a speech recognition device and a speech recognition program that perform speech recognition using a word dictionary.

Conventionally, speech recognition technology is known that uses an acoustic model representing the acoustic characteristics of speech, a word dictionary describing the correspondence between each word and information on its pronunciation, and a language model defining constraints on how words connect. In such technology, as the number of words in the word dictionary increases, the number of sentences that can be composed from combinations of words becomes enormous. As a result, the probability of misrecognition, in which a sentence different from the one actually spoken is output, tends to rise; that is, recognition accuracy tends to fall.

To address this, a speech recognition device has been proposed that suppresses misrecognition by treating only sentences generated solely from predicted words as recognition candidates (for example, Patent Document 1).

Patent Document 1: JP-A-6-180593

However, in the speech recognition device described in Patent Document 1, only the predicted words are used for speech recognition. Consequently, when the sentence actually spoken is not composed of the predicted words, that is, when the prediction misses, recognition may be impossible.

The present invention has been made to solve the above problem, and its object is to provide a speech recognition device and a speech recognition program capable of highly accurate speech recognition using a word dictionary.

To achieve the above object, the speech recognition device of the invention according to claim 1 is a speech recognition device that recognizes speech using a word dictionary, comprising: speech information acquisition means for acquiring speech information of the speech input from speech input means that accepts input of a speaker's speech; first recognition means for recognizing the speech based on the speech information acquired by the speech information acquisition means, using a general-purpose dictionary stored in general-purpose dictionary storage means, the general-purpose dictionary being a dictionary that shows, for each of a plurality of categories, the correspondence between a plurality of words and information on the pronunciation of those words; second recognition means for recognizing the speech based on the speech information acquired by the speech information acquisition means, using divided dictionaries stored in divided dictionary storage means, the plurality of divided dictionaries being created by classifying the plurality of words belonging to a reference category, which is a category among the plurality of categories whose number of member words is at or below a predetermined amount, and extracting from the general-purpose dictionary, for each classification, the part of the correspondence between the words related to that classification and the information on their pronunciation; and result determination means for determining the recognition result of the speech based on a first recognition result, which is the recognition result of the first recognition means, and a second recognition result, which is the recognition result of the second recognition means.

In the speech recognition device of the invention according to claim 2, in addition to the configuration of the invention according to claim 1, when both the first recognition result from the first recognition means and the second recognition result from the second recognition means are obtained, the result determination means determines, as the recognition result, the second recognition result that contains the same word as the reference-category word contained in the first recognition result.

In the speech recognition device of the invention according to claim 3, in addition to the configuration of the invention according to claim 2, when there is no second recognition result containing the same word as the reference-category word contained in the first recognition result, the result determination means determines both the first recognition result and the second recognition result as the recognition result.

In the speech recognition device of the invention according to claim 4, in addition to the configuration of the invention according to any one of claims 1 to 3, when the second recognition result from the second recognition means is not obtained, the result determination means determines the first recognition result from the first recognition means as the recognition result.

In the speech recognition device of the invention according to claim 5, in addition to the configuration of the invention according to any one of claims 1 to 4, when the first recognition result from the first recognition means is not obtained, the result determination means determines the second recognition result from the second recognition means as the recognition result.
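The selection rules of claims 2 to 5 can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not the patented implementation: a recognition result is modelled as a list of `(category, word)` pairs, `None` stands for "no result obtained", and the names `decide_result` and `reference_words` are hypothetical.

```python
from typing import Optional

# Assumed representation: a recognition result is a list of (category, word)
# pairs, and None stands for "no result obtained".
Result = list[tuple[str, str]]


def reference_words(result: Result, reference_category: str) -> set[str]:
    """Collect the words of the reference category found in one result."""
    return {word for category, word in result if category == reference_category}


def decide_result(
    first: Optional[Result],
    seconds: list[Optional[Result]],
    reference_category: str,
) -> list[Result]:
    """Combine the general-dictionary result with the divided-dictionary results."""
    obtained = [s for s in seconds if s is not None]
    if first is None:
        # Claim 5: fall back to the second results (empty list if none exist).
        return obtained
    if not obtained:
        # Claim 4: only the first result is available.
        return [first]
    anchor = reference_words(first, reference_category)
    matching = [
        s for s in obtained if reference_words(s, reference_category) & anchor
    ]
    if matching:
        # Claim 2: prefer second results that share the reference-category word.
        return matching
    # Claim 3: no agreement, so keep both kinds of result.
    return [first] + obtained
```

An empty return value corresponds to the case where neither recognition produced a result; how that case is handled is left to the caller in this sketch.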

In the speech recognition device of the invention according to claim 6, in addition to the configuration of the invention according to any one of claims 1 to 5, the plurality of divided dictionaries each contain mutually different words belonging to the reference category.
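As an illustration of how divided dictionaries might be derived from the general-purpose dictionary (claims 1 and 6), here is a minimal sketch. It assumes a nested mapping `category -> word -> pronunciation`, one divided dictionary per reference-category word, and a hypothetical `related` table that says which other words belong with each reference-category word; the text does not specify how that relation is obtained.

```python
Dictionary = dict[str, dict[str, str]]  # category -> word -> pronunciation


def build_divided_dictionaries(
    general: Dictionary,
    reference_category: str,
    related: dict[str, dict[str, list[str]]],
    max_words: int,
) -> dict[str, Dictionary]:
    """Extract one divided dictionary per word of the reference category.

    `related` (hypothetical) maps a reference-category word to the words of
    other categories that should accompany it, keyed by category.
    """
    reference_entries = general[reference_category]
    if len(reference_entries) > max_words:
        # The reference category must hold no more than the predetermined amount.
        raise ValueError("reference category exceeds the predetermined amount")

    divided: dict[str, Dictionary] = {}
    for ref_word, pronunciation in reference_entries.items():
        # Each divided dictionary holds a different reference-category word.
        entry: Dictionary = {reference_category: {ref_word: pronunciation}}
        for category, words in related.get(ref_word, {}).items():
            source = general.get(category, {})
            # Copy only entries that really exist in the general dictionary.
            entry[category] = {w: source[w] for w in words if w in source}
        divided[ref_word] = entry
    return divided
```

Because every entry is copied from `general`, each divided dictionary is a strict extract of the general-purpose dictionary, which is the property the claims rely on.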

The speech recognition device of the invention according to claim 7, in addition to the configuration of the invention according to any one of claims 2 to 6, further comprises re-utterance instruction means that, when there is no second recognition result containing the same word as the reference-category word contained in the first recognition result, causes information output means, which outputs information to the speaker, to output information instructing the speaker to speak again.

The speech recognition device of the invention according to claim 8, in addition to the configuration of the invention according to any one of claims 1 to 6, further comprises question presentation means that causes information output means, which outputs information to the speaker, to output a predetermined question to the speaker, wherein the reference category is a category related to the response to the predetermined question presented by the question presentation means.

The speech recognition device of the invention according to claim 9, in addition to the configuration of the invention according to claim 7, further comprises question presentation means that causes the information output means to output a predetermined question to the speaker, wherein the reference category is a category related to the response to the predetermined question presented by the question presentation means.

The speech recognition device of the invention according to claim 10, in addition to the configuration of the invention according to claim 8 or 9, further comprises: notification destination specifying means that refers to notification destination storage means, which stores a notification destination for each of the plurality of words belonging to the reference category, and specifies the notification destination corresponding to the reference-category word contained in the recognition result determined by the recognition result determination means; and notification means that performs notification processing based on the recognition result for the notification destination specified by the notification destination specifying means.
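The lookup and notification of claim 10 might be sketched as follows. This is an illustration only: the result format (a list of `(category, word)` pairs), the `destinations` table, the message wording, and the `send` callback are all assumptions introduced here.

```python
from typing import Callable, Optional

Result = list[tuple[str, str]]


def find_destination(
    result: Result, reference_category: str, destinations: dict[str, str]
) -> Optional[str]:
    """Return the destination stored for the reference-category word, if any."""
    for category, word in result:
        if category == reference_category and word in destinations:
            return destinations[word]
    return None


def notify(
    result: Result,
    reference_category: str,
    destinations: dict[str, str],
    send: Callable[[str, str], None],
) -> bool:
    """Send a message built from the recognition result; report success."""
    destination = find_destination(result, reference_category, destinations)
    if destination is None:
        return False  # no stored destination for the recognized word
    message = "Visitor announcement: " + " ".join(word for _, word in result)
    send(destination, message)
    return True
```

Passing `send` as a parameter keeps the sketch independent of any particular transport, for example a message to a terminal on a local network.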

The speech recognition program of the invention according to claim 11 causes a computer to function as the various processing means of the speech recognition device according to any one of claims 1 to 10.

According to the speech recognition device of the invention of claim 1, the recognition result is determined based on the result of speech recognition using a general-purpose dictionary covering a plurality of words (the first recognition result) and the result of speech recognition using divided dictionaries created by extracting parts of the general-purpose dictionary (the second recognition result). The first recognition result, obtained with the general-purpose dictionary, cannot be called highly reliable as a whole, but the words of the reference category, whose number of member words is at or below a predetermined amount, are considered reliable to some extent. A divided dictionary, on the other hand, contains only the part of the general-purpose dictionary extracted according to a further classification of the reference-category words, so the number of words it contains is narrowed down. Hence, if the utterance corresponds to the words of a divided dictionary, the second recognition result obtained with that dictionary is considered highly reliable as a whole. Determining the recognition result of the speaker's speech from both the first recognition result and the second recognition result therefore yields a highly accurate final result.

According to the speech recognition device of the invention of claim 2, in the first recognition result obtained with the general-purpose dictionary, the words of the reference category, whose number of member words is at or below a predetermined amount, are considered reliable to some extent. Meanwhile, the second recognition result, obtained with a divided dictionary whose word count has been narrowed down, is considered more reliable as a whole than the first recognition result, provided that the utterance corresponds to the words of the divided dictionary. Therefore, in addition to the effect of the invention of claim 1, a recognition result that is more reliable as a whole can be obtained by taking the reliable word in the first recognition result as the reference and selecting the second recognition result that contains the same word.

According to the speech recognition device of the invention of claim 3, in the first recognition result obtained with the general-purpose dictionary, the words of the reference category, whose number of member words is at or below a predetermined amount, are considered reliable to some extent. Meanwhile, the second recognition result, obtained with a divided dictionary whose word count has been narrowed down, is considered more reliable as a whole than the first recognition result. Therefore, in addition to the effect of the invention of claim 2, when the two results disagree on the reference-category word, adopting both as the recognition result raises the likelihood that at least part of the recognition result is correct.

According to the speech recognition device of the invention of claim 4, when the second recognition result is not obtained, that is, when recognition with the narrowed-down divided dictionaries fails, the first recognition result obtained with the general-purpose dictionary is determined as the recognition result. Therefore, in addition to the effect of the invention of any one of claims 1 to 3, the possibility that recognition becomes impossible can be reduced even when the utterance does not correspond to any divided dictionary.

According to the speech recognition device of the invention of claim 5, when the first recognition result is not obtained, the second recognition result obtained with the divided dictionaries is determined as the recognition result. Therefore, in addition to the effect of the invention of any one of claims 1 to 4, the possibility that recognition becomes impossible can be reduced.

According to the speech recognition device of the invention of claim 6, the plurality of divided dictionaries are created so that each contains mutually different words belonging to the reference category. Therefore, in addition to the effect of the invention of any one of claims 1 to 5, the number of words in each divided dictionary can be minimized, which further improves recognition accuracy. Moreover, because each divided dictionary is created to contain different words, a recognition result specific to those words can be obtained.

According to the speech recognition device of the invention of claim 7, when there is no second recognition result containing the same word as the reference-category word contained in the first recognition result, information instructing the speaker to speak again is output by the information output means. Therefore, in addition to the effect of the invention of any one of claims 2 to 6, it becomes more likely that a correct recognition result is obtained rather than one of unknown reliability.

In the speech recognition device of the invention of claim 8, the reference category is a category related to the response to a predetermined question put to the speaker. When a predetermined question is presented to the speaker, the response usually contains words belonging to a fixed set of categories. Therefore, in addition to the effect of the invention of any one of claims 1 to 6, the accuracy of recognizing the speaker's response can be improved.

In the speech recognition device of the invention of claim 9, the reference category is a category related to the response to a predetermined question put to the speaker. When a predetermined question is presented to the speaker, the response usually contains words belonging to a fixed set of categories. Therefore, in addition to the effect of the invention of claim 7, the accuracy of recognizing the speaker's response can be improved.

According to the speech recognition device of the invention of claim 10, the corresponding notification destination is specified from the reference-category word contained in the recognition result, and information based on the recognition result is transmitted. Therefore, in addition to the effect of the invention of claim 8 or 9, the person who receives the notification can learn information about the speaker without talking to the speaker directly.

The speech recognition program of the invention of claim 11 can cause a computer to function as the various processing means of the speech recognition device according to any one of claims 1 to 10. It can therefore provide the effect of the invention of any one of claims 1 to 10.

Embodiments of the present invention are described below with reference to the drawings. The drawings are used to explain technical features that the present invention may adopt; the device configurations, flowcharts of the various processes, and the like shown in them are merely illustrative examples and are not intended to be limiting.

First, with reference to FIGS. 1 to 8, the overall configuration of a reception system 1, as an example of the speech recognition system according to this embodiment, and the configurations of its components, namely a reception device 10 serving as the speech recognition device and user terminals 20, are described in order.

To begin, an outline of the overall configuration of the reception system 1 is given with reference to FIG. 1. FIG. 1 is a system configuration diagram showing the schematic configuration of the reception system 1. The reception system 1 is installed in, for example, a building or a company and handles reception duties for visitors. In this embodiment, the reception system 1 is described as being installed in a company 5.

As shown in FIG. 1, the reception system 1 includes the reception device 10 and a plurality of user terminals 20, which are connected to one another by a LAN 9. The reception device 10 and the user terminals 20 may be general-purpose computers such as personal computers, or may be dedicated devices. In this embodiment, they are described as dedicated devices. The LAN 9 may be replaced by another kind of network, and it may be wired or wireless.

The processing performed by the reception system 1 as a whole is now described briefly. When a visitor to the company 5 approaches the reception device 10 installed near the entrance, the visitor is detected by a human presence sensor 109 (see FIG. 2) provided in the reception device 10. The reception device 10 then asks a question about the names of the visitor and of the staff member who will receive the visitor (hereinafter simply "the person in charge"), and the visitor answers the reception device 10. The visitor's answer is subjected to speech recognition, and the person in charge is identified based on the recognition result. The reception device 10 then sends a notification that the visitor has arrived to the user terminal 20 used by the person in charge. In this way, the reception device 10 can carry out reception duties at the company 5 automatically. The processing outlined here is described in detail later.
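The flow just outlined (ask, recognize, retry on failure, look up the person in charge, notify) can be sketched as a single control loop. This is a simplified illustration under stated assumptions: all callables are injected stand-ins for the real hardware and recognizer, the result is a list of `(category, word)` pairs with a `"staff"` category, and the limit `max_attempts` is an assumption not given in the text.

```python
from typing import Callable, Optional

Result = list[tuple[str, str]]


def handle_visitor(
    ask: Callable[[str], None],
    listen: Callable[[], str],
    recognize: Callable[[str], Optional[Result]],
    contacts: dict[str, str],
    notify: Callable[[str, Result], None],
    max_attempts: int = 3,  # assumed limit; the text does not state one
) -> bool:
    """Run one reception dialogue; return True if a notification was sent."""
    ask("Please tell me your name and the name of the person you are visiting.")
    for attempt in range(max_attempts):
        result = recognize(listen())
        if result is not None:
            # Identify the person in charge, then that person's contact.
            staff = dict(result).get("staff")
            if staff in contacts:
                notify(contacts[staff], result)
                return True
        if attempt + 1 < max_attempts:
            # Recognition failed (or the name is unknown): ask to repeat.
            ask("Sorry, could you say that again?")
    return False
```

Treating an unknown staff name the same way as a recognition failure is a design choice of this sketch, not something the text prescribes.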

Next, the configuration of the reception device 10 is described with reference to FIGS. 2 to 7. FIG. 2 is a block diagram showing the electrical configuration of the reception device 10. FIG. 3 is an explanatory diagram of the storage areas provided in the hard disk device (HDD) 150. FIG. 4 is an explanatory diagram of a language model 1521. FIG. 5 is an explanatory diagram of a general-purpose dictionary 1531. FIG. 6 is an explanatory diagram of a divided dictionary 1541. FIG. 7 is an explanatory diagram of an employee database 1551.

First, the electrical configuration of the reception device 10 is described with reference to FIG. 2. As shown in FIG. 2, the reception device 10 includes a control circuit unit 100 comprising a CPU 101 and a ROM 102 and a RAM 103, each connected to the CPU 101. An input/output (I/O) interface 104 is connected to the CPU 101. A display 106, a microphone 107, a speaker 108, the human presence sensor 109, a communication device 110, and a hard disk device (HDD) 150 are connected to the I/O interface 104.

The CPU 101 controls the reception device 10 as a whole. The ROM 102 stores the programs needed for the basic operation of the reception device 10 and the setting values for them. The CPU 101 controls the operation of the reception device 10 according to the programs stored in the ROM 102 and the HDD 150. The RAM 103 is a storage device for temporarily storing various data.

The display 106 is a display device including a liquid crystal panel and a drive circuit. The display 106 shows, for example, an image of a person or character imagined to be providing the voice guidance, and text corresponding to the speech output from the speaker 108. The microphone 107 is a device to which speech is input; it converts the input speech into speech data and outputs it. The speaker 108 is a device that converts input speech data into sound and outputs it. In this embodiment, the microphone 107 corresponds to the "speech input means" of the present invention, and the speaker 108 and the display 106 correspond to the "information output means".

The human presence sensor 109 is a sensor that detects a human body, that is, a visitor, within a predetermined area in front of the reception device 10. A well-known human presence sensor can be used as the human presence sensor 109, for example an infrared sensor that emits infrared light toward a human body and detects the body from changes in the amount of reflected infrared light received. The communication device 110 is a device that sends and receives data to and from external equipment such as the user terminals 20 via the LAN 9.

The HDD 150 is described with reference to FIGS. 3 to 6. As shown in FIG. 3, the HDD 150, which is a storage device, is provided with a plurality of storage areas. These include, for example, an acoustic model storage area 151, a language model storage area 152, a general-purpose dictionary storage area 153, a divided dictionary storage area 154, an employee database (DB) storage area 155, and a program storage area 156. In this embodiment, the general-purpose dictionary storage area 153 corresponds to the "general-purpose dictionary storage means" of the present invention, the divided dictionary storage area 154 corresponds to the "divided dictionary storage means", and the employee DB storage area 155 corresponds to the "notification destination storage means".

The acoustic model storage area 151 stores a well-known acoustic model (not shown) used in speech recognition processing. A detailed description is omitted, but the acoustic model is a statistical model of the acoustic characteristics of speech (for example, frequency characteristics); for example, each vowel and consonant is represented by feature values and the corresponding phoneme.

The language model storage area 152 stores the language model 1521 (see FIG. 4) used in speech recognition processing. A language model defines constraints on how words connect, that is, on the links between words. Typical language models include, for example, the descriptive grammar model, which describes the links between words with a grammar, and the statistical model (for example, word N-gram), which defines the links between words with probabilities.

Of these typical language models, the descriptive grammar model defines the expected sentence patterns in advance, by hand, as the acceptable sentence patterns. The number of acceptable sentence patterns that can be described is therefore limited, but highly accurate speech recognition is possible as long as the utterance corresponds to one of the defined patterns. In this embodiment, speech recognition is performed on utterances made in a relatively restricted situation, namely utterances by a person visiting the company 5, so the descriptive grammar model is adopted as the language model. The language model need not be a descriptive grammar model, however, and a statistical model may be used instead.

The language model storage area 152 (see FIG. 3) stores, as language models, acceptable sentence patterns created in advance for the various situations expected in the dialogue between the reception device 10 and a visitor. For example, the language model 1521 shown in FIG. 4 is an example showing the sentence patterns acceptable as a response to the question about the names of the visitor and the person in charge. A sentence pattern can be defined, for example, by listing in order the categories to which the words making up the sentence belong.

In the example of FIG. 4, a plurality of acceptable sentence patterns are defined, including the following two: a sentence in which words belonging to the categories "company name", "connection", "visitor name", "ending 1", "person-in-charge name", "honorific", and "ending 2" are concatenated in that order, and a sentence in which words belonging to the categories "unnecessary word", "company name", "connection", "visitor name", "ending 1", "person-in-charge name", "honorific", and "ending 2" are concatenated in that order.
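As a rough illustration of how such category sequences could be represented, the following minimal Python sketch encodes the two patterns listed above as ordered lists of category labels and checks a candidate sequence against them. The English labels, the `PATTERNS` list, and `is_acceptable` are hypothetical names chosen for this sketch; the description does not specify a data format for the language model.

```python
# Category order shared by both patterns quoted from FIG. 4 (labels are illustrative).
CORE = ["company_name", "connection", "visitor_name", "ending_1",
        "person_in_charge", "honorific", "ending_2"]

# Pattern 1: the core sequence.
# Pattern 2: the same sequence preceded by an unnecessary word.
PATTERNS = [CORE, ["unnecessary_word"] + CORE]


def is_acceptable(categories):
    """Return True if the category sequence equals one of the defined patterns."""
    return list(categories) in PATTERNS
```

A real grammar would also have to map each recognized word to its category before this check; that step is omitted here.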

Although FIG. 4 illustrates only the language model 1521 corresponding to the response to the question about the names of the visitor and the person in charge, the language model storage area 152 may also store a plurality of other language models created in advance for various other situations, such as the response to a question about the purpose of the visit.

The general-purpose dictionary storage area 153 stores a general-purpose dictionary 1531 (see FIG. 5) used for speech recognition. As shown in FIG. 5, the general-purpose dictionary 1531 describes, for each of a plurality of categories related to responses to the questions issued by the reception device 10, the words belonging to that category and information on the pronunciation of each word. In FIG. 5, katakana is used as the pronunciation information to simplify the notation, but the general-purpose dictionary actually stores pronunciations as phoneme strings. For example, when romanized notation is used for the phoneme strings, the pronunciation of the word "あの" is written in katakana as "アノ" in FIG. 5 but is actually stored as "ano".

FIG. 5 shows an example of a general-purpose dictionary containing the categories "unnecessary word", "company name", "connection", "visitor name", "ending 1", "person-in-charge name", "honorific", and "ending 2". In other words, it is a dictionary that describes, for each category appearing in the sentences defined by the language model 1521 described above, the words belonging to that category and their pronunciations. The categories in the general-purpose dictionary 1531 are of two kinds: categories for which the meaning of the sentence as a whole is unaffected by which word of the category is used (hereinafter, non-important categories), and the remaining categories, for which the meaning of the sentence as a whole changes depending on which word of the category is used (hereinafter, important categories).

In the example of the general-purpose dictionary 1531 in FIG. 5, the categories "unnecessary word", "connection", "ending 1", "honorific", and "ending 2" are non-important categories, and the categories "company name", "visitor name", and "person-in-charge name" are important categories.

In the general-purpose dictionary 1531, the important category "company name" contains 5,000 words, "visitor name" contains 10,000 words, and "person-in-charge name" contains 10 words. That is, "company name" and "visitor name" contain far more words than "person-in-charge name". The number of persons in charge belonging to the company 5 is limited, but a single person in charge may, for example, be visited by several client companies, and several people may belong to each of those companies. To cover all of these words, the general-purpose dictionary 1531 therefore ends up with a very large number of words for "company name" and "visitor name". The word counts for each category are shown in FIG. 5 for explanatory purposes only; the actual general-purpose dictionary 1531 need not contain them.

As noted above, when there are other language models created in advance for various situations in addition to the language model 1521 (see FIG. 4), the general-purpose dictionary stored in the general-purpose dictionary storage area 153 also contains the words and pronunciation information for each category corresponding to those other language models.

The division dictionary storage area 154 (see FIG. 3) stores a plurality of division dictionaries (see FIG. 6) used for speech recognition. A division dictionary can be created by classifying the words that belong to one of the important categories of the general-purpose dictionary 1531 described above, namely a category containing no more than a predetermined number of words, and then extracting from the general-purpose dictionary 1531 the words of the other categories related to each classification. In the following description, the important category of the general-purpose dictionary 1531 that serves as the basis for classification, that is, the one containing no more than a predetermined number of words, is called the "reference category".

The reference category may be a category whose number of words is at or below a predetermined number, or a category whose ratio of words to the number of words in another category is at or below a predetermined value. More preferably, it is the important category with the fewest words. It is also preferable that each word in the category be associated with words of other categories. Further, it is desirable that the category be one that appears in the response to the question that the reception device 10 presents to the visitor. In that case, the number of words in each division dictionary can be minimized, which further improves recognition accuracy. In addition, because the division dictionaries are created so that each contains different words, a recognition result specific to those words can be obtained.

In this embodiment, of the important categories "company name", "visitor name", and "person-in-charge name" in the general-purpose dictionary 1531, "person-in-charge name" has the fewest words. The category "person-in-charge name" of the general-purpose dictionary 1531 is therefore used as the reference category. The words belonging to this reference category (the names of the persons in charge) are classified one person per classification, and the words belonging to the categories "company name" and "visitor name" are distributed among the classifications according to the companies and visitors each person in charge deals with. A division dictionary is then created for each classification by associating the words of that classification with their pronunciation information, and is stored in the division dictionary storage area 154.
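The choice of reference category described above can be sketched as picking the important category with the fewest words, optionally subject to a size threshold. This is only an illustrative sketch: the function name, the `max_words` parameter, and the English category labels are assumptions, and the count of 3 for "honorific" is a made-up value used to show that non-important categories are ignored.

```python
def pick_reference_category(word_counts, important, max_words=None):
    """Return the important category with the fewest words, or None.

    If max_words is given, only categories at or below that size qualify.
    """
    candidates = {
        category: count
        for category, count in word_counts.items()
        if category in important and (max_words is None or count <= max_words)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


counts = {"company_name": 5000, "visitor_name": 10000,
          "person_in_charge": 10, "honorific": 3}
important = {"company_name", "visitor_name", "person_in_charge"}
reference = pick_reference_category(counts, important)
```

With these counts the sketch selects `person_in_charge`, matching the choice made in the embodiment.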

For example, FIG. 6 shows a division dictionary 1541 created for the classification in which the person in charge at the company 5 is "Kato" (加藤). The person in charge "Kato" handles two companies, named "A Kogyo" (Ａ工業) and "B Transportation" (Ｂ運送), and two visitors, named "Nagamitsu" (永光) and "Kakoi" (囲). These words are therefore selected from the general-purpose dictionary 1531 and included in the division dictionary 1541 as words belonging to the categories "company name" and "visitor name". Words in the non-important categories are relevant to every classification, so they are copied from the general-purpose dictionary 1531 into the division dictionary 1541 unchanged.

As a result, as shown in FIG. 6, the important categories "company name", "visitor name", and "person-in-charge name" contain 2, 2, and 1 words, respectively. In other words, the division dictionary 1541 contains dramatically fewer "company name" and "visitor name" words than the general-purpose dictionary 1531. The word counts for each category are shown in FIG. 6 for explanatory purposes only; the actual division dictionary 1541 need not contain them.
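The construction of a division dictionary for one person in charge can be sketched as a filter over the general-purpose dictionary. The following is a minimal sketch under stated assumptions, not the patented implementation: the dictionary layout (category mapped to word/phoneme pairs), the `RELATED` table, the romanized names, the phoneme strings other than "ano", and the extra entries "C Shoji", "Sato", and "Suzuki" are all hypothetical.

```python
IMPORTANT = {"company_name", "visitor_name", "person_in_charge"}

# Toy general-purpose dictionary: category -> {word: phoneme string}.
GENERAL = {
    "unnecessary_word": {"あの": "ano"},
    "company_name": {"A Kogyo": "eikougyou",
                     "B Transportation": "biiunsou",
                     "C Shoji": "shiishouji"},
    "visitor_name": {"Nagamitsu": "nagamitsu", "Kakoi": "kakoi", "Sato": "satou"},
    "person_in_charge": {"Kato": "katou", "Suzuki": "suzuki"},
}

# Hypothetical relation table: companies and visitors handled by each person.
RELATED = {
    "Kato": {"company_name": {"A Kogyo", "B Transportation"},
             "visitor_name": {"Nagamitsu", "Kakoi"}},
}


def build_division_dictionary(general, person, related):
    """Build the division dictionary for one person in charge."""
    division = {}
    for category, words in general.items():
        if category not in IMPORTANT:
            # Non-important categories are copied unchanged.
            division[category] = dict(words)
        elif category == "person_in_charge":
            # The reference category keeps only this person's name.
            division[category] = {person: words[person]}
        else:
            # Other important categories keep only the related words.
            keep = related[person].get(category, set())
            division[category] = {w: p for w, p in words.items() if w in keep}
    return division


kato_dictionary = build_division_dictionary(GENERAL, "Kato", RELATED)
```

In this toy data the resulting dictionary holds 2 company names, 2 visitor names, and 1 person-in-charge name, mirroring the counts given for FIG. 6.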

In this embodiment, the general-purpose dictionary 1531 contains 10 words in the category "person-in-charge name" (see FIG. 5), so creating division dictionaries based on the person-in-charge name as described above yields 10 division dictionaries, including the division dictionary 1541 of FIG. 6. However, even when division dictionaries are created based on the person-in-charge name, each division dictionary need not contain exactly one person-in-charge name. For example, the persons in charge may be grouped by department within the company 5, or by the work they are engaged in, to create division dictionaries that each contain one to several names. In that case, the number of division dictionaries may be fewer than 10.

The employee DB storage area 155 (see FIG. 3) stores an employee DB 1551 (see FIG. 7), which holds personal information on all employees of the company 5 (hereinafter, employee information). Employee information is created for each employee and includes, for example, the full name, family name, and contact information, as shown in FIG. 7. The contact information may be, for example, the IP address of the user terminal 20 used by the employee, the employee's e-mail address, or a telephone number. Besides the items shown in FIG. 7, the employee information may also include an employee code identifying each employee, the department to which the employee belongs, and so on. In this embodiment, the full name or family name stored in the employee DB 1551 corresponds to the "word" of the present invention, and the contact information corresponds to the "notification destination" of the present invention.

The program storage area 156 (see FIG. 3) stores programs, setting values, and the like for controlling the various operations of the reception device 10, including the programs used in the processes described later. A program stored on a CD-ROM, for example, is installed through a CD-ROM drive (not shown) and stored in the program storage area 156. Alternatively, the reception device 10 may connect to the LAN 9 or another network (not shown) through the communication device 110 and store a program downloaded from outside. Although not shown, the HDD 150 also stores display data and audio data to be transmitted to the user terminal 20 in the processes described later.

Next, the configuration of the user terminal 20 will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the electrical configuration of the user terminal 20. As shown in FIG. 8, the user terminal 20 includes a control circuit unit 200 comprising a CPU 201 and a ROM 202 and a RAM 203, each connected to the CPU 201. An input/output (I/O) interface 204 is connected to the CPU 201. A display 206, a microphone 207, a speaker 208, a communication device 210, and a hard disk drive (HDD) 250 are connected to the I/O interface 204. In other words, the configuration of the user terminal 20 is the same as that of the reception device 10 except that it has no human presence sensor.

The CPU 201 controls the user terminal 20 as a whole. The ROM 202 stores the programs required for the basic operation of the user terminal 20 and their setting values. The CPU 201 controls the operation of the user terminal 20 in accordance with the programs stored in the ROM 202 and the HDD 250. The RAM 203 is a storage device for temporarily storing various data.

The display 206 is a display device including a liquid crystal panel and a drive circuit. The display 206 shows, for example, a notification message announcing the arrival of a visitor. The microphone 207 is a device that receives speech; it converts the input speech into audio data and outputs it. The speaker 208 is a device that converts input audio data into sound and outputs it. For example, the speaker 208 outputs a notification voice announcing the arrival of a visitor. The HDD 250 stores programs, setting values, and the like for controlling the various operations of the user terminal 20.

The various processes performed in the reception device 10 configured as described above are explained below with reference to FIGS. 9 to 17. FIG. 9 is a flowchart of the main process of the reception device 10. FIG. 10 is a flowchart of the intermediary process executed in the main process. FIG. 11 is a flowchart of the speech recognition process. FIGS. 12 and 13 are flowcharts of the recognition result determination process executed in the speech recognition process. FIGS. 14 to 17 are explanatory diagrams, each showing a specific example of recognition results obtained by speech recognition. The processes shown in FIGS. 9 to 13 are executed by the CPU 101 of the reception device 10 in accordance with the programs stored in the program storage area 156 of the HDD 150.

The main process shown in FIG. 9 starts when the reception device 10 is powered on and continues until it is powered off. When the process starts, the start flag and the end flag are both set to false and stored in a flag storage area (not shown) of the RAM 103 (S1). The start flag defines the period during which the speech recognition process described later (see FIG. 11) recognizes the visitor's utterances input from the microphone 107 of the reception device 10. Specifically, utterances are recognized while the start flag is true, and recognition stops when it is set to false. The end flag indicates whether the speech recognition process should be terminated. Specifically, the speech recognition process continues while the end flag is false and terminates when it is set to true.

Next, the speech recognition process is launched (S2). Specifically, the speech recognition program stored in the program storage area 156 of the HDD 150 is started, whereupon the speech recognition process (see FIG. 11) begins and runs in parallel with the main process. The speech recognition process is described in detail later.

It is then determined whether the human presence sensor 109 has detected a visitor (S3). As long as no visitor is detected, the process waits (S3: NO). When the human presence sensor 109 detects a visitor (S3: YES), the intermediary process is performed (S4 and FIG. 10). As described in detail later, the intermediary process identifies the person in charge who is to receive the visitor, based on the recognition result obtained by the separately running speech recognition process (see FIG. 11), and performs a notification process directed at the user terminal 20 used by that person.

After the intermediary process, if the power has not been turned off (S5: NO), the process returns to step S3, and the intermediary process is repeated for each visitor. When the reception device 10 is powered off (S5: YES), the end flag stored in the flag storage area of the RAM 103 is changed from false to true so that the speech recognition process running in parallel also terminates (S6), and the main process shown in FIG. 9 ends.

The intermediary process performed in the main process will now be described with reference to FIG. 10. As shown in FIG. 10, when the intermediary process starts, the re-utterance flag is first set to false and stored in the flag storage area of the RAM 103 (S41). The re-utterance flag indicates whether, after a failure to recognize the visitor's utterance, the visitor has already been prompted to repeat the same content. Specifically, a re-utterance flag of true indicates that the visitor has already been prompted to repeat the utterance, and false indicates that the visitor has not yet been prompted.

Next, the CPU 101 causes the speaker 108 to output a voice asking for the visitor's name and the name of the person in charge (S42). More specifically, the text data of a question prepared in advance and stored in a predetermined storage area (not shown) of the HDD 150 is read out and converted into audio data, and the speaker 108 converts the audio data into sound and outputs it. Here, a voice asking for both names is output, for example, "Please tell me your name and the name of the person in charge."

To recognize the utterance with which the visitor responds to this question, the start flag, which was set to false at the beginning of the main process, is changed to true (S43). This causes the speech recognition process running in parallel (see FIG. 11) to start recognizing the utterance.

The final recognition result obtained in the speech recognition process (hereinafter, the final result) is stored in a predetermined storage area of the RAM 103. The CPU 101 therefore acquires the final result stored in the RAM 103 (S44) and determines whether it indicates a recognition failure (S45). If a final result indicating a recognition failure is stored in the RAM 103 (S45: YES), the response to the question described above could not be recognized. The CPU 101 therefore determines whether the re-utterance flag stored in the flag storage area of the RAM 103 is false (S46).

If the re-utterance flag is false (S46: YES), the visitor has responded to the question only once and has not yet repeated the response. A voice prompting the visitor to repeat the utterance is therefore output from the speaker 108 (S47). Here too, text data prepared in advance and stored in a predetermined storage area (not shown) of the HDD 150 is read out and converted into audio data, which the speaker 108 then converts into sound and outputs. A voice asking again for the visitor's name and the name of the person in charge is output, for example, "Please say that once more." The re-utterance flag is then changed to true, indicating that the visitor has already been prompted to repeat the same content (S48), and the process returns to step S43.

In step S43 following step S48, the start flag, which the speech recognition process running in parallel (see FIG. 11) set to false after recognizing the first response, is changed to true so that the visitor's repeated utterance can be recognized (S43). The speech recognition process then starts recognizing the visitor's repeated utterance, and the final result is acquired (S44).

If the final result for the repeated utterance also indicates a recognition failure (S45: YES), the re-utterance flag has already been set to true in step S48 of the previous pass (S46: NO). The CPU 101 therefore does not ask the visitor the same question again; instead it determines that the person in charge of this visitor is the representative person in charge and stores the name of the representative person in charge in a predetermined storage area of the RAM 103 (S49). The representative person in charge is a person designated in advance to receive visitors whose person in charge is unknown. The name of the representative person in charge may, for example, be stored in the employee DB 1551 so that it can be distinguished from the other person-in-charge names. Alternatively, it may be stored separately in a predetermined storage area of the HDD 150.
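The retry logic of steps S41 to S49 can be summarized in a short sketch. This is an illustrative reading of the flow rather than the actual program: `intermediary_process`, the `recognize` and `say` callbacks, the `FAILURE` marker, the prompt keys, and the shape of the returned result are all assumptions introduced for the example.

```python
FAILURE = None  # stand-in for a final result that indicates recognition failure


def intermediary_process(recognize, say, representative):
    """Ask for the names, retry once on failure, then fall back."""
    re_utterance = False                 # S41: no retry requested yet
    say("ask_names")                     # S42: ask for visitor and person in charge
    while True:
        final = recognize()              # S43-S44: recognize and fetch the final result
        if final is not FAILURE:         # S45: recognition succeeded
            return final
        if not re_utterance:             # S46: retry has not been requested yet
            say("ask_again")             # S47: prompt the visitor to repeat
            re_utterance = True          # S48
            continue                     # back to S43
        # S49: second failure, so use the representative person in charge.
        return {"person_in_charge": [representative]}
```

Because the flag is set on the first failure, the visitor is asked to repeat at most once before the fallback applies.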

On the other hand, if the visitor's first response to the question about the visitor's name and the person in charge is recognized successfully (S45: NO), or if the repeated response given after the prompt is recognized successfully (S45: NO), the final result contains the name of the person in charge. The person in charge can therefore be told that the visitor has arrived. The CPU 101 reads out text data prepared in advance and stored in a predetermined storage area (not shown) of the HDD 150 and causes the speaker 108 to output a voice asking the visitor to wait (S51). Here, for example, the voice "I am contacting the person in charge. Please wait a moment." is output.

Next, the CPU 101 identifies the name of the person in charge based on the final result (S52). Specifically, it extracts the person-in-charge name contained in the final result stored in the RAM 103. As described above, in this embodiment the visitor's response is recognized using the language model 1521 (see FIG. 4), which is a descriptive grammar model, so when recognition succeeds the final result contains a word of the category "person-in-charge name". As described in detail later, the speech recognition process may store more than one final result in the RAM 103. In that case, the person in charge contained in each final result is identified. The identified person-in-charge names are stored in a predetermined storage area of the RAM 103.

In step S53, the contact information of the person in charge who is to receive the visitor is identified. Specifically, the CPU 101 refers to the employee DB 1551 (see FIG. 7) stored in the employee DB storage area 155 and identifies the contact information corresponding to the employee whose family name matches the person-in-charge name that was determined in step S49 or step S52 and stored in the RAM 103. A notification process announcing the visitor's arrival is then performed for the identified contact (S54). As mentioned above, when more than one person-in-charge name has been identified, the contact information of every such person is identified and the notification process is performed for all of them.
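The lookup in step S53 amounts to matching each identified name against the family-name field of the employee records. The sketch below assumes a simple list-of-records layout; `EMPLOYEE_DB`, `find_contacts`, the field names, and the sample contacts (a documentation-range IP address and an `example.com` address) are hypothetical and are not taken from FIG. 7.

```python
# Hypothetical employee records; a contact may be an IP address, e-mail, or phone number.
EMPLOYEE_DB = [
    {"family_name": "Kato", "contact": "192.0.2.10"},
    {"family_name": "Suzuki", "contact": "suzuki@example.com"},
]


def find_contacts(person_names, employee_db):
    """Return the contact of every employee whose family name was identified."""
    contacts = []
    for name in person_names:
        for record in employee_db:
            if record["family_name"] == name:
                contacts.append(record["contact"])
    return contacts
```

When several names were identified, the returned list contains one contact per match, so the notification step can simply iterate over it.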

For example, when the IP address of the user terminal 20 used by the person in charge is identified as the notification destination, the CPU 101 may transmit to the user terminal 20 audio data announcing the visitor's arrival and have the speaker 208 of the user terminal 20 output that audio. Based on a predetermined notification message, the notification may say only, for example, "A visitor has arrived," or it may also include the company name and visitor name contained in the final result, for example, "Nagamitsu of A Kogyo has arrived." When an e-mail address is identified as the notification destination, the text data of the notification message may be sent to that address. In that case, the notification message is shown on the display 206 of the user terminal 20.

Through this notification process, the person in charge who receives the notification can learn that a visitor has arrived, or who has arrived, without speaking with the visitor directly. After the notification process for the contact of the person in charge has been performed, the intermediary process shown in FIG. 10 ends and control returns to the main process of FIG. 9.

Next, the speech recognition process, which is launched in step S2 of the main process shown in FIG. 9 and runs in parallel with the main process, will be described with reference to FIG. 11.

As shown in FIG. 11, the speech recognition process first reads the acoustic model, the language model, and the dictionaries from their respective storage areas of the HDD 150 (S21). Specifically, the acoustic model (not shown) is read from the acoustic model storage area 151. The language model 1521 (see FIG. 4), which corresponds to the response to the question about the visitor's name and the name of the person in charge, is read from the language model storage area 152. The general-purpose dictionary 1531 (see FIG. 5) is read from the general-purpose dictionary storage area 153, and the plurality of division dictionaries created from the general-purpose dictionary 1531, including the division dictionary 1541 (see FIG. 6), are read from the division dictionary storage area 154.

Next, it is determined whether the end flag stored in the flag storage area of the RAM 103 is false (S22). As described above, the end flag remains false unless the reception device 10 is powered off (S22: YES). In this case, it is determined whether the start flag stored in the flag storage area is true (S23).

As described above, the start flag is set to true after the speaker 108 outputs the spoken question asking for the visitor name and the name of the person in charge, in the relay process of the main process that runs when the human sensor 109 detects a visitor (FIG. 10, S43). In other words, the start flag remains false until that question has been output to the visitor (S23: NO). In this case, the process waits until either the end flag or the start flag is set to true (S22: YES, S23: NO).

When the reception device 10 is powered off and the end flag is set to true in step S6 of the main process of FIG. 9 (S22: NO), the speech recognition process shown in FIG. 11 simply ends. When the start flag is set to true in step S43 of the relay process of FIG. 10 (S23: YES), the visitor's spoken response to the question about the visitor name and the name of the person in charge is input from the microphone 107, and the audio data is acquired into the RAM 103 (S24).

Speech recognition is then performed on the acquired audio data (S25). That is, the audio data is converted into text using the acoustic model, the language model 1521, and the dictionaries, namely the general-purpose dictionary 1531 and the plurality of division dictionaries including the division dictionary 1541. Specifically, for example, the audio data is analyzed and feature values are extracted, after which matching against the acoustic model and the language model 1521 is performed. The language model 1521 refers to the general-purpose dictionary 1531 and the plurality of division dictionaries, including the division dictionary 1541, as its dictionaries.

As a result of the matching, a likelihood is obtained for each sentence that the language model 1521 can accept, and the sentence with the highest likelihood is taken as the recognition result. If the likelihood is at or below a predetermined threshold, recognition is deemed to have failed and no recognition result is obtained. The outcome, which is the recognition result (text) when recognition succeeds or information indicating recognition failure when it fails, is stored in a storage area of the RAM 103 in a form that identifies which dictionary was referred to in obtaining it.
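
The selection rule in this paragraph (highest likelihood wins, but a likelihood at or below the threshold counts as failure) can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `pick_sentence`, the use of `None` to represent recognition failure, and the representation of candidates as a sentence-to-likelihood mapping are assumptions made for the example.

```python
from typing import Dict, Optional


def pick_sentence(likelihoods: Dict[str, float],
                  threshold: float) -> Optional[str]:
    """Return the most likely sentence, or None on recognition failure.

    `likelihoods` maps each sentence accepted by the language model to
    its likelihood. None stands in for "information indicating
    recognition failure" (an assumption of this sketch).
    """
    if not likelihoods:
        return None
    best = max(likelihoods, key=likelihoods.get)
    # A best likelihood at or below the threshold is treated as failure.
    if likelihoods[best] <= threshold:
        return None
    return best
```

Note that the comparison is `<=`, matching the text's "at or below the threshold" condition.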

For example, when ten division dictionaries have been created, one for each of ten persons in charge as described above, there are eleven dictionaries in total, so recognition using the language model 1521 is performed eleven times. If recognition succeeds with every dictionary, eleven recognition results (text) are obtained, one per dictionary. As identification information for the stored results, for example, the number zero (0) is assigned to the result for the general-purpose dictionary and the numbers 1 to 10 are assigned to the results for the ten division dictionaries, and each number is stored in the RAM 103 in association with its result (text or information indicating recognition failure). The eleven recognition passes, one per dictionary, may run in parallel or sequentially. Considering the visitor's waiting time, running them in parallel is preferable.
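
As a rough illustration of running one recognition pass per dictionary in parallel and storing each outcome under its identification number, consider the sketch below. The recognizer here is a toy stand-in, not a real speech recognizer: `toy_recognize` simply succeeds when every word is in the dictionary. The names `toy_recognize` and `recognize_all`, and the use of a thread pool, are assumptions for the example; the description only states that the passes may run in parallel or sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Set

Recognizer = Callable[[List[str], Set[str]], Optional[str]]


def toy_recognize(words: List[str], dictionary: Set[str]) -> Optional[str]:
    """Toy stand-in (assumption): succeed only if all words are known."""
    return " ".join(words) if all(w in dictionary for w in words) else None


def recognize_all(words: List[str],
                  dictionaries: List[Set[str]],
                  recognize: Recognizer = toy_recognize
                  ) -> Dict[int, Optional[str]]:
    """Run one pass per dictionary in parallel.

    Index 0 is the general-purpose dictionary and indices 1..N are the
    division dictionaries, so the returned keys double as the
    identification numbers. None marks a recognition failure.
    """
    with ThreadPoolExecutor(max_workers=len(dictionaries)) as pool:
        futures = [pool.submit(recognize, words, d) for d in dictionaries]
        return {dict_id: f.result() for dict_id, f in enumerate(futures)}
```

Keying the results by dictionary index keeps the "which dictionary produced this result" information that the later determination process relies on.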

After speech recognition has been performed in this way (S25), a recognition result determination process that determines the final result is performed (S26 and FIG. 12). As described above, the final result determined by the recognition result determination process is used to identify the name of the person in charge in the relay process (see FIG. 10) of the main process, which runs in parallel.

After the recognition result determination process (S26), the start flag, which is stored as true in the flag storage area, is changed to false so that speech recognition is suspended until the visitor's next utterance (S27). Processing then returns to step S22. When the reception device 10 is powered off and the end flag is set to true (FIG. 9, S5: YES, S6), the speech recognition process shown in FIG. 11 ends. As long as the power is not turned off and the end flag remains false (S22: YES), steps S22 to S27 are repeated as described above.
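
The control flow of steps S22 to S27 can be sketched as a polling loop over two shared flags. This is a simplified illustration under several assumptions: the `Flags` class, the function `recognition_loop`, the injected callables `capture`, `recognize`, and `decide`, and the polling interval are all hypothetical, and the sketch ignores the synchronization a real multi-threaded implementation would need.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Flags:
    start: bool = False  # set to true by the relay process (S43)
    end: bool = False    # set to true by the main process on power-off (S6)


def recognition_loop(flags: Flags,
                     capture: Callable[[], Any],
                     recognize: Callable[[Any], Any],
                     decide: Callable[[Any], Any],
                     poll_interval: float = 0.01) -> List[Any]:
    """Repeat S22 to S27 until the end flag becomes true."""
    finals = []
    while not flags.end:                  # S22: is the end flag false?
        if not flags.start:               # S23: NO, keep waiting
            time.sleep(poll_interval)
            continue
        audio = capture()                 # S24: acquire the audio data
        results = recognize(audio)        # S25: recognize per dictionary
        finals.append(decide(results))    # S26: determine the final result
        flags.start = False               # S27: wait for the next utterance
    return finals
```

Resetting the start flag at the end of each pass is what makes the loop idle until the relay process asks its next question.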

The recognition result determination process will be described with reference to FIGS. 12 to 17. The following description assumes that ten division dictionaries have been created, one for each of ten persons in charge, and that the identification number zero (0) is assigned to the result for the general-purpose dictionary and the identification numbers 1 to 10 are assigned to the results for the ten division dictionaries.

As shown in FIG. 12, when the process starts, it is first determined whether there is a recognition result obtained by referring to the general-purpose dictionary (hereinafter simply the "general-purpose dictionary recognition result") (S261). Specifically, it is determined whether the result stored in the RAM 103 for identification number zero (0) is a recognition result (text) or information indicating recognition failure. If information indicating recognition failure is stored, it is determined that there is no general-purpose dictionary recognition result (S261: NO), and it is then determined whether there is a recognition result obtained by referring to a division dictionary (hereinafter simply a "division dictionary recognition result") (S262). Specifically, it is determined whether at least one of the results stored in the RAM 103 for identification numbers 1 to 10 is a recognition result (text).

If at least one of the results for identification numbers 1 to 10 is a recognition result (text), it is determined that there is a division dictionary recognition result (S262: YES). In this case, the division dictionary recognition result is determined as the final result (S263), the recognition result determination process of FIG. 12 ends, and processing returns to the speech recognition process shown in FIG. 11. If there are multiple recognition results, all of them become the final result. Thus, when there is no general-purpose dictionary recognition result but there is a division dictionary recognition result, using the division dictionary recognition result as the final result reduces the likelihood that recognition ultimately fails. In addition, when there are multiple recognition results, adopting all of them as the final result means that the correct recognition result may be included in the final result.

For example, suppose that speech recognition of the visitor's utterance (FIG. 11, S25) has stored the results shown in FIG. 14 in the RAM 103. That is, information indicating recognition failure is stored as the result for identification number zero (0), the general-purpose dictionary result. The results for identification numbers 1 to 10, the division dictionary results, include both information indicating recognition failure and recognition results. For example, the division dictionary result for identification number 1 is information indicating recognition failure, whereas the result for identification number 2 is the recognition result (text) "This is Nagamitsu of A Industry; Sato-san, please." The result for identification number 10 is the recognition result (text) "This is Nakamichi of E Industry; Sako-san, please."

In this example, there is no general-purpose dictionary recognition result (S261 in FIG. 12: NO), and it is determined that there is a division dictionary recognition result (S262: YES). Therefore, if the results for identification numbers 3 to 9 all indicate recognition failure, the two recognition results for identification numbers 2 and 10 are determined as the final result (S263). Then, in the relay process of the main process executed separately (see FIG. 10), "Sato" and "Sako" are identified as the names of the persons in charge from these two final results, and notification is sent to the user terminals 20 used by these two people.

If, on the other hand, all of the results stored in the RAM 103 for identification numbers 1 to 10 are information indicating recognition failure in step S262, it is determined that there is no division dictionary recognition result (S262: NO). This means that recognition failed both with the general-purpose dictionary and with the division dictionaries. The final result is therefore determined to be recognition failure (S264), the recognition result determination process of FIG. 12 ends, and processing returns to the speech recognition process shown in FIG. 11. In this case, the relay process of the main process (see FIG. 10) prompts the visitor to speak again. If the final result is still recognition failure after the repeated utterance has been recognized, a predetermined representative is identified as the person in charge, and notification is sent to the user terminal 20 used by that representative.
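
The branch taken when the general-purpose dictionary yields no result (S262 to S264) can be written as a short sketch. The function name `decide_without_general`, the use of `None` for recognition failure, and the use of an empty list to represent a final result of "recognition failure" are assumptions made for this example. The sample strings are English renderings of the FIG. 14 utterances.

```python
from typing import Dict, List, Optional


def decide_without_general(results: Dict[int, Optional[str]]) -> List[str]:
    """Decide the final result when results[0] is a failure (S261: NO).

    `results` maps identification numbers to recognized text, with None
    for a recognition failure. All successful division dictionary results
    are returned (S263); an empty list stands for recognition failure
    (S264), which leads to a request for re-utterance.
    """
    return [text
            for dict_id, text in sorted(results.items())
            if dict_id != 0 and text is not None]


# Results as in the FIG. 14 example: only numbers 2 and 10 succeeded.
fig14 = {i: None for i in range(11)}
fig14[2] = "This is Nagamitsu of A Industry; Sato-san, please."
fig14[10] = "This is Nakamichi of E Industry; Sako-san, please."
final = decide_without_general(fig14)
```

With the FIG. 14 data, `final` holds the two texts for identification numbers 2 and 10, so both persons in charge would be notified.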

If, in step S261, the result stored in the RAM 103 for identification number zero (0) is a recognition result (text), it is determined that there is a general-purpose dictionary recognition result (S261: YES). It is then determined whether there is a division dictionary recognition result (S271). If all of the results stored in the RAM 103 for identification numbers 1 to 10 are information indicating recognition failure, it is determined that there is no division dictionary recognition result (S271: NO). In this case, the general-purpose dictionary recognition result is determined as the final result (S272). Thus, when recognition with the division dictionaries fails because the utterance does not correspond to any division dictionary, adopting the general-purpose dictionary recognition result as the final result reduces the likelihood that no final result is obtained.

If there is a general-purpose dictionary recognition result (S261: YES) and at least one of the results for identification numbers 1 to 10 is a recognition result, so that it is determined that there is also a division dictionary recognition result (S271: YES), it is determined whether the number of division dictionary recognition results is greater than one (S273 in FIG. 13). If it is greater than one (S273: YES), recognition results from multiple division dictionaries have been obtained in addition to the general-purpose dictionary recognition result. As described above, each division dictionary is created by dividing the general-purpose dictionary for one person in charge, so its vocabulary is narrowed down and its recognition result as a whole is considered highly reliable. Accordingly, processing is performed to take the more reliable of the multiple division dictionary recognition results as the final result. Specifically, the name of the person in charge contained in the general-purpose dictionary recognition result is identified as person-in-charge name 1 (S279). Then, among the multiple division dictionary recognition results, the recognition result that contains person-in-charge name 1 is determined as the final result (S280).

The general-purpose dictionary contains the names of all persons in charge that a visitor may ask for, and the number of words for names of persons in charge is smaller than the number of words in the other important categories. The name of the person in charge contained in the general-purpose dictionary recognition result is therefore considered highly reliable. A division dictionary, on the other hand, is created for one person in charge. If the visitor's utterance contains the name of the person in charge for that division dictionary, the name contained in its recognition result is highly reliable; if not, the name cannot be considered reliable. Consequently, taking as the final result the division dictionary recognition result that contains the same name of the person in charge as the general-purpose dictionary recognition result yields a final result that is highly reliable as a whole.

For example, suppose that speech recognition of the visitor's utterance (S25 in FIG. 11) has stored the results shown in FIG. 15 in the RAM 103. That is, "This is Nakamichi of K Industry; Sato-san, please" is stored as the result for identification number zero (0), the general-purpose dictionary result. Among the results for identification numbers 1 to 10, the division dictionary results, recognition results (text) are stored at least for identification numbers 1, 2, and 10. Specifically, the division dictionary result for identification number 1 is the recognition result (text) "This is Nagai of K Industry; Kato-san, please." The division dictionary result for identification number 2 is the recognition result (text) "This is Nagamitsu of A Industry; Sato-san, please." The division dictionary result for identification number 10 is the recognition result (text) "This is Nakamichi of E Industry; Sako-san, please."

In this example, there is a general-purpose dictionary recognition result (S261 in FIG. 12: YES), and it is determined that there are multiple division dictionary recognition results (S271: YES; S273 in FIG. 13: YES). "Sato," the name of the person in charge contained in the general-purpose dictionary recognition result "This is Nakamichi of K Industry; Sato-san, please," is identified as person-in-charge name 1 (S279). The division dictionary recognition result that contains "Sato," namely "This is Nagamitsu of A Industry; Sato-san, please" for identification number 2, is then determined as the final result (S280). In the relay process of the main process executed separately (see FIG. 10), "Sato" is identified as the name of the person in charge from this final result, and notification is sent to the user terminal 20 used by Sato.
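
Steps S279 and S280 amount to filtering the division dictionary results by the name found in the general-purpose dictionary result. The sketch below illustrates this with the FIG. 15 data. The `Result` tuple, its `person` field (the name of the person in charge, assumed to have been extracted from the text already), and the function name `select_matching` are hypothetical and chosen for the example; the texts are English renderings of the utterances.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    dict_id: int   # 0 = general-purpose dictionary, 1..10 = division
    text: str      # recognized sentence
    person: str    # name of the person in charge (assumed pre-extracted)


def select_matching(general: Result, divisions: List[Result]) -> List[Result]:
    """Keep the division results that name the same person in charge."""
    name1 = general.person                               # S279
    return [r for r in divisions if r.person == name1]   # S280


# Data as in the FIG. 15 example.
general = Result(0, "This is Nakamichi of K Industry; Sato-san, please.", "Sato")
divisions = [
    Result(1, "This is Nagai of K Industry; Kato-san, please.", "Kato"),
    Result(2, "This is Nagamitsu of A Industry; Sato-san, please.", "Sato"),
    Result(10, "This is Nakamichi of E Industry; Sako-san, please.", "Sako"),
]
final = select_matching(general, divisions)
```

Here `final` contains only the result for identification number 2, which is the outcome described for this example.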

If, on the other hand, it is determined in step S273 that the number of division dictionary recognition results is one (S273: NO), the name of the person in charge contained in the general-purpose dictionary recognition result is identified as person-in-charge name 1, as in step S279 (S274), and the name of the person in charge contained in the division dictionary recognition result is then identified as person-in-charge name 2 (S275). It is then determined whether person-in-charge name 1 and person-in-charge name 2 match (S276). This check is made because, as described above, the name contained in the general-purpose dictionary recognition result is highly reliable, so if the name contained in the division dictionary recognition result is the same, that division dictionary recognition result is also highly reliable. Therefore, if person-in-charge name 1 and person-in-charge name 2 are identical (S276: YES), the division dictionary recognition result is determined as the final result (S277). This yields a final result that is highly reliable as a whole.

For example, suppose that speech recognition of the visitor's utterance (FIG. 11, S25) has stored the results shown in FIG. 16 in the RAM 103. That is, "This is Nakamichi of K Industry; Sato-san, please" is stored as the result for identification number zero (0), the general-purpose dictionary result. The division dictionary result for identification number 2 is the recognition result (text) "This is Nagamitsu of A Industry; Sato-san, please." All other division dictionary results are assumed to be recognition failures.

In this example, there is a general-purpose dictionary recognition result (FIG. 12, S261: YES), and it is determined that there is only one division dictionary recognition result (S271: YES; S273 in FIG. 13: NO). "Sato," the name contained in the general-purpose dictionary recognition result, is identified as person-in-charge name 1 (S274), and "Sato," the name contained in the division dictionary recognition result, is identified as person-in-charge name 2 (S275). Because person-in-charge name 1 and person-in-charge name 2 match (S276: YES), the division dictionary recognition result for identification number 2, "This is Nagamitsu of A Industry; Sato-san, please," is determined as the final result (S277). In the relay process of the main process executed separately (see FIG. 10), "Sato" is identified as the name of the person in charge from this final result, and notification is sent to the user terminal 20 used by Sato.

If, on the other hand, it is determined in step S276 that person-in-charge name 1, the name contained in the general-purpose dictionary recognition result, differs from person-in-charge name 2, the name contained in the division dictionary recognition result (S276: NO), both the general-purpose dictionary recognition result and the division dictionary recognition result are determined as the final result (S278). As described above, the name contained in the general-purpose dictionary recognition result is highly reliable, so when the name contained in the division dictionary recognition result differs from it, the division dictionary recognition result is not necessarily more reliable than the general-purpose dictionary recognition result. Adopting both as the final result makes it possible to handle the case in which either one is correct.

For example, suppose that speech recognition of the visitor's utterance (FIG. 11, S25) has stored the results shown in FIG. 17 in the RAM 103. That is, "This is Nakamichi of K Industry; Kato-san, please" is stored as the result for identification number zero (0), the general-purpose dictionary result. The division dictionary result for identification number 2 is the recognition result (text) "This is Nagamitsu of A Industry; Sato-san, please." All other division dictionary results are assumed to be recognition failures.

In this example, there is a general-purpose dictionary recognition result (FIG. 12, S261: YES), and it is determined that there is only one division dictionary recognition result (S271: YES; S273 in FIG. 13: NO). "Kato," the name contained in the general-purpose dictionary recognition result, is identified as person-in-charge name 1 (S274), and "Sato," the name contained in the division dictionary recognition result, is identified as person-in-charge name 2 (S275). Because person-in-charge name 1 and person-in-charge name 2 do not match (S276: NO), both the general-purpose dictionary recognition result, "This is Nakamichi of K Industry; Kato-san, please," and the division dictionary recognition result for identification number 2, "This is Nagamitsu of A Industry; Sato-san, please," are determined as the final result (S278). In the relay process of the main process executed separately (see FIG. 10), "Kato" and "Sato" are identified as the names of the persons in charge from these two final results, and notification is sent to the user terminals 20 used by these two people.
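
The single-result branch (S274 to S278) compares the two names and either trusts the division dictionary result or keeps both. The following sketch reproduces the FIG. 16 and FIG. 17 outcomes. As before, the `Result` tuple, its pre-extracted `person` field, and the function name `decide_single` are assumptions introduced for the illustration, and the texts are English renderings of the utterances.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    dict_id: int   # 0 = general-purpose dictionary, 1..10 = division
    text: str      # recognized sentence
    person: str    # name of the person in charge (assumed pre-extracted)


def decide_single(general: Result, division: Result) -> List[Result]:
    """Decide the final result when exactly one division result exists."""
    name1 = general.person           # S274
    name2 = division.person          # S275
    if name1 == name2:               # S276: YES
        return [division]            # S277: division result only
    return [general, division]       # S278: keep both results


division = Result(2, "This is Nagamitsu of A Industry; Sato-san, please.", "Sato")
fig16 = decide_single(
    Result(0, "This is Nakamichi of K Industry; Sato-san, please.", "Sato"),
    division)
fig17 = decide_single(
    Result(0, "This is Nakamichi of K Industry; Kato-san, please.", "Kato"),
    division)
```

In the FIG. 16 case only the division dictionary result survives, while in the FIG. 17 case both results are kept and both persons in charge would be notified.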

As described above, in the reception system 1 of this embodiment, when the reception device 10 detects a visitor, it outputs speech asking for the visitor name and the name of the person in charge. When the visitor responds to the question, the speech is recognized by referring to the general-purpose dictionary and the division dictionaries. The final recognition result is determined based on the recognition result obtained when speech recognition refers to the general-purpose dictionary and the recognition result obtained when speech recognition refers to a division dictionary. The name of the person in charge contained in the final recognition result is identified, and the visitor's arrival is notified to that person's contact.

The general-purpose dictionary recognition result cannot be called highly reliable as a whole, but the name of the person in charge, which is a word in the reference category (a category whose number of words is at or below a predetermined amount), is considered reasonably reliable. A division dictionary, on the other hand, is an extract of only part of the general-purpose dictionary for one person in charge, so the number of words it contains is narrowed down. Therefore, if the utterance corresponds to the name of the person in charge contained in a division dictionary, that division dictionary's recognition result is considered highly reliable as a whole. Accordingly, determining the recognition result of the visitor's utterance based on both the general-purpose dictionary recognition result and the division dictionary recognition result, as in this embodiment, ultimately yields a highly accurate recognition result.

In this embodiment, the CPU 101 that acquires the speech input from the microphone 107 in step S24 of FIG. 11 corresponds to the "speech information acquisition means" of the present invention. The CPU 101 that performs speech recognition using the general-purpose dictionary and the division dictionaries in step S25 corresponds to the "first recognition means" and the "second recognition means." The CPU 101 that determines the final result in steps S263, S264, and S272 of FIG. 12 and steps S277, S278, and S280 of FIG. 13 corresponds to the "recognition result determination means."

The CPU 101 that causes the speaker 108 to output speech prompting the visitor to respond again in step S47 of FIG. 10 corresponds to the "re-utterance instruction means." The CPU 101 that causes the speaker 108 to output the question about the visitor name and the name of the person in charge in step S42 of FIG. 10 corresponds to the "question presentation means." The CPU 101 that identifies the notification destination of the person in charge in step S53 corresponds to the "notification destination specifying means," and the CPU 101 that performs the notification process in step S54 corresponds to the "notification means."

Needless to say, the configurations and processes shown in the above embodiment are examples, and various modifications are possible. For example, in the recognition result determination process of the above embodiment (FIGS. 12 and 13), when a general-purpose dictionary recognition result and a division dictionary recognition result are both obtained but the names of the persons in charge they contain do not match (S276 in FIG. 13: NO), both recognition results, and hence both names, are adopted as the final result. In this case, however, the name in the division dictionary recognition result differs from the name in the general-purpose dictionary recognition result even though the latter is considered highly reliable, so the visitor may instead be prompted to speak again and speech recognition may be redone.

Specifically, in step S278 of FIG. 13, the final result may be determined to be a recognition failure. Then, in the intermediary process shown in FIG. 10, the final result is judged to be a recognition failure (S45: YES) and the visitor is prompted to speak again (S47). This raises the likelihood of obtaining a correct final result rather than one of unknown reliability.
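The modified flow (a name mismatch treated as a recognition failure, followed by a re-utterance prompt) can be sketched roughly as follows. This is only an illustrative sketch: the function names `decide` and `intermediary`, the tuple layout of a result, and the `max_attempts` limit are assumptions and do not appear in the embodiment.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical result layout: (visitor name, person-in-charge name).
Result = Tuple[str, str]


def decide(general: Result, divided: Result) -> Optional[List[Result]]:
    """Variant of step S278: a person-in-charge mismatch is a recognition failure."""
    if general[1] == divided[1]:
        return [divided]  # names agree, so the divided-dictionary result is adopted
    return None  # None stands for "recognition failure" in this sketch


def intermediary(recognize: Callable[[], Tuple[Result, Result]],
                 reprompt: Callable[[], None],
                 max_attempts: int = 3) -> Optional[List[Result]]:
    """Repeat recognition until it succeeds (S45: NO) or attempts run out."""
    for attempt in range(max_attempts):
        general, divided = recognize()      # corresponds to S44
        final = decide(general, divided)
        if final is not None:               # S45: NO (recognition succeeded)
            return final
        if attempt + 1 < max_attempts:
            reprompt()                      # S47: ask the visitor to speak again
    return None
```

The retry limit is a design choice of the sketch only; the source does not say how many times the visitor may be prompted.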

In the recognition result determination process, when a recognition result from the general dictionary and recognition results from a plurality of divided dictionaries are obtained (S261: YES and S271: YES in FIG. 12, S273: YES in FIG. 13), the divided-dictionary recognition result that contains a person-in-charge name matching person-in-charge name 1, the name contained in the general-dictionary result, is adopted as the final result (S279 to S280). When there are multiple divided-dictionary results, one of the person-in-charge names they contain can normally be expected to match person-in-charge name 1. If, however, every person-in-charge name in the divided-dictionary results differs from person-in-charge name 1, then, as in step S278 of FIG. 13, the general-dictionary result and all of the divided-dictionary results may be adopted as final results, or the visitor may be prompted to speak again.
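The selection among several divided-dictionary results might look like the following sketch. The dictionary keys `"visitor"` and `"contact"`, the function name `select_final`, and the `reprompt_if_none` flag are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical result layout: {"visitor": ..., "contact": ...}
Result = Dict[str, str]


def select_final(general: Result,
                 divided: List[Result],
                 reprompt_if_none: bool = False) -> Optional[List[Result]]:
    """Pick final results from one general-dictionary result and several divided ones."""
    name1 = general["contact"]  # "person-in-charge name 1"
    matching = [r for r in divided if r["contact"] == name1]
    if matching:
        return matching              # normal case (S279 to S280)
    if reprompt_if_none:
        return None                  # treat as failure so the visitor is re-prompted
    return [general] + divided       # otherwise keep every candidate as a final result
```

Returning `None` versus the full candidate list corresponds to the two alternatives the paragraph offers when no divided-dictionary result matches.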

In the recognition result determination process (FIGS. 12 and 13), when a plurality of final results are determined, a plurality of person-in-charge names are specified and the notification process is performed for the notification destinations of all of those persons. It is not always necessary, however, to notify every specified person in charge. When a plurality of final results are obtained, a predetermined representative person in charge may be notified instead, for example. Alternatively, if a separate visitor reservation database stores visitor reservations in association with the person who will receive the visitor, the person-in-charge name stored in that database for the visitor name contained in the final result may be identified from among the specified names, and that person may be notified.
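The three notification policies described above (notify everyone, notify a representative, or consult a reservation database) can be combined as in this sketch. The order in which the policies are tried, the function name `choose_recipients`, and the shape of the reservation database (visitor name mapped to person-in-charge name) are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def choose_recipients(final_results: List[Dict[str, str]],
                      reservations: Optional[Dict[str, str]] = None,
                      representative: Optional[str] = None) -> List[str]:
    """Return the person-in-charge names that should be notified."""
    # Unique person-in-charge names, in the order they were recognized.
    candidates = list(dict.fromkeys(r["contact"] for r in final_results))
    if len(candidates) <= 1:
        return candidates
    if reservations is not None:
        # Names the reservation database associates with the recognized visitor names.
        booked = {reservations.get(r["visitor"]) for r in final_results}
        hits = [name for name in candidates if name in booked]
        if hits:
            return hits
    if representative is not None:
        return [representative]
    return candidates  # default behavior of the embodiment: notify everyone
```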

In the above embodiment, the general dictionary 1531 (see FIG. 5) and the plurality of divided dictionaries, including the divided dictionary 1541 (see FIG. 6), each contain all the words of the non-important categories. Words belonging to the non-important categories need not, however, be included in the general dictionary or the divided dictionaries. A separate dictionary containing only the words of the non-important categories may be created and referred to together with the general dictionary or a divided dictionary during speech recognition.
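One way to picture a shared non-important-category dictionary that is consulted alongside a general or divided dictionary is a layered lookup. The sketch below uses Python's `collections.ChainMap` for that purpose; the word entries, pronunciation strings, and the name `active_vocabulary` are all hypothetical, and the embodiment does not prescribe any particular data structure.

```python
from collections import ChainMap
from typing import Mapping

# Hypothetical entries: word -> pronunciation information.
common_words = {"desu": "d e s u", "to": "t o"}           # non-important categories only
divided_tanaka = {"Tanaka": "t a n a k a", "Sales": "s e i r u z u"}


def active_vocabulary(category_dict: Mapping[str, str],
                      common_dict: Mapping[str, str]) -> ChainMap:
    """Combine a general or divided dictionary with the shared dictionary at lookup time."""
    # No entries are copied; lookups fall through from the first mapping to the second.
    return ChainMap(category_dict, common_dict)


vocab = active_vocabulary(divided_tanaka, common_words)
```

Because the shared words are stored once rather than duplicated in every divided dictionary, updating them requires a change in only one place.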

In the above embodiment, the reception device 10 includes the microphone 107, the speaker 108, and the human presence sensor 109, and a single device performs all of the processing, including visitor detection, voice input and output, and speech recognition. Part of the configuration of the reception device 10 may, however, be provided as a separate device. For example, an intercom-type reception terminal that has a microphone, a speaker, and a human presence sensor and is connected to the reception device 10 may be installed near the entrance of the company 5, with the reception device 10 installed elsewhere in the company 5. The reception terminal may then detect visitors and accept voice input and send that information to the reception device 10. The reception device 10 may in turn send voice data to the reception terminal, which outputs the voice from its speaker. In this case, the reception device 10 need not include the microphone 107, the speaker 108, or the human presence sensor 109.

In the above embodiment, the HDD 150 is provided in the reception device 10. The information stored in the HDD 150 (the general dictionary 1531, the divided dictionary 1541, the employee DB 1551, and so on) may instead be stored in a separate storage device that can be connected to the reception device 10 via, for example, the LAN 9, and the necessary information may be read out during processing.

In the above embodiment, the questions and re-utterance instructions presented to the visitor by the reception device 10 are output as voice from the speaker 108. The questions and re-utterance instructions may, however, be shown on the display 106. Voice output and display may also be performed at the same time.

The above embodiment is an example in which, in the reception system 1 of a company or building, the reception device 10 recognizes a visitor's speech and notifies the user terminal 20 used by the person in charge according to the recognition result. The present invention can also be applied to other devices. For example, it may be applied to a guidance device that is installed at a railway station, recognizes a user's utterance by speech recognition, and provides information about facilities along the line.

In this case, the guidance device is assumed to ask questions such as "Which station is the facility near?" or "Which facility would you like information about?", and the user is assumed to utter a response containing a station name or a facility name. The general dictionary therefore needs to cover the words that may appear in responses to these questions. Because the number of facilities along an entire line is enormous, recognition may become inaccurate if the user's speech is recognized with the general dictionary alone. The number of station names on a line, by contrast, is limited to a few dozen at most, far fewer than the number of facility names. The station names can therefore be classified with the station name as the reference category, and a divided dictionary can be created for each classification by extracting the words that belong to other categories related to the station names (for example, facility names and transfer lines). This greatly reduces the number of words, such as facility names, contained in each divided dictionary.
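Creating divided dictionaries with the station name as the reference category could be sketched as follows. The sketch assumes one divided dictionary per station and a hypothetical relation table `related` that lists, for each station, the words of other categories associated with it. The `Entry` type, the function name, and the sample words are illustrative only.

```python
from typing import Dict, List, NamedTuple, Set


class Entry(NamedTuple):
    word: str
    category: str        # e.g. "station" or "facility"
    pronunciation: str   # pronunciation information for the word


def build_divided_dictionaries(general: List[Entry],
                               related: Dict[str, Set[str]]) -> Dict[str, List[Entry]]:
    """Extract, per station, the station's own entry plus its related words."""
    divided: Dict[str, List[Entry]] = {}
    for station, words in related.items():
        keep = {station} | set(words)
        # Each divided dictionary is a subset of the general dictionary.
        divided[station] = [e for e in general if e.word in keep]
    return divided


general_dictionary = [
    Entry("Station A", "station", "..."),
    Entry("Station B", "station", "..."),
    Entry("Museum A", "facility", "..."),
    Entry("Hall B", "facility", "..."),
]
related_words = {"Station A": {"Museum A"}, "Station B": {"Hall B"}}
divided_dictionaries = build_divided_dictionaries(general_dictionary, related_words)
```

Each divided dictionary here holds two entries instead of four, which illustrates on a small scale how the per-dictionary vocabulary shrinks.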

Then, as in the above embodiment, the guidance device may ask a predetermined question whose response will contain a station name, and recognize the user's response using both the general dictionary, which contains all station names, facility names, and so on, and the divided dictionaries created with the station name as the reference category. This yields more accurate recognition results than speech recognition using the general dictionary alone.

Furthermore, in place of the employee DB 1551 (see FIG. 7) of the above embodiment, a notification destination database that stores the notification destination of each facility in association with the facility name can be prepared in advance. The notification process can then be performed for the corresponding notification destination based on the facility name contained in the recognition result. For example, the guidance device may be made connectable to each facility via a telephone line or another network, and may connect to the facility's main telephone number or to the facility's computer based on the contact information associated with the facility name specified from the recognition result.
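Looking up a notification destination from the recognized facility name amounts to a keyed lookup. A minimal sketch, assuming the database is a simple mapping from facility name to a contact string, is shown below. The sample entries, the contact format, and the function name `find_destination` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical notification destination database: facility name -> contact.
destinations: Dict[str, str] = {
    "Museum A": "tel:000-0000-0001",
    "Hall B": "host:hall-b.example",
}


def find_destination(recognized_words: List[str],
                     database: Dict[str, str]) -> Optional[str]:
    """Return the contact for the first recognized word that is a known facility."""
    for word in recognized_words:
        if word in database:
            return database[word]
    return None  # no facility name in the recognition result
```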

FIG. 1 is a system configuration diagram showing the schematic configuration of the reception system 1.
FIG. 2 is a block diagram showing the electrical configuration of the reception device 10.
FIG. 3 is an explanatory diagram of the storage areas provided in the hard disk device 15.
FIG. 4 is an explanatory diagram of the language model 1521.
FIG. 5 is an explanatory diagram of the general dictionary 1531.
FIG. 6 is an explanatory diagram of the divided dictionary 1541.
FIG. 7 is an explanatory diagram of the employee database 1551.
FIG. 8 is a block diagram showing the electrical configuration of the user terminal 20.
FIG. 9 is a flowchart of the main process of the reception device 10.
FIG. 10 is a flowchart of the intermediary process executed in the main process.
FIG. 11 is a flowchart of the speech recognition process.
FIG. 12 is a flowchart of the recognition result determination process executed in the speech recognition process.
FIG. 13 is a flowchart of the recognition result determination process executed in the speech recognition process, continued from FIG. 12.
FIG. 14 is an explanatory diagram showing a specific example of recognition results obtained by speech recognition.
FIG. 15 is an explanatory diagram showing a specific example of recognition results obtained by speech recognition.
FIG. 16 is an explanatory diagram showing a specific example of recognition results obtained by speech recognition.
FIG. 17 is an explanatory diagram showing a specific example of recognition results obtained by speech recognition.

Explanation of symbols

10 Reception device
101 CPU
106 Display
107 Microphone
108 Speaker
150 HDD
153 General dictionary storage area
154 Divided dictionary storage area
155 Employee database storage area
20 User terminal

Claims

A speech recognition apparatus that recognizes speech using a word dictionary, comprising:
voice information acquisition means for acquiring voice information of a voice input through voice input means that receives input of a speaker's voice;
first recognition means for recognizing the voice based on the voice information acquired by the voice information acquisition means, using a general dictionary stored in general dictionary storage means, the general dictionary being a dictionary that indicates, for each of a plurality of categories, a correspondence between a plurality of words and information on the pronunciation of the plurality of words;
second recognition means for recognizing the voice based on the voice information acquired by the voice information acquisition means, using a plurality of divided dictionaries stored in divided dictionary storage means, the plurality of divided dictionaries being created by classifying a plurality of words belonging to a reference category, which is a category among the plurality of categories that has no more than a predetermined number of words, and extracting from the general dictionary, for each classification, a part of the correspondence between the plurality of words related to that classification and the information on their pronunciation; and
result determination means for determining a recognition result of the voice based on a first recognition result, which is the recognition result of the first recognition means, and a second recognition result, which is the recognition result of the second recognition means.

The speech recognition apparatus according to claim 1, wherein, when the first recognition result by the first recognition means and the second recognition result by the second recognition means are both obtained, the result determination means determines, as the recognition result, the second recognition result that contains the same word as the word belonging to the reference category contained in the first recognition result.

The speech recognition apparatus according to claim 2, wherein, when no second recognition result contains the same word as the word belonging to the reference category contained in the first recognition result, the result determination means determines the first recognition result and the second recognition result as the recognition result.

The speech recognition apparatus according to claim 1, wherein, when the second recognition result by the second recognition means is not obtained, the result determination means determines the first recognition result by the first recognition means as the recognition result.

The speech recognition apparatus according to claim 1, wherein, when the first recognition result by the first recognition means is not obtained, the result determination means determines the second recognition result by the second recognition means as the recognition result.

The speech recognition apparatus according to claim 1, wherein the plurality of divided dictionaries include different words belonging to the reference category.

The speech recognition apparatus according to any one of claims 2 to 6, further comprising re-utterance instruction means for causing information output means, which outputs information to the speaker, to output information instructing the speaker to speak again when no second recognition result contains the same word as the word belonging to the reference category contained in the first recognition result.

The speech recognition apparatus according to claim 1, further comprising question presenting means for causing information output means, which outputs information to the speaker, to output a predetermined question for the speaker, wherein the reference category is a category related to a response to the predetermined question presented by the question presenting means.

The speech recognition apparatus according to claim 7, further comprising question presenting means for causing the information output means to output a predetermined question for the speaker, wherein the reference category is a category related to a response to the predetermined question presented by the question presenting means.

The speech recognition apparatus according to claim 8, further comprising:
notification destination specifying means for specifying, by referring to notification destination storage means that stores a notification destination for each of the plurality of words belonging to the reference category, the notification destination corresponding to the word belonging to the reference category contained in the recognition result determined by the result determination means; and
notification means for performing a notification process based on the recognition result for the notification destination specified by the notification destination specifying means.

A speech recognition program for causing a computer to function as various processing means of the speech recognition apparatus according to claim 1.