JP4851081B2

JP4851081B2 - Information retrieval device

Info

Publication number: JP4851081B2
Application number: JP2004360039A
Authority: JP
Inventors: 俊一豊島; 淳一鷹見; 博細田; 静夫越川; 雅幸山川; 一郎首藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2004-12-13
Filing date: 2004-12-13
Publication date: 2012-01-11
Anticipated expiration: 2024-12-13
Also published as: JP2006171876A

Description

The present invention relates to an information search apparatus suited to, for example, looking up personal data on company employees by full name at the call center (telephone reception desk) of an equipment manufacturer. More specifically, when a full name spoken aloud by an operator is recognized by speech recognition and the personal data is searched automatically, the invention ensures that the candidates are displayed in a helpful way during the search.

In conventional information search apparatuses that use speech recognition, a keyword is detected in a conversation, for example, and data related to that keyword is displayed (see, for example, Patent Document 1).

That is, the invention of Patent Document 1 determines whether a predetermined keyword is present in the operator's utterance and performs information retrieval when the keyword is detected.
Patent Document 1: JP-A-8-248987 (特開平８−２４８９８７号公報)

For example, the call center (telephone reception desk) of an equipment manufacturer may receive inquiries from the company's own sales staff as well as from customers. Inquiries from employees are answered with more detailed information than inquiries from general customers, and that information may include items that should not be disclosed to general customers. It is therefore necessary to confirm whether the inquirer is actually a company employee.

The conventional approach is, for example, to type the full name given by the caller on a keyboard and use it to search a database that stores employee personal data. However, operating a keyboard while continuing a telephone conversation is not easy and takes practice. In addition, mishearing the name can prevent the search from going smoothly.

In contrast, in Patent Document 1, keywords such as a device's name, performance, or model number are used, and when one of those keywords is uttered, information on the device is retrieved through speech recognition. This removes the need for skill at keyboard operation, and because the operator repeats the keyword back, the uttered keyword is more likely to be accurate, so a good search can be performed.

However, the invention of Patent Document 1 searches by detecting keywords specified in advance. It is effective when the keywords, such as device names, performance, or model numbers, can be pinned down. When personal data is searched by speech recognition of full names, however, it turned out to be extremely difficult to decide on speech recognition alone, because different people can share the same full name and many names sound alike, such as 「加藤」 (Kato) and 「加納」 (Kano).

Furthermore, when personal data is retrieved from a database by speech recognition, the name readings registered in the database should be the recognition targets, yet the readings are often not registered. If registration of the romanized (romaji) spelling is mandatory, the reading can be estimated from the romaji, but romaji spellings vary in many ways and cannot always be converted into the correct reading.

For example, 「佐藤」 (read 「さとう」) may be written "sato", "satou", or "satoh". Read literally, "sato" becomes 「さと」, while the other two become 「さとう」 or 「さとー」. Likewise, 「浩一」 (read 「こういち」) may be written "koichi", "kouichi", or "kohichi"; read literally, "koichi" becomes 「こいち」 and "kohichi" becomes 「こひち」. Similarly, 「じゅんいち」 is usually written "junichi", which converts to 「じゅにち」.
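The literal conversion described above can be illustrated with a minimal Python sketch. The patent does not specify a conversion algorithm, so the greedy longest-match rule, the partial syllable table, and the function name `naive_reading` are all assumptions made for illustration only.

```python
# Partial romaji-to-hiragana table (assumption: only the syllables needed here).
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ji": "じ", "ju": "じゅ",
}


def naive_reading(romaji: str) -> str:
    """Convert romaji to hiragana by greedy longest match, with no kanji knowledge."""
    s = romaji.lower()
    out = []
    i = 0
    while i < len(s):
        for size in (3, 2, 1):
            chunk = s[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += len(chunk)
                break
        else:
            ch = s[i]
            if ch == "h" and i > 0 and s[i - 1] == "o":
                out.append("う")   # "oh" not followed by a vowel: long o
            elif ch == "n":
                out.append("ん")   # "n" not followed by a vowel: syllabic n
            # any other character (for example a space) is skipped
            i += 1
    return "".join(out)


for spelling in ("sato", "satou", "satoh", "koichi", "kohichi", "junichi"):
    print(spelling, "->", naive_reading(spelling))
```

Because the match is purely mechanical, "kohichi" is split as ko + hi + chi and "junichi" as ju + ni + chi, which reproduces the misreadings 「こひち」 and 「じゅにち」 mentioned in the text.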

Examples like these are endless. Ultimately, this is an inherent problem of generating readings mechanically from romaji alone without using the kanji. The present invention was made in view of these points, and its object is to provide an information search apparatus that works well when, for example, personal data is searched by speech recognition.

To solve the above problems and achieve the object of the present invention, the invention described in claim 1 is an information search apparatus comprising: a database having a plurality of search targets, each of which associates predetermined information with at least one of a name, the furigana of the name, and the romaji spelling of the name; reading information estimation means that, for each search target that does not include furigana, estimates and creates the furigana from the romaji spelling included in that search target; speech recognition means that, based on the degree of match between the furigana included in each search target and the user's input speech, identifies a plurality of furigana as candidates recognized from the input speech; recognition result display means that, for each of the furigana identified by the speech recognition means, indicates when the search targets containing that furigana include at least one whose furigana was created by the reading information estimation means; and information display means that extracts from the database the search targets containing the one furigana selected from among those identified by the speech recognition means, and displays the predetermined information included in those search targets.

In the information search apparatus according to claim 2, for each of the furigana identified by the speech recognition means, the recognition result display means uses different display methods depending on whether all of the search targets containing the name identified by that furigana include furigana created by the reading information estimation means, or only some of them do.

In the information search apparatus according to claim 3, to indicate that the search targets containing the name identified by a furigana identified by the speech recognition means include one whose furigana was created by the reading information estimation means, the recognition result display means displays a specific symbol or icon before or after the character string representing that furigana.

In the information search apparatus according to claim 4, the recognition result display means gives the same indication by changing the font shape of the character string representing that furigana.

In the information search apparatus according to claim 5, the recognition result display means gives the same indication by changing the font color of the character string representing that furigana.

In the information search apparatus according to claim 6, the recognition result display means gives the same indication by changing the background color of the character string representing that furigana.

According to the invention described above, readings are estimated from the romaji, and when an estimated reading has been assigned to at least one of the database entries corresponding to a recognition result, that fact is displayed together with the recognition result. The user is thereby told that the candidate may contain a conversion error, which provides an information search apparatus well suited to finding the correct answer among multiple recognition candidates.

The spelling variations described above often yield readings that are phonetically quite close to the correct one. Therefore, even if readings generated mechanically from romaji alone are combined with the correct readings to define the recognition vocabulary, speech recognition based on that vocabulary will, for an input of 「さとー」, find not only 「さとー」 but also similar-sounding candidates such as 「さと」.

This is a consequence of speech recognition being imperfect, but in the present invention it works as an advantage: in effect, a so-called fuzzy search is achieved. In other words, even if the readings contain some errors, speech recognition can still pick up plausible candidates.

However, suppose a user who wants information on 佐藤浩一 says 「さとーこーいち」 and 「さとこひち」 appears among the recognition candidates. The user may not realize that this candidate represents the intended person. The user may then conclude that the correct answer is not among the candidates and repeat the input, and, on getting the same result again, may eventually give up.

The present invention focuses on this point: by telling the user that a candidate may contain a conversion error, it lets the user take that into account when looking for the correct answer among the recognition candidates. The invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of one embodiment of a person data search system to which the information search apparatus of the present invention is applied.

In FIG. 1, speaker A, the operator (input person), and speaker B, the caller, converse over a telephone line 200 via voice input/output units 100A and 100B, which are, for example, telephones. The information search apparatus 1 according to the present invention is implemented on a personal computer. A microphone (Mic) 2 that picks up the operator's voice is connected to the information search apparatus 1, so the microphone 2 picks up only the voice of speaker A, the operator.

The information search apparatus 1 includes waveform extraction means 11 that extracts the waveform of the speech picked up by the microphone 2, speech recognition means 12 that recognizes the speech, search means 13 that searches a list based on the recognized information, a display 14, and input means 15 such as a keyboard and mouse. The waveform extraction means 11 through the input means 15 are connected to one another through an internal bus 16 and exchange data with each other, and their operation is controlled by a control device (CPU) 17.

An employee database 4 is connected to the search means 13 of the information search apparatus 1 through, for example, an in-house LAN (Local Area Network) 3, and employee personal data can be obtained by accessing this database. As a preparatory step, the search means 13 retrieves from the employee database 4 the personal data of all employees to be searched, and the resulting data set is stored through the internal bus 16 on a hard disk 18 or similar storage in the information search apparatus 1.

FIG. 2 shows the functional blocks of the information search apparatus 1 in this person data search system, centered on the control device (CPU) 17. The following description covers an embodiment that combines claims 1, 2, and 3: an information search system that accepts spoken input of an employee's full name plus the honorific 「さん」 (san), extracts from the database the associated information (employee number, department, telephone number, fax number, and so on) of the employee with that name (or of all such employees, if several share the same full name), and displays it.

In FIG. 2, 21 is the employee information database, which has entries such as those shown in FIG. 3. 22 is the reading information estimation means, which automatically creates readings from the romaji for database entries whose furigana is not registered. 23 is the grammar for speech recognition, described, for example, as a finite-state automaton over phonemes. 24 is the acoustic model for speech recognition, for example a speaker-independent HMM.

25 is the speech recognition means, a speech recognizer that can compute the n candidates with the highest recognition scores for the input speech. 26 is the recognition result display means, which displays the n recognition candidates (hiragana strings) obtained by the speech recognition means on a display device. 27 is the information display means, which extracts from the employee information database the entries corresponding to one candidate in the recognition result and displays the detailed information of those employees on a display device.

In the example database shown in FIG. 3, furigana is defined for 「佐藤浩一」, so this entry is searched by 「さとうこういち」 even though its romaji spelling is "sato kouichi". In contrast, 「佐藤紘一」 and 「佐藤幸一」 have no furigana registered, so their readings are created from the romaji spelling. Because the romaji for 「佐藤紘一」 is "satoh kouichi", its reading is created as 「さとうこういち」, whereas the reading for 「佐藤幸一」 is created as 「さとこひち」.

Similarly, furigana is defined for 「佐藤浩吉」, so it is searched by 「さとうこうきち」, whereas 「佐藤幸吉」 has no furigana registered and its romaji spelling is "sato koukichi", so its reading is created as 「さとこうきち」. The recognition vocabulary consists of the readings (furigana) of everyone in the database, each with the honorific 「さん」 appended, and speech recognition returns the name reading without the honorific as its result.
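The handling of registered and unregistered furigana can be sketched as follows. This is only an illustration under stated assumptions: the `Entry` and `Reading` types, the `effective_reading` helper, and the `ESTIMATES` lookup (a stand-in for the reading information estimation means, holding only conversions stated in the text) are hypothetical and do not reproduce the actual layout of FIG. 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    name: str                        # kanji name
    romaji: str                      # romaji spelling (assumed mandatory)
    furigana: Optional[str] = None   # None means "not registered"


@dataclass
class Reading:
    kana: str
    generated: bool                  # True if estimated from the romaji


def effective_reading(entry: Entry, estimate: Callable[[str], str]) -> Reading:
    """Use the registered furigana if present, otherwise estimate it from romaji."""
    if entry.furigana is not None:
        return Reading(entry.furigana, generated=False)
    return Reading(estimate(entry.romaji), generated=True)


# Stand-in for the estimation step: only conversions given in the description.
ESTIMATES = {
    "satoh kouichi": "さとうこういち",
    "sato koukichi": "さとこうきち",
}

entries = [
    Entry("佐藤浩一", "sato kouichi", "さとうこういち"),
    Entry("佐藤紘一", "satoh kouichi"),
    Entry("佐藤幸吉", "sato koukichi"),
]

readings = [effective_reading(e, ESTIMATES.__getitem__) for e in entries]

# Recognition vocabulary: every distinct reading with the honorific appended.
vocabulary = sorted({r.kana + "さん" for r in readings})
print(vocabulary)
```

Keeping the `generated` flag next to each reading is what later allows the display step to tell registered readings apart from estimated ones.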

Suppose the user wants to find 「佐藤幸一」 and says 「さとうこういちさん」. The correctly recognized 「さとうこういち」 is chosen as the first candidate, followed, based on pronunciation, by 「さとこういち」 as the second, 「さとうこうきち」 as the third, 「さとこうきち」 as the fourth, and 「さとこひち」 as the fifth. These candidates are selected from the vocabulary registered in the database as readings.
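The idea of returning the n best readings, with the honorific stripped, can be approximated in a few lines of Python. Note the substitution: the apparatus scores candidates with an acoustic model and a grammar, whereas this sketch uses plain string similarity from the standard library `difflib` module as a stand-in, so the ranking below the exact match need not agree with the order given in the text. The function name `n_best` is hypothetical.

```python
from difflib import SequenceMatcher

HONORIFIC = "さん"


def n_best(spoken: str, readings: list[str], n: int = 5) -> list[str]:
    """Rank vocabulary words (reading + honorific) by similarity to the input
    and return the top n readings without the honorific."""
    vocabulary = [reading + HONORIFIC for reading in readings]
    scored = [
        (SequenceMatcher(None, spoken, word).ratio(), word)
        for word in vocabulary
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word[:-len(HONORIFIC)] for _, word in scored[:n]]


readings = ["さとこひち", "さとこうきち", "さとうこうきち", "さとこういち", "さとうこういち"]
candidates = n_best("さとうこういちさん", readings)
print(candidates)
```

The exact match always ranks first because its similarity ratio is 1.0; similar-sounding readings follow, which mirrors the fuzzy-search effect described earlier.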

In this case, three employees in the database have the reading 「さとうこういち」 of the first candidate, and for two of them the reading (furigana) was created automatically from the romaji spelling. Each of the other candidates corresponds to one employee. The reading of the third candidate, 「さとうこうきち」, is registered, while those of the second candidate 「さとこういち」, the fourth candidate 「さとこうきち」, and the fifth candidate 「さとこひち」 were created automatically from the romaji spelling.

Therefore, by indicating whether readings (furigana) were created automatically from romaji spellings, the apparatus can tell the user that a candidate may contain a conversion error.

Specifically, a candidate such as 「さとうこういち」, for which the readings of some entries were generated from romaji, is displayed with the symbol 「☆」. A candidate such as 「さとうこうきち」, for which the readings of all entries are registered, is displayed with no symbol. A candidate such as 「さとこういち」, 「さとこうきち」, or 「さとこひち」, for which the readings of all entries were generated from romaji, is displayed with the symbol 「★」.
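The three display cases above reduce to a small decision rule. The following Python sketch assumes each candidate reading comes with one boolean per matching entry (True when that entry's reading was generated from romaji); the function name `marker`, the data layout, and the choice to print the symbol after the reading are illustrative assumptions.

```python
def marker(generated_flags: list[bool]) -> str:
    """Return the symbol shown with a candidate reading.

    generated_flags holds one flag per database entry with this reading;
    a candidate is assumed to match at least one entry.
    """
    if all(generated_flags):
        return "★"   # every entry's reading was generated from romaji
    if any(generated_flags):
        return "☆"   # only some entries' readings were generated
    return ""        # every entry's reading is registered


# Flags per candidate, following the counts given in the description.
entries_by_reading = {
    "さとうこういち": [False, True, True],
    "さとこういち": [True],
    "さとうこうきち": [False],
    "さとこうきち": [True],
    "さとこひち": [True],
}

for reading, flags in entries_by_reading.items():
    print(f"{reading}{marker(flags)}")
```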

When personal data is searched in the information search apparatus 1 described above, a screen such as that shown in FIG. 4 is displayed on the display 14 of FIG. 1.

In FIG. 4, an operation area 32 with "Noise measurement" and "Exit" controls is provided on the left below the title area 31 at the top; moving the cursor to a control and clicking it performs the corresponding operation. Below this is a voice input selection area 33 with options such as "Male/female common", "Male only", "Female only", and "Automatic selection"; moving the cursor to an option and clicking it selects it.

Below that is an area 34 that displays the speech waveform extracted by the waveform extraction means 11, that is, the waveform of the speech picked up by the microphone 2. By looking at the waveform in area 34, the operator can check, for example, how loud the speech entering the microphone 2 is. The operator can then adjust his or her speaking volume to provide more suitable speech input and search personal data more smoothly by speech recognition.

Speech is then picked up by the microphone 2 of FIG. 1 and recognized by the speech recognition means 12. Because speech recognition is ambiguous, multiple candidates are produced, starting with the best match. Based on this recognition result, the search means 13 searches the list stored on the hard disk 18 for the name readings (pronunciations). The search result is displayed in area 35 on the right below the title area 31 of the screen shown in FIG. 4. Area 35 shows the hiragana readings of the candidates in descending order of match, with a selection check box at the left end of each row.

When a check box is checked, the retrieved personal data is displayed in area 36 in the lower part of the screen. With speech recognition of full names, several people may share the same full name; their data is displayed side by side in area 36. Because recognition is based on the reading of the name, entries with different kanji spellings are also displayed side by side. If the number of retrieved records exceeds what area 36 can show, the rest can be viewed with the scroll bar 37.

Display area 38 shows the number of retrieved records (people). Display area 39 shows the operating state of the apparatus, such as "Standby", "Detecting start of speech", "Detecting end of speech", or "Recognizing".

That is, in the information search apparatus 1 of the present invention, when the operator speaks and the speech picked up by the microphone 2 is recognized by the speech recognition means 12, the hiragana readings of the candidates are listed in area 35 in descending order of match. The operator checks the box of the desired reading, and the personal data retrieved for that reading is displayed in area 36. The operator can then confirm the personal data by looking at area 36.

Furthermore, in area 35 of the information search apparatus 1, as described above, a candidate for which the readings of some entries were generated from romaji is displayed with the symbol 「☆」, and a candidate for which the readings of all entries were generated from romaji is displayed with the symbol 「★」.

In FIG. 4, the first candidate 「さとうこういち」 is displayed with 「☆」 because the readings of some of its entries were generated from romaji. The third candidate 「さとうこうきち」 is displayed with no symbol because the readings of all its entries are registered. The remaining candidates 「さとこういち」, 「さとこうきち」, and 「さとこひち」 are displayed with 「★」 because the readings of all their entries were generated from romaji.

Also in FIG. 4, the check box of the first candidate 「さとうこういち」 is checked, so area 36 shows, side by side, the personal data of the three people retrieved for 「さとうこういち」. For the second and third of these three records, the symbol 「★」 is also displayed to indicate that the reading was generated from romaji.

By indicating in this way whether readings (furigana) were created automatically from romaji spellings, the apparatus can tell the user that a candidate may contain a conversion error.

In this case, the first candidate "さとうこういち" (Satou Kouichi) is shown with the symbol "☆". This means that the reading information of only some entries was generated from the Roman-letter notation, so even though generated readings are involved, the reading itself is a plausible one. By contrast, the second candidate "さとこういち" (Sato Kouichi) is shown with the symbol "★". This means that the reading information of all entries was generated from the Roman-letter notation, so the reading information itself may be wrong.

Meanwhile, no symbol is displayed for the third candidate "さとうこうきち" (Satou Koukichi), which means that reading information is registered for all of its entries. In this case the reading information is regarded as correct, so the third candidate, whose reading differs from the desired "さとうこういち" (Satou Kouichi), can be excluded from the user's search. This leaves the first candidate "さとうこういち", the second candidate "さとこういち" (Sato Kouichi), the fourth candidate "さとこうきち" (Sato Koukichi), and the fifth candidate "さとこひち" (Sato Kohichi) as possibilities.
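The narrowing-down step just described can be sketched as a small filter. This is an illustrative reading of the reasoning, not a procedure the patent prescribes: in the patent the user performs this judgment by eye. The function name and the list-of-pairs representation are assumptions.

```python
def remaining_candidates(candidates, wanted):
    """Keep candidates that may still correspond to the wanted reading.

    candidates: list of (reading, mark) pairs in rank order, where mark is
    '' (all readings registered), '☆' (some generated), or '★' (all generated).
    An unmarked candidate is trusted, so it is dropped when it differs from
    the wanted reading; marked candidates may hide a conversion error and stay.
    """
    return [
        reading
        for reading, mark in candidates
        if reading == wanted or mark in ("☆", "★")
    ]


# The five candidates of FIG. 4, in rank order.
FIG4_CANDIDATES = [
    ("さとうこういち", "☆"),
    ("さとこういち", "★"),
    ("さとうこうきち", ""),
    ("さとこうきち", "★"),
    ("さとこひち", "★"),
]

if __name__ == "__main__":
    print(remaining_candidates(FIG4_CANDIDATES, "さとうこういち"))
```

Only the unmarked third candidate is removed; the other four remain to be checked.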

Then, when the check box of the fifth candidate "さとこひち" (Sato Kohichi) is checked, for example, the person data for "佐藤幸一" is displayed in area 36 as shown in FIG. 5, and the desired person data can be retrieved.
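To illustrate how a generated reading such as "さとこひち" can arise, here is a minimal sketch of a naive, syllable-by-syllable conversion from Roman letters to hiragana. The patent text here does not give the Roman-letter notation stored for 佐藤幸一; the spelling "Sato Kohichi" below is an assumption, chosen because converting it one syllable at a time yields さとこひち. The table covers only the syllables needed for this example, and the greedy longest-match strategy is likewise an assumption, not the patent's estimation method.

```python
# Partial syllable table: just enough for the example names.
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
}


def romaji_to_hiragana(romaji):
    """Convert Roman letters to hiragana by greedy longest-match lookup."""
    text = romaji.lower().replace(" ", "")
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):  # try the longest syllable first
            chunk = text[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += size
                break
        else:
            raise ValueError(f"cannot convert {text[i:]!r}")
    return "".join(out)


if __name__ == "__main__":
    print(romaji_to_hiragana("Satou Kouichi"))  # さとうこういち
    print(romaji_to_hiragana("Sato Kohichi"))   # さとこひち
```

A spelling that writes the long vowels out converts to the expected reading, while a spelling that abbreviates them (here with "oh") converts to a string that no one would register by hand, which is the kind of conversion error the "★" mark warns about.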

In other words, in the embodiment above, suppose that only some of the database entries corresponding to the hiragana string obtained as a recognition result have no registered furigana (that is, the remaining entries have correctly entered furigana). This means that an employee actually exists whose registered reading is the same as the reading automatically converted from the Roman-letter notation (a person with the same family and given name), so the reading is at least not inappropriate as a person's name.

Conversely, if all of the database entries corresponding to the hiragana string obtained as a recognition result have no registered furigana (that is, all entries use reading information automatically generated from the Roman-letter notation), the string may be a reading that is unsuitable as a person's name, and it is more likely to contain an error than in the former case.

Accordingly, by displaying the results of these two cases differently, the apparatus can tell the user how likely the hiragana string is to contain an error.

To solve this problem, claim 1 of the present invention sets out concrete means for explicitly indicating the case where the reading information was automatically generated from the Roman-letter notation; that is, for indicating that "the hiragana string displayed as a recognition result was automatically converted from Roman letters and may therefore be wrong". If the user is told that such a conversion error may be present, the user should be able to take it into account when looking for the correct answer among the recognition candidates.

Claim 2 of the present invention further addresses displaying, in two levels, the likelihood that the hiragana string contains an error. Claim 3 of the present invention addresses displaying the hiragana string of the recognition result with some symbol, icon, or the like added to it.

FIG. 6 shows an embodiment in which, as set out in claim 4 in place of claim 3, the font shape is changed (set to italic) for display. FIG. 7 shows an embodiment in which, as set out in claims 5 and 6, both the font color and the background color are changed (font color from black to white, background color from white to black) for display.

Thus, various concrete methods are conceivable for indicating that an error may be present: displaying the hiragana string of the recognition result with a symbol, icon, or the like added, changing the font shape, or changing the font color or background color. With these methods, candidates that are particularly likely to contain an error are indicated explicitly.
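The display variants listed above (added symbol, italic font, inverted colors) can be sketched as a single rendering function. The patent does not specify a rendering technology; HTML output, the function name, and the style keywords are assumptions made for illustration.

```python
from html import escape


def render_candidate(reading, flagged, style="symbol"):
    """Return an HTML fragment for one candidate reading.

    flagged: True if the candidate should be marked as possibly erroneous.
    style:   'symbol'   appends a mark after the string (cf. claim 3),
             'italic'   changes the font shape (cf. claim 4, FIG. 6),
             'inverted' swaps font and background colors (cf. claims 5 and 6, FIG. 7).
    """
    text = escape(reading)
    if not flagged:
        return text
    if style == "symbol":
        return text + "★"
    if style == "italic":
        return f"<i>{text}</i>"
    if style == "inverted":
        return f'<span style="color:white;background:black">{text}</span>'
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    for style in ("symbol", "italic", "inverted"):
        print(render_candidate("さとこひち", True, style))
```

Keeping the flag separate from the style means the same candidate list can be shown in any of the three forms without changing how the flag is computed.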

Thus, according to the information search apparatus of the present invention, readings are estimated from the Roman-letter information, and for each recognition result, if estimated reading information is assigned to at least one of the corresponding database entries, that fact is displayed together with the recognition result. The user is thereby informed that a conversion error may be present, which provides an apparatus well suited to finding the correct answer among a plurality of recognition candidates.

The information search apparatus of the present invention is not limited to the employee search at a company's main reception (telephone switchboard) described above. It can also be used effectively for customer search at mail-order companies and the like, and for tasks such as giving an item number or sales price over the telephone from a product name, using a database that stores product names, item numbers, sales prices, and so on.

The present invention is not limited to the embodiments described above, and various modifications are possible without departing from the spirit of the present invention.

FIG. 1 is a block diagram showing the configuration of one embodiment of a person data search system to which the information search apparatus according to the present invention is applied.
FIG. 2 is a functional block diagram of one embodiment of the information search apparatus according to the present invention.
FIG. 3 is a configuration diagram of the database used in the description.
FIG. 4 is a configuration diagram of a display screen used in the description.
FIG. 5 is a configuration diagram of a display screen used in the description.
FIG. 6 is a configuration diagram of a display screen used in the description.
FIG. 7 is a configuration diagram of a display screen used in the description.

Explanation of symbols

1 ... speech recognition system, 2 ... microphone, 3 ... LAN (Local Area Network), 4 ... employee database, 11 ... waveform extraction means, 12 ... speech recognition means, 13 ... search means, 14 ... display, 15 ... keyboard, 16 ... internal bus, 17 ... control device, 18 ... hard disk, 21 ... employee information database, 22 ... reading information estimation means, 23 ... grammar rules for speech recognition, 24 ... acoustic model for speech recognition, 25 ... speech recognition means, 26 ... recognition result display means, 27 ... information display means

Claims

An information retrieval apparatus comprising:
a database having a plurality of search targets, each of which associates predetermined information with at least one of a name, furigana of the name, and a Roman-letter notation of the name;
reading information estimation means for, with respect to each of the plurality of search targets, estimating and creating the furigana from the Roman-letter notation included in the search target when the search target does not include the furigana;
speech recognition means for identifying, based on the degree of match between the furigana included in each of the plurality of search targets and input speech of a user, a plurality of furigana as candidates for the furigana recognized from the input speech;
recognition result display means for displaying, for each of the plurality of furigana identified by the speech recognition means, when a search target including furigana created by the reading information estimation means exists among the at least one search target including that furigana, an indication that such a search target exists; and
information display means for extracting, from the database, the search target including one furigana selected from the plurality of furigana identified by the speech recognition means, and displaying the predetermined information included in that search target.

The information retrieval apparatus according to claim 1, wherein,
for each of the plurality of furigana identified by the speech recognition means, the recognition result display means uses a different display method for the case where all of the at least one search target including that furigana include furigana created by the reading information estimation means and for the case where only some of the at least one search target including that furigana include furigana created by the reading information estimation means.

The information retrieval apparatus according to claim 1 or claim 2, wherein,
to indicate that a search target including furigana created by the reading information estimation means exists among the at least one search target including the furigana identified by the speech recognition means, the recognition result display means displays a specific symbol or icon attached to, or placed before or after, the character string representing the identified furigana.

The information retrieval apparatus according to claim 1 or claim 2, wherein,
to indicate that a search target including furigana created by the reading information estimation means exists among the at least one search target including the furigana identified by the speech recognition means, the recognition result display means displays the character string representing the identified furigana with its font shape changed.

The information retrieval apparatus according to claim 1 or claim 2, wherein,
to indicate that a search target including furigana created by the reading information estimation means exists among the at least one search target including the furigana identified by the speech recognition means, the recognition result display means displays the character string representing the identified furigana with its font color changed.

The information retrieval apparatus according to claim 1 or claim 2, wherein,
to indicate that a search target including furigana created by the reading information estimation means exists among the at least one search target including the furigana identified by the speech recognition means, the recognition result display means displays the character string representing the identified furigana with its background color changed.