JP5583230B2

JP5583230B2 - Information search apparatus and information search method

Info

Publication number: JP5583230B2
Application number: JP2013004761A
Authority: JP
Inventors: 伸小栗; 真也飯塚; 文彦加藤; 千沙竹田
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-01-15
Filing date: 2013-01-15
Publication date: 2014-09-03
Anticipated expiration: 2033-01-15
Also published as: JP2014137636A

Description

The present invention relates to an information search apparatus and an information search method for retrieving data from a predetermined database using character strings.

Conventionally, a technique is known in which a user's uttered speech is input, a speech recognition result in text form is obtained by recognizing that speech, and a predetermined database is searched using the speech recognition result as a search keyword (see Patent Document 1 below).

Patent Document 1 proposes a method for example-sentence search based on a speech recognition result. The speech recognition result is first presented to the user, and the user removes any misrecognized words, which prevents a search from being executed with a character string that contains a misrecognized word.

Patent Document 1: JP 2003-330925 A

However, the above method requires confirmation and operation by the user, which takes the user time and effort. Because the user must confirm and edit the speech recognition result partway through, the series of search processes also takes longer to complete. In addition, the above method offers no countermeasure against segmentation errors when a search is performed for each of a plurality of segmented character strings. This is explained below using a specific example.

For example, suppose that the input speech "斉藤洋子さんに電話" ("Call Yoko Saito") is recognized by speech recognition using morphological analysis or the like, yielding five character strings (the speech recognition result) segmented by "／", as in "斉藤／洋子／さん／に／電話" (Saito / Yoko / san / ni / denwa). By using each character string obtained in this way as a search keyword to search, for example, a phone book database storing data in which information such as names and telephone numbers is associated, the data whose name is "斉藤洋子" (Yoko Saito) can be acquired and presented to the user.

However, if in the above example the speech recognition process produces a misrecognition (a conversion error, a segmentation error, or the like) such as "佐／伊藤／洋子さん／に／電話" (sa / Ito / Yoko-san / ni / denwa), an inappropriate search keyword ("佐", sa) is obtained. If a search is executed with such an inappropriate keyword, results the user did not intend, such as "佐藤" (Sato) and "佐々木" (Sasaki), may be presented, impairing user convenience.

The present invention has been made in view of the above problems, and its object is to provide an information search apparatus and an information search method that can efficiently reduce search errors when a predetermined database is searched using character strings.

The information search apparatus according to the present invention comprises: character string information acquisition means for acquiring, as one piece of search information, a plurality of pieces of character string information each indicating a character string; character counting means for counting, for each piece of character string information acquired by the character string information acquisition means, the number of characters in the character string it indicates; search means for acquiring a search result by executing, using the search information acquired by the character string information acquisition means, a search that depends on the number of characters counted for each piece of character string information by the character counting means; and search result output means for outputting the search result acquired by the search means.

The information search apparatus according to the present invention acquires a plurality of pieces of character string information as one piece of search information and executes a search that depends on the number of characters in each piece. This makes it possible, for example, to identify character string information that is likely to cause search errors on the basis of its character count, apply appropriate processing such as excluding it from the search information, and then execute the search. Search errors that would occur if the acquired character string information were used as-is can therefore be prevented. In other words, search errors in searching a predetermined database using character strings can be reduced efficiently.

In the above information search apparatus, the search means excludes character string information consisting of one character and executes the search using character string information consisting of two or more characters.

With this configuration, character string information that is highly likely to cause search errors when used as a search string (character string information consisting of one character) is excluded from the search information, so search errors can be reduced efficiently.
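As a non-limiting illustration, the exclusion described above might be sketched in Python as follows. The function name `drop_single_character_strings` and the sample segments are assumptions introduced here for illustration and do not appear in the original disclosure; character counts are taken as Unicode code points via `len`.

```python
def drop_single_character_strings(segments):
    """Keep only segments with two or more characters.

    `segments` is a list of strings, one per piece of character string
    information in a single piece of search information.
    """
    return [s for s in segments if len(s) >= 2]


# Hypothetical misrecognized result: "サ" was split off from "サイトウ".
segments = ["サ", "イトウ", "ヨウコ"]
keywords = drop_single_character_strings(segments)
print(keywords)
```

Only the remaining segments would then be passed to the database search, so the stray one-character segment can no longer pull in unrelated names.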

In the above information search apparatus, the character string information acquisition means acquires a plurality of pieces of character string information that have been ordered in advance. Based on that ordering, the search means determines whether two different pieces of one-character character string information are adjacent to each other. If they are, it concatenates the character strings they indicate to generate a new character string, and executes the search using character string information of two or more characters, including the information indicating the new character string.

For example, the character string information acquisition means may acquire character string information produced when what was originally a single character string is split (segmented) at a point where it should not be, such as between "カ" (ka) and "ゴ" (go) in the character string for the name "加護（カゴ）" (Kago). In such a case, the above configuration concatenates "カ" and "ゴ" to generate "カゴ" (the new character string), which can then be used as a search string. This prevents search errors and allows the search to be executed with a new character string ("カゴ") better suited to the search, so improved search accuracy can be expected.
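As a non-limiting illustration, the concatenation of adjacent one-character segments might be sketched in Python as follows. The function name `merge_adjacent_single_characters` is hypothetical. As an assumption that goes slightly beyond the text, the sketch joins any run of consecutive one-character segments, not only pairs, and drops a one-character segment that has no one-character neighbor.

```python
def merge_adjacent_single_characters(segments):
    """Join runs of adjacent one-character segments, then drop leftovers.

    `segments` is an ordered list of strings. A run such as ["カ", "ゴ"]
    becomes "カゴ"; an isolated one-character segment is excluded.
    """
    merged = []
    run = ""  # accumulates consecutive one-character segments
    for segment in segments:
        if len(segment) == 1:
            run += segment
            continue
        if run:
            merged.append(run)
            run = ""
        merged.append(segment)
    if run:
        merged.append(run)
    # Anything still one character long had no neighbor to join with.
    return [s for s in merged if len(s) >= 2]


print(merge_adjacent_single_characters(["カ", "ゴ", "サン", "ニ", "デンワ"]))
```

Joining runs rather than strict pairs is a design choice of this sketch; an implementation that follows the text literally would only join two adjacent segments.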

In the above information search apparatus, the search means may apply different search conditions to a search using one-character character string information and to a search using character string information of two or more characters.

For example, when searching a database containing character string information that indicates people's names, a one-character string is likely to be the product of a segmentation error and is therefore likely to be unsuitable as a search string. If such a string were used for a fuzzy search, such as a partial-match search that returns any entry partially containing the search string, a large number of search errors could result. On the other hand, one-character family names such as "李（り）" (Ri) and "津（つ）" (Tsu) do exist. With the above configuration, the search can be executed under a search condition suited to the character count: for example, the fuzzy search described above is executed for strings of two or more characters, while for one-character strings an exact-match search is executed that returns an entry only when it contains exactly that string, no more and no less. This raises the likelihood of obtaining the information the user wants while efficiently reducing search errors.
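As a non-limiting illustration, length-dependent search conditions might be sketched in Python as follows. The phone book layout (a list of `(family_reading, given_reading)` tuples), the function names `matches` and `search`, and the sample entries are all assumptions introduced here; the disclosure does not specify a data format.

```python
def matches(keyword, field):
    """Exact match for one-character keywords, partial match otherwise."""
    if len(keyword) == 1:
        return keyword == field
    return keyword in field


def search(keywords, phone_book):
    """Return entries in which any name field satisfies any keyword."""
    return [
        entry
        for entry in phone_book
        if any(matches(k, field) for k in keywords for field in entry)
    ]


# Hypothetical entries: (family name reading, given name reading).
phone_book = [("リ", "ヨウコ"), ("イトウ", "リエ"), ("サトウ", "ケン")]

print(search(["リ"], phone_book))    # one-character keyword: exact match only
print(search(["イト"], phone_book))  # longer keyword: partial match
```

Under these assumptions the keyword "リ" finds the entry whose family name is exactly "リ" but not "リエ", whereas "イト" still finds "イトウ" by partial match.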

The above information search apparatus may further comprise speech recognition means for receiving a series of utterances from a user and generating character string information by performing speech recognition processing on them, and the character string information acquisition means may acquire the character string information generated by the speech recognition means.

With this configuration, the character string information acquisition means can acquire character string information from the user's uttered speech by way of the speech recognition means. Even when that character string information results from a segmentation error in speech recognition, the means described above can efficiently reduce search errors.

The above information search apparatus may further comprise second search means for acquiring a search result by executing a search using character string information expressed in a second character type different from a first character type. In that case, the character string information acquisition means acquires, as the search information, first character string information expressed in the first character type and second character string information that corresponds to the first character string information and is expressed in the second character type; the character counting means and the search means perform their respective processing on the first character string information; the second search means executes a search using the second character string information; and the search result output means outputs a search result based on a first search result acquired by the search means and a second search result acquired by the second search means.

With this configuration, the search described above is executed on, for example, character string information expressed in katakana (the first character type), i.e., the first character string information (e.g., "カゴ"), and a search result (the first search result) is acquired. At the same time, a search by an arbitrary method is executed on, for example, character string information expressed in kanji (the second character type), i.e., the second character string information (e.g., "加護"), and a search result (the second search result) is acquired. The search result output means can then output the search result to be presented to the user on the basis of both results. In other words, by executing a search for each of a plurality of character types and evaluating the individual results together before output, improved search quality can be expected.

In the above information search apparatus, the second search means may apply to the second character string information the same processing that the search means applies, according to character count, to the corresponding first character string information, and then execute the search using the second character string information.

With this configuration, when the search means applies some processing (for example, exclusion) to the first character string information according to its character count before executing the search, the second search means applies the same processing to the second character string information corresponding to that first character string information and then executes the search by the same search method as the search means. As a result, the second character string information corresponding to first character string information excluded from the search information can also be excluded, for example, and search errors can be reduced still more efficiently.
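As a non-limiting illustration, mirroring the exclusion from the reading (first character string information) onto the notation (second character string information) might be sketched in Python as follows. The function name `filter_paired_segments` is hypothetical, and the sketch assumes that the two segment lists correspond position by position, which the text implies but does not state as a data format.

```python
def filter_paired_segments(readings, notations):
    """Drop every pair whose reading has only one character.

    `readings` and `notations` are equal-length lists; element i of each
    list is assumed to describe the same recognized word.
    """
    if len(readings) != len(notations):
        raise ValueError("readings and notations must align one-to-one")
    kept = [(r, n) for r, n in zip(readings, notations) if len(r) >= 2]
    return [r for r, _ in kept], [n for _, n in kept]


readings = ["サ", "イトウ", "ヨウコ"]
notations = ["差", "伊藤", "洋子"]
print(filter_paired_segments(readings, notations))
```

The decision is made on the reading alone, so the notation "差" is dropped because its reading "サ" has one character, while a two-character notation is kept or dropped purely according to the length of its reading.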

In addition to being described as an information search apparatus as above, the present invention can also be described as an information search method, as follows. These inventions differ only in category and are substantially the same invention, so they provide the same operations and effects.

That is, an information search method according to the present invention includes: a character string information acquisition step of acquiring, as one piece of search information, a plurality of pieces of character string information each indicating a character string; a character counting step of counting, for each piece of character string information acquired in the character string information acquisition step, the number of characters in the character string it indicates; a search step of acquiring a search result by executing, using the search information acquired in the character string information acquisition step, a search that depends on the number of characters counted for each piece of character string information in the character counting step; and a search result output step of outputting the search result acquired in the search step. In the search step, character string information consisting of one character is excluded, and the search is executed using character string information consisting of two or more characters.
Another information search method according to the present invention includes: a character string information acquisition step of acquiring, as one piece of search information, one or more pieces of character string information each indicating a character string; a character counting step of counting, for each piece of character string information acquired in the character string information acquisition step, the number of characters in the character string it indicates; a search step of acquiring a search result by executing, using the search information acquired in the character string information acquisition step, a search that depends on the number of characters counted for each piece of character string information in the character counting step; and a search result output step of outputting the search result acquired in the search step. In the character string information acquisition step, a plurality of pieces of character string information ordered in advance are acquired. In the search step, it is determined based on that ordering whether two different pieces of one-character character string information are adjacent to each other; if they are, the character strings they indicate are concatenated to generate a new character string, and the search is executed using character string information of two or more characters, including the information indicating the new character string.

According to the present invention, search errors in searching a predetermined database using character strings can be reduced efficiently.

FIG. 1 is a block diagram showing the functional configuration of an information search apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the hardware configuration of the information search apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of speech recognition results for a user's uttered speech.
FIG. 4 is a diagram showing an example of conversion for specific characters (long-vowel reading conversion).
FIG. 5 is a diagram showing an example of the information used to explain the operation of the information search apparatus.
FIG. 6 is a diagram showing the operation of the information search apparatus in a first example.
FIG. 7 is a diagram showing the operation of the information search apparatus in a second example.
FIG. 8 is a diagram showing the operation of the information search apparatus in a third example.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and duplicate description is omitted.

FIG. 1 is a configuration diagram of an embodiment of the information search apparatus according to the present invention. The information search apparatus 1 according to this embodiment is, for example, a user terminal such as a mobile terminal carried by a user, and can provide a search service based on the user's uttered speech in order to improve user convenience and operability. Specifically, the information search apparatus 1 acquires, as search information, one or more pieces of character string information obtained by speech recognition of the user's uttered speech, and searches a predetermined database using that search information.

In this embodiment, the predetermined database is assumed to be a phone book database that stores character string information indicating people's names in association with information indicating telephone numbers. That is, the information search apparatus 1 provides the user with a phone book search function: it searches the phone book database using the above search information as search keywords and the stored character string information indicating people's names as the search target, and acquires the matching information (a person's name and telephone number) from the phone book database.

However, the form of the information search apparatus 1 is not limited to the above. For example, the functions of the information search apparatus 1 may be provided on a server that the user terminal can access over a communication network, in which case the user uses the functions provided by the information search apparatus 1 through the user terminal. The functions of the information search apparatus 1 may also be distributed between, for example, the user terminal and a server, and realized by the two operating in cooperation.

As shown in FIG. 1, the information search apparatus 1 according to this embodiment includes a speech recognition unit 11, a character string information acquisition unit 12, a character counting unit 13, a character string information database 14 that serves as the phone book database, a search unit 15 and a second search unit 16 that execute search processing, and a search result output unit 17 that outputs search results.

FIG. 2 is a hardware configuration diagram of the information search apparatus 1 according to this embodiment. As shown in FIG. 2, the information search apparatus 1 includes a CPU 101 that executes an operating system, application programs, and the like; a main storage unit 102 composed of ROM and RAM; an auxiliary storage unit 103 composed of hard disk memory or the like; a communication control unit 104 that performs data communication; an output unit 105 composed of a liquid crystal monitor or the like; an input unit 106 composed of input devices such as a keyboard, a mouse, and a microphone; and a recording medium reading unit 107 that reads a recording medium 108 such as a USB memory, CD-ROM, or DVD.

Each function of the information search apparatus 1 shown in FIG. 1 is realized by loading a predetermined software program into the main storage unit 102 and executing it under the control of the CPU 101. In doing so, the CPU 101 follows the processing procedure of the software program to control the reading and writing of data in the main storage unit 102 and the auxiliary storage unit 103, and to control the operation of the input unit 106, the output unit 105, and the communication control unit 104. Each functional element of the information search apparatus 1 shown in FIG. 1 is described below.

The speech recognition unit 11 is speech recognition means that receives a series of utterances from the user, executes speech recognition processing on them using conventional speech recognition technology, and generates character string information as the speech recognition result. The speech recognition unit 11 is realized, for example, as a combination of hardware for speech input, such as a microphone built into the information search apparatus 1, and speech recognition middleware that executes speech recognition processing on the speech input through that hardware to generate character string information.

For example, when the speech recognition unit 11 receives speech that the user uttered intending "斎藤陽子さんに電話" ("Call Yoko Saito"), it executes speech recognition processing on that speech using morphological analysis or the like. As a result, the speech recognition unit 11 acquires, as character string information indicating the reading, character string information (the first character string information) expressed in, for example, katakana (the first character type), such as "サ／イトウ／ヨウコ／サン／ニ／デンワ" (sa / itou / youko / san / ni / denwa). Here, "／" marks a position that the speech recognition unit 11 judged to be a word boundary. The speech recognition unit 11 acquires, as the speech recognition result, a group of pieces of character string information segmented (split) at each "／", where each word delimited by "／" is one piece of character string information. The speech recognition result acquired by the speech recognition unit 11 need not be segmented; it may be a single undelimited piece of character string information.

The speech recognition unit 11 may also acquire, corresponding to the first character string information, character string information indicating the notation (the second character string information) expressed in a character type (the second character type) that mixes, for example, hiragana, katakana, and kanji, such as "差／伊藤／洋子／さん／に／電話". Such second character string information can be obtained, for example, by converting the first character string information using a conversion dictionary held inside the information search apparatus 1 (a dictionary that stores character strings indicating readings in association with character strings indicating the kana-kanji notation of the corresponding words).

The accuracy of speech recognition processing depends on ambient noise, the clarity of the user's pronunciation, and so on, so a correct speech recognition result is not always obtained. The speech recognition unit 11 may therefore acquire a plurality of speech recognition result candidates (i-Best recognition results). FIG. 3 shows an example of speech recognition results for a user's uttered speech. As shown in FIG. 3, the speech recognition unit 11 may, for example, execute speech recognition processing on speech that the user uttered intending "斎藤陽子さんに電話" and acquire "サ／イトウ／ヨウコ／サン／ニ／デンワ" and "差／伊藤／洋子／さん／に／電話" as the first candidate (1-Best recognition result), and "サイトー／ヨウコ／サン／ニ／デンワ" and "斎藤／陽子／さん／に／電話" as the second candidate (2-Best recognition result). A smaller value of "i" in "i-Best recognition result" indicates a higher speech recognition rank (rank of estimated recognition accuracy). These candidates are merely examples.

The character string information acquisition unit 12 is character string information acquisition means that acquires one or more pieces of character string information as one piece of search information. "One piece of search information" is, for example, the one or more pieces of character string information contained in one speech recognition result candidate (i-Best recognition result). Specifically, one piece of search information contains first character string information indicating the reading (e.g., "サ／イトウ／ヨウコ／サン／ニ／デンワ") and second character string information indicating the notation (e.g., "差／伊藤／洋子／さん／に／電話"). Through the searches by the search unit 15 and the second search unit 16 described later, one search result (a ranked search result) is obtained for each piece of search information.
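As a non-limiting illustration, one piece of search information per recognition candidate might be represented in Python as follows. The `SearchInfo` class, its field names, and the `parse_candidate` helper are assumptions introduced here for illustration; the disclosure does not prescribe a data structure. The separator is the full-width slash "／" used in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchInfo:
    """One piece of search information, i.e. one i-Best candidate."""
    rank: int         # i in "i-Best"; smaller means higher rank
    readings: tuple   # first character string information (katakana)
    notations: tuple  # second character string information (mixed script)


def parse_candidate(rank, reading_text, notation_text, sep="／"):
    """Split a delimited recognition result into per-word segments."""
    return SearchInfo(
        rank=rank,
        readings=tuple(reading_text.split(sep)),
        notations=tuple(notation_text.split(sep)),
    )


first = parse_candidate(
    1, "サ／イトウ／ヨウコ／サン／ニ／デンワ", "差／伊藤／洋子／さん／に／電話"
)
second = parse_candidate(
    2, "サイトー／ヨウコ／サン／ニ／デンワ", "斎藤／陽子／さん／に／電話"
)
print(first.readings)
print(second.notations)
```

Keeping the reading and notation segments side by side in one record makes it straightforward to hand the readings to one search and the notations to the other.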

The character string information acquisition unit 12 acquires the speech recognition results (character string information) generated by the speech recognition unit 11. When the speech recognition unit 11 generates two recognition candidates as shown in FIG. 3, the unit acquires the character string information (reading and notation) for the 1-Best recognition result and the character string information (reading and notation) for the 2-Best recognition result, each as one piece of search information. The first character string information (reading) acquired by the character string information acquisition unit 12 is used in the search by the search unit 15, and the second character string information (notation) is used in the search by the second search unit 16, both described later. The ranked search results obtained for each recognition candidate are evaluated together by the search result output unit 17, described later, which generates and outputs (presents to the user) a single final search result corresponding to the user's single utterance.

The character string information acquisition unit 12 may also apply processing such as honorific removal and noun extraction to the acquired character string information, using techniques such as morphological analysis. Such processing makes it possible to efficiently extract character string information that can serve as an appropriate search keyword for the character string information database 14. For example, when honorific removal and noun extraction are applied to the search information for the 1-Best recognition result shown in FIG. 3, the honorific "サン（さん）" is identified and deleted. From the remaining character string information, noun segments that satisfy a certain criterion (for example, "サ（差）", "イトウ（伊藤）", and "ヨウコ（洋子）") are then identified and acquired. This processing is not essential to the information search apparatus 1 and may be omitted.

The character count unit 13 is character counting means that counts, for each piece of first character string information acquired by the character string information acquisition unit 12, the number of characters in the character string that the information represents. It obtains the count by counting the individual characters that make up the string. For example, the character count unit 13 counts "サ" as one character and "イトウ" as three characters. The character count obtained for each piece of first character string information is used by the search unit 15, described later.

The character string information database 14 is storage means that stores the data (records) containing the character string information to be searched by the information search apparatus 1. Specifically, the character string information database 14 is a phone book database that stores character string information indicating a person's name in association with information indicating a telephone number. The database holds a plurality of records registered in advance, for example through user operations (information input and registration). Each record associates character string information giving the notation of a name in hiragana, katakana, kanji, or the like (for example "斎藤洋子"), character string information giving the reading of the name in katakana (for example "サイトウヨウコ"), and information indicating a telephone number.

The search unit 15 is search means that uses the first character string information in the search information acquired by the character string information acquisition unit 12 and executes a search that depends on the character count of each piece of first character string information, as counted by the character count unit 13, thereby acquiring a search result (first search result). The second search unit 16 is search means (second search means) that executes a search using the second character string information in the search information acquired by the character string information acquisition unit 12, thereby acquiring a search result (second search result). The search result output unit 17 is search result output means that outputs the search result finally presented to the user, based on a combined evaluation of the first and second search results. The specific functions of the search unit 15, the second search unit 16, and the search result output unit 17 are described in detail in the examples below.

(First example)
In the information search apparatus 1 of the first example, the search unit 15A excludes character string information consisting of a single character. Specifically, for the first character string information among the character string information that the character string information acquisition unit 12 acquired as search information, the search unit 15A obtains the character count produced by the character count unit 13 and, when it determines that the count is one, excludes (deletes or discards) that first character string information from the search information.

Consider, for example, a case in which the user utters speech intended as "加護（かご）" (Kago), the speech recognition unit 11 misrecognizes it, and the resulting first character string information "タ／ゴ" is acquired as search information by the character string information acquisition unit 12. In this case, the search unit 15A obtains the character counts of the individual pieces of first character string information ("タ" and "ゴ") from the character count unit 13 and determines whether each count is one. Because "タ" and "ゴ" are each one character long, the search unit 15A excludes both from the search information. The search unit 15A does not exclude first character string information of two or more characters (for example "カゴ" or "タナカ") and executes the search using that character string information as a search keyword.
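The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `drop_single_char_segments` is hypothetical, and the sketch assumes each katakana character is a single precomposed Unicode code point, so that `len` matches the count produced by the character count unit 13.

```python
def drop_single_char_segments(segments):
    """Keep only reading segments that are two or more characters long.

    Assumption: each segment is a str of precomposed katakana, so len()
    gives the character count used in the first example.
    """
    return [segment for segment in segments if len(segment) >= 2]


# Misrecognized, over-segmented reading: both segments are dropped.
print(drop_single_char_segments(["タ", "ゴ"]))
# Correctly segmented reading: kept as a search keyword.
print(drop_single_char_segments(["カゴ"]))
# Mixed case from FIG. 3: only the one-character segment is dropped.
print(drop_single_char_segments(["サ", "イトウ", "ヨウコ"]))
```

Dropping the segments before the search, rather than filtering the hits afterwards, avoids matching a one-character keyword against a large share of the phone book readings.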

The search unit 15A executes, for example, a fuzzy search. Specifically, the search unit 15A treats each piece of first character string information as an individual search keyword (search character string). For each search keyword, it compares the keyword with the character string information to be searched (the character string information stored in the character string information database 14), determines whether a specified criterion is satisfied, and, if so, acquires the record information containing that character string information as a search result.

Specific examples of fuzzy search include exact match search and partial match search. An exact match search determines whether the search keyword and the character string information to be searched match completely and, if they do, acquires that character string information as a search result. A partial match search determines whether the search keyword is contained in part of the character string information to be searched and, if it is, acquires that character string information as a search result. Partial match search comes in several types, which determine whether the search keyword appears at the beginning, at the end, or elsewhere in the character string information to be searched (forward partial match, backward partial match, and partial match, respectively). A partial match search may use only one of these types or a combination of several.

For example, when the search keyword is "加護" (Kago), character string information "加護" is acquired as a search result by exact match search. Character string information "加護ちゃん", "東京の加護", and "東京の加護ちゃん" is acquired as a search result by the partial match searches classified as forward partial match, backward partial match, and partial match, respectively.
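The four match types above can be illustrated with plain string operations. The sketch below is a minimal illustration, not the apparatus's actual implementation; the function name `classify_match` and the returned labels are hypothetical.

```python
def classify_match(keyword, target):
    """Return the match type of keyword against target, or None if no match.

    Labels are illustrative: "exact", "forward" (keyword at the start),
    "backward" (keyword at the end), "partial" (keyword elsewhere).
    """
    if keyword == target:
        return "exact"
    if target.startswith(keyword):
        return "forward"
    if target.endswith(keyword):
        return "backward"
    if keyword in target:
        return "partial"
    return None


for target in ["加護", "加護ちゃん", "東京の加護", "東京の加護ちゃん", "田原"]:
    print(target, classify_match("加護", target))
```

The checks are ordered from most to least specific so that each target receives exactly one label.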

Another example of fuzzy search is a search method based on consonant-vowel matching. In consonant-vowel matching, the search keyword is compared with the character string information to be searched to determine, for example, whether the vowels of the first word (the first character) match and whether both the consonants and the vowels of the second word (the second character) match. If they do, the character string information to be searched is acquired as a search result. The determination method is not limited to this one. For example, the condition may be relaxed so that the character string information to be searched is acquired as a search result when the vowels of the first word match and the vowels of the second word match.

For example, when the search keyword is "加護 (kago)" and the character string information to be searched is "田護 (tago)", the vowels of the first word match (a), and both the consonants (g) and the vowels (o) of the second word match, so the character string information is acquired as a search result by either of the consonant-vowel matching methods above. When the character string information to be searched is "賀古 (kako)", the vowels of the first word match (a) and the vowels of the second word match (o), but the consonants of the second word do not match (g versus k). In this case the character string information is not acquired as a search result by the former (strict) method but is acquired by the latter (relaxed) method.
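The strict and relaxed consonant-vowel criteria can be sketched as follows. The sketch works on romanized readings for simplicity, which is an assumption: the apparatus itself handles katakana readings, and the helper names `split_morae` and `cv_match` are hypothetical. The mora splitter is deliberately simplistic (a run of consonants followed by one vowel) and ignores cases such as a trailing "n".

```python
VOWELS = "aiueo"


def split_morae(romaji):
    """Split a romanized reading into (consonant, vowel) pairs.

    Simplistic assumption: each mora is zero or more consonant letters
    followed by exactly one vowel; trailing consonants are ignored.
    """
    morae, consonant = [], ""
    for ch in romaji:
        if ch in VOWELS:
            morae.append((consonant, ch))
            consonant = ""
        else:
            consonant += ch
    return morae


def cv_match(keyword, target, relaxed=False):
    """Strict: 1st vowel, 2nd consonant and 2nd vowel must all match.
    Relaxed: only the 1st and 2nd vowels must match."""
    k, t = split_morae(keyword), split_morae(target)
    if len(k) < 2 or len(t) < 2:
        return False
    vowels_ok = k[0][1] == t[0][1] and k[1][1] == t[1][1]
    if relaxed:
        return vowels_ok
    return vowels_ok and k[1][0] == t[1][0]


print(cv_match("kago", "tago"))                # strict criterion
print(cv_match("kago", "kako"))                # strict criterion
print(cv_match("kago", "kako", relaxed=True))  # relaxed criterion
```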

Before executing a fuzzy search using first character string information of two or more characters, the search unit 15A may convert partial character strings contained in that information (as part or all of it) into specific characters. Specifically, for example, the search unit 15A holds in advance dictionary data that stores the correspondence between pre-conversion and post-conversion character strings. By referring to the dictionary data, the search unit 15A determines, for every partial character string contained in the first character string information, whether it matches a pre-conversion character string stored in the dictionary data. The search unit 15A converts each partial character string found to match into the corresponding post-conversion character string, and thereby obtains the converted first character string information.

FIG. 4 shows an example of conversion of specific characters (long-vowel reading conversion). This conversion makes it possible to obtain first character string information that does not contain the long-vowel mark, such as "オオノ" and "サイトウ", from first character string information that does contain it, such as "オーノ" and "サイトー". Data registered as the phonetic reading (furigana) of a person's name is generally assumed not to contain the long-vowel mark, so the conversion yields search keywords better suited to the search. The search unit 15A may execute the search using only the converted first character string information, or using both the pre-conversion and post-conversion first character string information.
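A dictionary-based conversion of this kind might look like the following sketch. The table entries are hypothetical and chosen only to reproduce the two examples in the text; the actual contents of FIG. 4 are not reproduced here. The function names are likewise illustrative.

```python
# Hypothetical conversion table (pre-conversion -> post-conversion).
# These two entries are assumptions that cover the examples in the text only.
LONG_VOWEL_TABLE = {
    "オー": "オオ",
    "トー": "トウ",
}


def convert_long_vowels(reading, table=LONG_VOWEL_TABLE):
    """Replace every partial string found in the table with its counterpart."""
    for before, after in table.items():
        reading = reading.replace(before, after)
    return reading


def search_keywords(reading, keep_original=False):
    """Return the converted reading, optionally together with the original."""
    converted = convert_long_vowels(reading)
    if keep_original and converted != reading:
        return [reading, converted]
    return [converted]


print(convert_long_vowels("オーノ"))
print(convert_long_vowels("サイトー"))
print(search_keywords("サイトー", keep_original=True))
```

The `keep_original` flag corresponds to the option of searching with both the pre-conversion and post-conversion readings.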

The search unit 15A searches the character string information database 14 using each piece of first character string information of two or more characters as a search keyword (one search per piece of character string information). For each piece of search information, it calculates a search score (initial value 0) that reflects the number of search hits and associates the score with each record in the character string information database 14 that was hit (acquired by the search). The search score may be incremented by a fixed value (for example 1) for each search hit. Alternatively, the increment may be larger for hits obtained with character string information of higher speech recognition rank: for example, 1 is added for a hit obtained with first character string information from the first candidate (1-Best recognition result), and 0.8 is added for a hit obtained with first character string information from the second candidate (2-Best recognition result).

In this embodiment, the search score is calculated by the latter method. The information indicating the search score of each record for each piece of search information is stored, for example, in a storage area temporarily allocated on the information search apparatus 1, and is referenced, for example, when the search result output unit 17, described later, outputs the search results.
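The rank-weighted scoring can be sketched as below. This is a minimal illustration under stated assumptions: records are represented only by their reading strings, a hit is a simple substring match, and the names `RANK_WEIGHTS` and `score_records` are hypothetical. The weights 1 and 0.8 are the ones given in the text.

```python
from collections import defaultdict

# Increment per hit for each speech recognition rank (values from the text).
RANK_WEIGHTS = {1: 1.0, 2: 0.8}


def score_records(keywords, readings, rank):
    """Accumulate a search score per record for one piece of search information.

    Assumption: a hit is a plain substring match of the keyword in the
    record's reading. Scores start at 0 and grow by the rank's weight per hit.
    """
    weight = RANK_WEIGHTS[rank]
    scores = defaultdict(float)
    for keyword in keywords:
        for reading in readings:
            if keyword in reading:
                scores[reading] += weight
    return dict(scores)


readings = ["サイトウヨウコ", "サイトウヒロシ", "カゴ"]
print(score_records(["サイトウ", "ヨウコ"], readings, rank=1))
print(score_records(["カゴ"], readings, rank=2))
```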

The second search unit 16 executes a fuzzy search in the same way as the search unit 15A, using the second character string information acquired as search information. When the second character string information contains no reading information (information indicating consonants and vowels), however, fuzzy search by consonant-vowel matching is excluded. The second search unit 16 also calculates, by the same method as the search unit 15A, a search score for each piece of search information and associates it with each record acquired by the search.

For each piece of search information, the search result output unit 17 calculates the sum of the search score that the search unit 15A calculated by searching with the first character string information and the search score that the second search unit 16 calculated by searching with the second character string information. The search result output unit 17 then arranges the character string information contained in the record information stored in the character string information database 14 (for example, information indicating name and telephone number) in descending order of this sum and outputs it as the search result.

When the search is performed with a plurality of pieces of search information (a plurality of speech recognition candidates), the search score of each search result (record) is calculated separately for each piece of search information, as described above. In this case, for example, the search result output unit 17 extracts, for each record, the largest of the search scores calculated for the individual pieces of search information as the search score associated with that record (the representative score). A specific example follows.

Consider, for example, a case in which the search information for the first-candidate recognition result is "田／後（タ／ゴ）" and the search information for the second-candidate recognition result is "加／護（カ／ゴ）". If the record containing "加護（カゴ）" has a search score of 1 for the first candidate and 1.6 for the second candidate, the search result output unit 17 takes 1.6 as the representative score of that record. In this way, the highest search score calculated for each record (the one with the best search fit) is extracted as its representative score. Such a representative score is considered a suitable measure for comparing search ranks (ranks of search fit) between records.
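Extracting the representative score amounts to taking a per-record maximum over the candidates. The sketch below assumes the per-candidate scores are already available as dictionaries keyed by record; the function name `representative_scores` is hypothetical, and the input values are the ones from the example above.

```python
def representative_scores(per_candidate_scores):
    """Return, for each record, the largest score over all candidates.

    per_candidate_scores: one dict (record -> score) per recognition candidate.
    """
    representative = {}
    for scores in per_candidate_scores:
        for record, score in scores.items():
            representative[record] = max(representative.get(record, 0.0), score)
    return representative


first_candidate = {"加護": 1.0}   # score for the 1-Best candidate
second_candidate = {"加護": 1.6}  # score for the 2-Best candidate
print(representative_scores([first_candidate, second_candidate]))
```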

Through this processing, the search result output unit 17 obtains a representative score uniquely associated with each record. The search result output unit 17 then outputs the search results (for example, information indicating name and telephone number) to the output unit 105, such as a display of the information search apparatus 1, arranged in descending order of representative score.

The reason why the search result output unit 17 outputs search results based on the search scores of both the first search result (the result of searching with the first character string information indicating the reading) and the second search result (the result of searching with the second character string information indicating the notation) is explained next, using a different example.

Consider, for example, a case in which the character string information database 14 stores records containing character string information such as "斎藤洋子（サイトウヨウコ）", "斎藤ひろし（サイトウヒロシ）", and "小野ヨーコ". Suppose that "サイトウ／ヨウコ" is acquired as the first character string information indicating the reading and "斎藤／陽子" is acquired as the second character string information indicating the notation. It is then conceivable that the search with the first character string information gives the same search score to both "斎藤洋子" and "小野ヨーコ", and that the search with the second character string information gives the same search score to both "斎藤洋子" and "斎藤ひろし".

Thus, even when a search using only the first character string information or only the second character string information produces no difference in search score, judging by both together (summing the search scores) can be expected to raise the search score of "斎藤洋子", the record the user intended, so that it is displayed at the top.
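The tie-breaking effect of summing the two scores can be shown with a small sketch. The individual score values below are illustrative assumptions (the text only states that the tied records receive equal scores), and the function name `combine_scores` is hypothetical.

```python
def combine_scores(reading_scores, notation_scores):
    """Sum the reading-based and notation-based scores per record."""
    total = dict(reading_scores)
    for record, score in notation_scores.items():
        total[record] = total.get(record, 0.0) + score
    return total


# Assumed scores: each search alone leaves two records tied.
reading_scores = {"斎藤洋子": 1.0, "小野ヨーコ": 1.0}
notation_scores = {"斎藤洋子": 1.0, "斎藤ひろし": 1.0}

total = combine_scores(reading_scores, notation_scores)
ranked = sorted(total, key=lambda record: -total[record])
print(total)
print(ranked[0])  # the record that is hit by both searches
```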

Next, the processing executed by the information search apparatus 1 in the first example is described with reference to FIGS. 5 and 6. FIG. 5 shows an example of the information used to explain the processing executed by the information search apparatus 1. FIG. 6 is a flowchart showing the operation of the information search apparatus 1 in the first example. In the following description, the search unit 15A and the second search unit 16 execute the exact match search and partial match search described above (including forward partial match, backward partial match, and partial match) as the fuzzy search.

First, the speech recognition unit 11 receives speech that the user uttered intending "加護さんに電話" (Kago-san ni denwa, "call Kago-san") and executes speech recognition (step S101). The character string information acquisition unit 12 then acquires the character string information generated by the speech recognition: "タ／ゴ／サン／ニ／デンワ" and "田／後／さん／に／電話" as the first-candidate recognition result, and "カゴ／サン／ニ／デンワ" and "加護／さん／に／電話" as the second-candidate recognition result (step S102). The character string information acquisition unit 12 takes the first-candidate recognition result (step S103) and executes honorific removal and noun extraction to obtain the search information ("タ／ゴ" and "田／後") (step S104, character string information acquisition step). When the processing of step S104 is not performed, step S103 corresponds to the character string information acquisition step.

Next, the character count unit 13 counts the number of characters in each piece of character string information contained in the search information indicating the reading ("タ／ゴ") acquired by the character string information acquisition unit 12 (step S105, character counting step). The character count unit 13 thereby obtains the character count (one character) of each of "タ" and "ゴ".

Next, the search unit 15A executes search processing based on the character counts obtained in step S105 (step S106, search step). Specifically, the search unit 15A takes the character string information "タ" acquired by the character string information acquisition unit 12 (step S106A) and determines whether its character count is one (step S106B). Because "タ" is one character long, the search unit 15A skips the search processing (fuzzy search) for "タ", that is, excludes "タ" from the search information (step S106B: YES). The search unit 15A then processes the next character string information "ゴ" in the same way (steps S106E: NO, S106A, S106B: YES).

Meanwhile, the second search unit 16 takes the character string information "田" from the character string information indicating the notation ("田／後") acquired by the character string information acquisition unit 12 (step S107) and executes a fuzzy search using it (step S108). The second search unit 16 adds 1 to the first-candidate search score of each record hit by this fuzzy search, namely the records containing "田原" and "田辺".

The second search unit 16 then executes a fuzzy search using "後", the character string information that follows "田" (steps S109: NO, S107, S108). The second search unit 16 adds 1 to the first-candidate search score of the record hit by this fuzzy search, namely the record containing "後藤".

When the search unit 15A and the second search unit 16 have finished searching with all the character string information contained in the search information for the first-candidate recognition result (steps S106E: YES, S109: YES), the per-record search scores obtained by the two searches are summed, which fixes the search score of each record for the search information of the first-candidate recognition result (step S110). That is, the first-candidate search scores of the record containing "後藤", the record containing "田原", and the record containing "田辺" are each fixed at 1.

Next, the character string information acquisition unit 12 takes the second-candidate recognition result (steps S111: NO, S103) and executes honorific removal and noun extraction to obtain the search information ("カゴ" and "加護") (step S104, character string information acquisition step).

Next, the character count unit 13 counts the number of characters in each piece of character string information contained in the search information indicating the reading ("カゴ") acquired by the character string information acquisition unit 12 (step S105, character counting step), and thereby obtains the character count of "カゴ" (two characters). The search unit 15A then takes "カゴ" (step S106A) and determines whether its character count is one (step S106B). Because "カゴ" is two characters long, the search unit 15A applies the conversion of specific characters to "カゴ" as needed (for example, the long-vowel reading conversion shown in FIG. 4) (steps S106B: NO, S106C) and executes a fuzzy search using "カゴ" (step S106D). The search unit 15A adds 0.8 to the second-candidate search score of the record hit by this fuzzy search, namely the record containing "カゴ".

Meanwhile, the second search unit 16 acquires the search information indicating the notation ("加護") acquired by the character string information acquisition unit 12 (step S107) and executes a fuzzy search using that character string information (step S108). The second search unit 16 adds 0.8 to the second-candidate search score of the record containing "加護" that was hit by the fuzzy search.

When the search unit 15A and the second search unit 16 have completed the search processing for all character string information included in the search information (step S106E: YES; step S109: YES), the per-record search scores obtained by the respective searches are summed, and the search score of each record for the search information corresponding to the second-candidate speech recognition result is fixed (step S110). That is, the second-candidate search score of the record containing "加護" is fixed at "1.6".

When processing has been completed for all speech recognition result candidates (step S111: YES), the search result output unit 17 extracts a representative score for each search result acquired by the search unit 15A and the second search unit 16. The search result output unit 17 arranges the search results in order of representative score ("加護" (1.6) → "後藤" (1) → "田原" (1) → "田辺" (1) → ...; the numbers in parentheses are the representative scores) and outputs them to the output unit 105, such as a display, of the information search apparatus 1 (step S112, search result output step).
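The ordering by representative score can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that the representative score of a record is the highest score that record received across the recognition candidates, and the function name `rank_by_representative_score` and the dictionary layout are hypothetical.

```python
def rank_by_representative_score(per_candidate_scores):
    """Order records by representative score, highest first.

    per_candidate_scores: one dict per recognition candidate, mapping a
    record label to the search score fixed for that candidate.
    Assumption: the representative score is the maximum over candidates.
    """
    representative = {}
    for scores in per_candidate_scores:
        for record, score in scores.items():
            representative[record] = max(representative.get(record, 0.0), score)
    # sorted() is stable, so records with equal scores keep insertion order.
    return sorted(representative.items(), key=lambda item: -item[1])


# Scores from the worked example: first candidate, then second candidate.
first_candidate = {"後藤": 1.0, "田原": 1.0, "田辺": 1.0}
second_candidate = {"加護": 1.6}
ranking = rank_by_representative_score([first_candidate, second_candidate])
```

With the scores of the worked example, the record containing "加護" is placed first and the three records scored 1 follow in their original order.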

If, in the above processing, the processing of steps S106 and S107A by the character count unit 13 and the search unit 15A were not performed, the search processing for "タ" (ta) and "ゴ" (go) would not be skipped, and a fuzzy search would be executed in step S107C. As a result, the search score (representative score) of the record containing "後藤", the record containing "田原", and the record containing "田辺" would each be "2", exceeding the search score (representative score) of "1.6" of the record containing "加護". The order produced by the search result output unit 17 in step S113 would therefore be "後藤" → "田原" → "田辺" → "加護", so that records the user did not intend would be displayed above the record containing "加護", which the user intended. In particular, as in this example, a fuzzy search using a one-character keyword is likely to hit many records the user did not intend and to inflate their search scores.

As described above, the information search apparatus 1 in the first example reduces the influence of search errors by excluding from the search keywords the character string information that is likely to cause search errors (character string information having one character). That is, it reduces the risk that search errors cause a large number of unintended search results to be displayed (or displayed at the top) while the search result the user wants is not displayed (or is displayed lower).
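The exclusion of one-character keywords in the first example can be sketched as follows. This is a hedged illustration: the fuzzy search is approximated by a plain substring test, and the function name, the toy list of readings, and the `weight` parameter are assumptions made for the sketch, not details taken from the patent.

```python
def search_excluding_single_chars(segments, readings, weight=1.0):
    """Fuzzy-search each segment, skipping segments of one character.

    segments: readings obtained from one recognition candidate.
    readings: readings stored in the (toy) phone book.
    weight:   amount added to a record's score per hit (assumed).
    """
    scores = {}
    for segment in segments:
        if len(segment) == 1:
            continue  # one-character keywords are excluded from the search
        for reading in readings:
            if segment in reading:  # substring test stands in for fuzzy search
                scores[reading] = scores.get(reading, 0.0) + weight
    return scores


phone_book = ["ゴトウ", "タハラ", "タナベ", "カゴ"]
first = search_excluding_single_chars(["タ", "ゴ"], phone_book)
second = search_excluding_single_chars(["カゴ"], phone_book, weight=0.8)
```

Here the mis-segmented candidate "タ/ゴ" contributes no hits from its readings, while the candidate "カゴ" still scores the intended record.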

(Second example)
The following describes mainly how the information search apparatus 1 in the second example differs from the first example. In the second example, the character string information acquisition unit 12 acquires, as the search information, character string information that has been ordered in advance. For example, from a speech recognition result such as that shown in FIG. 3, the character string information acquisition unit 12 can acquire the segmented character strings in order from the beginning and associate the order of acquisition with each character string as its order. The search unit 15B determines, based on this ordering, whether two different pieces of one-character character string information are adjacent to each other. An example of the specific determination procedure is given in the flow description below. When the search unit 15B determines that the two pieces of character string information are adjacent, it concatenates the character strings they indicate to generate a new character string, and executes a search using character string information having two or more characters, including information indicating the new character string.

Next, the processing executed by the information search apparatus 1 in the second example is described with reference to FIGS. 5 and 7. FIG. 7 is a flowchart showing the operation of the information search apparatus 1 in the second example. Steps S201 to S205 and S207 to S212 are the same as steps S101 to S105 and S107 to S112 executed by the information search apparatus 1 in the first example shown in FIG. 6, so their detailed description is omitted. In the following description, the search unit 15B and the second search unit 16 execute the exact match search and the partial match search described above (including prefix, suffix, and substring matching) as the fuzzy search.

Through steps S201 to S204, the character string information acquisition unit 12 acquires the search information ("タ/ゴ" (ta/go, the readings) and "田/後" (the notations)). Then, in step S205, the character count unit 13 counts the number of characters (one character) of each of the character string information "タ" and "ゴ".

Next, the search unit 15B executes search processing based on the character counts obtained in step S205 (step S206, search step). Specifically, the search unit 15B acquires the character string information "タ" acquired by the character string information acquisition unit 12 (step S206A) and determines whether its character count is one (step S206B). Because "タ" has one character, the search unit 15B determines whether the next piece of character string information (the next segment) has one character (step S206C). Because the character string information "ゴ" that follows "タ" also has one character (step S206C: YES), the search unit 15B concatenates "タ" and "ゴ" to generate the new character string "タゴ" (step S206D).

Next, the search unit 15B executes, as necessary, processing that converts specific characters in the new character string "タゴ" (for example, the long-vowel reading conversion shown in FIG. 4) (step S206E) and executes a fuzzy search using "タゴ" (step S206F). Because none of the character string information stored in the character string information database 14 shown in FIG. 5 is hit by a fuzzy search using "タゴ", no record's search score is incremented.

After that, when processing has been completed for all speech recognition result candidates, including the search processing by the second search unit 16 (steps S207 to S209) (step S211: YES), the search result output unit 17 extracts a representative score for each search result acquired by the search unit 15B and the second search unit 16. The search result output unit 17 arranges the search results in order of representative score ("加護" (1.6) → "後藤" (1) → "田原" (1) → "田辺" (1) → ...; the numbers in parentheses are the representative scores) and outputs them to the output unit 105, such as a display, of the information search apparatus 1 (step S212, search result output step).

As described above, in the information search apparatus 1 in the second example, the search unit 15B concatenates adjacent one-character character string information ("タ" and "ゴ") to generate "タゴ" (a new character string) and searches using that new character string as the search string. As in the first example, this reduces the influence of search errors.

Consider also a case in which the user speaks intending "加護さんに電話" ("call Kago") and a speech recognition error yields character string information segmented as "カ/ゴ" (ka/go). In this case, rather than excluding the adjacent one-character character string information "カ" and "ゴ", the search unit 15B searches using the new character string information "カゴ" generated by concatenating them. It can thereby acquire the record containing "加護 (カゴ)" that the user intended as a search result and add to that record's search score. This increases the likelihood that the record containing "加護 (カゴ)" is output (displayed) near the top by the search result output unit 17, and an improvement in search accuracy can be expected.

In the second example, the search unit 15B was described as processing the character string information sequentially and deciding whether to concatenate by determining whether the next piece of character string information has one character, but the determination method is not limited to this. For example, the search unit 15B may process the character string information sequentially and decide whether to concatenate by determining whether the preceding piece of character string information has one character. Alternatively, the search unit 15B may first scan all of the ordered character string information and concatenate the character string information wherever it detects adjacent one-character pieces.
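The look-ahead concatenation of the second example can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the toy phone book are hypothetical, the fuzzy search is approximated by a substring test, one-character segments are merged in pairs, and an isolated one-character segment is simply passed through because the passage above does not say how the second example alone treats it.

```python
def merge_adjacent_single_chars(segments):
    """Concatenate each pair of adjacent one-character segments.

    segments: ordered segment strings from one recognition candidate.
    """
    merged = []
    i = 0
    while i < len(segments):
        current = segments[i]
        has_next = i + 1 < len(segments)
        if len(current) == 1 and has_next and len(segments[i + 1]) == 1:
            merged.append(current + segments[i + 1])  # e.g. "タ" + "ゴ"
            i += 2
        else:
            merged.append(current)
            i += 1
    return merged


def fuzzy_hits(keyword, readings):
    """Substring test standing in for the fuzzy search."""
    return [reading for reading in readings if keyword in reading]


phone_book = ["ゴトウ", "タハラ", "タナベ", "カゴ"]
tago = merge_adjacent_single_chars(["タ", "ゴ"])
kago = merge_adjacent_single_chars(["カ", "ゴ"])
```

In this sketch "タゴ" hits nothing in the toy phone book, whereas "カゴ" recovers the intended record despite the segmentation error.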

(Third example)
The following describes mainly how the information search apparatus 1 in the third example differs from the first example. In the third example, the search unit 15C executes searches based on different search conditions for a search using character string information having one character and a search using character string information having two or more characters. Specifically, the search unit 15C executes the fuzzy search described above for first character string information having two or more characters, and executes only an exact match search for first character string information having one character.

Next, the processing executed by the information search apparatus 1 in the third example is described with reference to FIGS. 5 and 8. FIG. 8 is a flowchart showing the operation of the information search apparatus 1 in the third example. Steps S301 to S305 and S307 to S312 are the same as steps S101 to S105 and S107 to S112 executed by the information search apparatus 1 in the first example shown in FIG. 6, so their detailed description is omitted. In the following description, the search unit 15C and the second search unit 16 execute the exact match search and the partial match search described above (including prefix, suffix, and substring matching) as the fuzzy search.

Through steps S301 to S304, the character string information acquisition unit 12 acquires the search information ("タ/ゴ" (ta/go, the readings) and "田/後" (the notations)). Then, in step S305, the character count unit 13 counts the number of characters (one character) of each of the character string information "タ" and "ゴ".

Next, the search unit 15C executes search processing based on the character counts obtained in step S305 (step S306, search step). First, the search unit 15C acquires the character string information "タ" acquired by the character string information acquisition unit 12 (step S306A) and executes, as necessary, processing that converts specific characters (for example, the long-vowel reading conversion shown in FIG. 4) (step S306B). The search unit 15C then determines whether the character string information "タ" has one character (step S306C). Because "タ" has one character, the search unit 15C executes an exact match search using the character string information "タ" (step S306D). Because the character string information database 14 stores no character string information that exactly matches "タ", no record is hit and no record's search score is incremented. The search unit 15C then processes the next character string information "ゴ" in the same way (step S306F: NO; steps S306A and S306B; step S306C: YES; step S306D).

After that, when processing has been completed for all speech recognition result candidates, including the search processing by the second search unit 16 (steps S307 to S309) (step S311: YES), the search result output unit 17 extracts a representative score for each search result acquired by the search unit 15C and the second search unit 16. The search result output unit 17 arranges the search results in order of representative score ("加護" (1.6) → "後藤" (1) → "田原" (1) → "田辺" (1) → ...; the numbers in parentheses are the representative scores) and outputs them to the output unit 105, such as a display, of the information search apparatus 1 (step S312, search result output step).

As described above, the search unit 15C does not allow a fuzzy search for one-character character string information ("タ" and "ゴ") and executes an exact match search instead. This reduces the influence of the large number of search errors that can arise when a fuzzy search is executed using one-character character string information.

Consider also a case in which the user speaks intending "李さんに電話" ("call Li") and the speech is recognized correctly, yielding the character string information "リ" (ri). In this case, rather than simply excluding "リ", the search unit 15C executes an exact match search using "リ". It can thereby acquire the information the user wants (the record containing "李 (リ)"), raising that record's search score so that the search result output unit 17 displays it near the top.
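The length-dependent search condition of the third example can be sketched as follows. This is an illustrative sketch only: the function name and the toy list of readings are hypothetical, and the fuzzy search is approximated by a substring test.

```python
def search_by_length(segment, readings):
    """Apply a search condition chosen by the segment's character count.

    One-character segments are matched exactly; longer segments use a
    substring test that stands in for the fuzzy search.
    """
    if len(segment) == 1:
        return [reading for reading in readings if reading == segment]
    return [reading for reading in readings if segment in reading]


phone_book = ["ゴトウ", "タハラ", "タナベ", "カゴ", "リ"]
ta_hits = search_by_length("タ", phone_book)
ri_hits = search_by_length("リ", phone_book)
kago_hits = search_by_length("カゴ", phone_book)
```

The one-character keyword "タ" no longer hits "タハラ" or "タナベ", while a genuine one-character name such as "リ" is still found.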

The information search apparatus 1 according to the present embodiment has been described above using the first to third examples. In these examples, the second search unit 16 may execute, on the second character string information acquired as the search information, the same processing that the search unit 15 executes on the first character string information corresponding to that second character string information according to the character count of the first character string information, and may then execute a search using the second character string information.

For example, in the example shown in FIG. 5, for the second character string information "田" corresponding to the first character string information "タ" that the search unit 15 excluded, the second search unit 16 may perform the exclusion in the same way as the search unit 15. When the search unit 15 judges that first character string information is not suitable for use as search information as it is and applies processing such as exclusion or concatenation to it, the corresponding second character string information is likewise highly likely to be unsuitable for use as search information. With the above configuration, the first character string information and the corresponding second character string information are therefore searched by the same method based on the same reasoning (processing such as exclusion and concatenation, and search methods such as exact match search and partial match search), so search errors can be reduced more efficiently.

With the above configuration, in the first to third examples described above, the second search unit 16 does not execute a fuzzy search using "田" and "後", so the records containing "後藤", "田原", and "田辺" are not acquired as search results and their search scores are not incremented. The search result output unit 17 therefore displays only "加護", which was acquired as a search result. That is, the above configuration suppresses the display of search results the user did not intend (the number of erroneous data items).
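Applying the same exclusion to the notation side can be sketched as follows. This is a hedged illustration: it assumes the readings and notations arrive as aligned pairs and shows only the exclusion case, and the function name and pair layout are hypothetical.

```python
def drop_single_char_pairs(pairs):
    """Drop a notation whenever its corresponding reading has one character.

    pairs: list of (reading, notation) tuples for one recognition
    candidate, aligned segment by segment (an assumed layout).
    Returns the readings and notations that remain as search keywords.
    """
    kept = [(reading, notation) for reading, notation in pairs if len(reading) != 1]
    readings = [reading for reading, _ in kept]
    notations = [notation for _, notation in kept]
    return readings, notations


first_candidate = drop_single_char_pairs([("タ", "田"), ("ゴ", "後")])
second_candidate = drop_single_char_pairs([("カゴ", "加護")])
```

Because the decision is keyed on the reading's character count, "田" and "後" are removed together with "タ" and "ゴ", so neither search unit queries the mis-segmented candidate.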

The information search apparatus 1 may also be realized as a configuration that combines the first example and the second example. That is, in the processing by the search unit 15, for the first character string information acquired as the search information, two adjacent pieces of one-character character string information may be concatenated, and one-character character string information that is not adjacent to another one-character piece may be excluded.

The information search apparatus 1 may also be realized as a configuration that combines the second example and the third example. That is, in the processing by the search unit 15, for the first character string information acquired as the search information, two adjacent pieces of one-character character string information may be concatenated, and an exact match search may be executed for one-character character string information that is not adjacent to another one-character piece.
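The two combined configurations can be sketched with a single planning function. This is an illustrative sketch, not the patented implementation: the function name, the `isolated_policy` parameter, and the "fuzzy"/"exact" mode labels are all hypothetical, and one-character segments are merged in pairs.

```python
def plan_searches(segments, isolated_policy):
    """Build (keyword, mode) search steps for ordered segments.

    Adjacent one-character segments are concatenated and searched fuzzily.
    An isolated one-character segment is dropped when isolated_policy is
    "exclude" (first + second example) or searched by exact match when it
    is "exact" (second + third example).
    """
    if isolated_policy not in ("exclude", "exact"):
        raise ValueError("isolated_policy must be 'exclude' or 'exact'")
    plan = []
    i = 0
    while i < len(segments):
        current = segments[i]
        has_next = i + 1 < len(segments)
        if len(current) == 1 and has_next and len(segments[i + 1]) == 1:
            plan.append((current + segments[i + 1], "fuzzy"))
            i += 2
            continue
        if len(current) > 1:
            plan.append((current, "fuzzy"))
        elif isolated_policy == "exact":
            plan.append((current, "exact"))
        # with "exclude", an isolated one-character segment adds no step
        i += 1
    return plan


merged = plan_searches(["カ", "ゴ"], "exclude")
excluded = plan_searches(["リ", "タナベ"], "exclude")
exact = plan_searches(["リ", "タナベ"], "exact")
```

Keeping the policy as a parameter makes the difference between the two combinations explicit: only the treatment of an isolated one-character segment changes.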

In the present embodiment, an example was shown in which the character string information database 14 is a phone book database and the information search apparatus 1 provides the user with a phone book search function, but the invention is not limited to this. The character string information database 14 may be any database that contains character string information that can be a search target (for example, information indicating proper nouns such as personal names and place names). The information search apparatus 1 is applicable, for example, to search apparatuses that, as in a phone book search, use character string information indicating a person's name as a search keyword to perform a mail search (a sender's search for a destination mail address) or a schedule search (a search for a specific individual's schedule). It is also applicable to search apparatuses that use character string information indicating proper nouns such as station names and place names as search keywords to perform a station name search, a map (place name) search, and the like.

1: information search apparatus, 11: speech recognition unit, 12: character string information acquisition unit, 13: character count unit, 14: character string information database, 15 (15A, 15B, 15C): search unit, 16: second search unit, 17: search result output unit.

Claims

An information search apparatus comprising:
character string information acquisition means for acquiring, as one piece of search information, a plurality of pieces of character string information each indicating a character string;
character count means for counting, for each piece of character string information acquired by the character string information acquisition means, the number of characters of the character string indicated by that character string information;
search means for acquiring a search result by executing, using the search information acquired by the character string information acquisition means, a search according to the number of characters counted for each piece of character string information by the character count means; and
search result output means for outputting the search result acquired by the search means,
wherein the search means excludes character string information having one character and executes the search using character string information having two or more characters.

An information search apparatus comprising:
character string information acquisition means for acquiring, as one piece of search information, one or more pieces of character string information each indicating a character string;
character count means for counting, for each piece of character string information acquired by the character string information acquisition means, the number of characters of the character string indicated by that character string information;
search means for acquiring a search result by executing, using the search information acquired by the character string information acquisition means, a search according to the number of characters counted for each piece of character string information by the character count means; and
search result output means for outputting the search result acquired by the search means,
wherein the character string information acquisition means acquires a plurality of pieces of character string information that have been ordered in advance, and
the search means determines, based on the ordering, whether two different pieces of character string information each having one character are adjacent to each other and, when the two pieces of character string information are adjacent to each other, generates a new character string by concatenating the character strings indicated by the two pieces of character string information and executes the search using character string information having two or more characters, including information indicating the new character string.

The information search apparatus according to claim 2, wherein the search means executes a search using character string information having one character and a search using character string information having two or more characters based on mutually different search conditions.

The information search apparatus according to any one of claims 1 to 3, further comprising speech recognition means for receiving a series of speech uttered by a user and generating the character string information by performing speech recognition processing on the uttered speech,
wherein the character string information acquisition means acquires the character string information generated by the speech recognition means.

The information search apparatus according to any one of claims 1 to 4, further comprising second search means for acquiring a search result by executing a search using character string information expressed in a second character type different from a first character type,
wherein the character string information acquisition means acquires, as the search information, first character string information expressed in the first character type and second character string information that corresponds to the first character string information and is expressed in the second character type,
the character count means and the search means execute their respective processing on the first character string information,
the second search means executes a search using the second character string information, and
the search result output means outputs a search result based on a first search result acquired by the search means and a second search result acquired by the second search means.

The information search apparatus according to claim 5, wherein the second search means executes, on the second character string information, the same processing that the search means executes on the first character string information corresponding to that second character string information according to the number of characters of the first character string information, and executes the search using the second character string information.

An information search method performed by an apparatus, the method comprising:
a character string information acquisition step of acquiring, as one piece of search information, a plurality of pieces of character string information each indicating a character string;
a character counting step of counting, for each piece of character string information acquired in the character string information acquisition step, the number of characters of the character string indicated by that character string information;
a search step of acquiring a search result by executing, using the search information acquired in the character string information acquisition step, a search according to the number of characters counted for each piece of character string information in the character counting step; and
a search result output step of outputting the search result acquired in the search step,
wherein, in the search step, character string information having one character is excluded and the search is executed using character string information having two or more characters.

An information search method performed by an apparatus, the method comprising:
a character string information acquisition step of acquiring, as one piece of search information, one or more pieces of character string information each indicating a character string;
a character counting step of counting, for each piece of character string information acquired in the character string information acquisition step, the number of characters of the character string indicated by that character string information;
a search step of acquiring a search result by executing, using the search information acquired in the character string information acquisition step, a search according to the number of characters counted for each piece of character string information in the character counting step; and
a search result output step of outputting the search result acquired in the search step,
wherein, in the character string information acquisition step, a plurality of pieces of character string information that have been ordered in advance are acquired, and
in the search step, it is determined, based on the ordering, whether two different pieces of character string information each having one character are adjacent to each other and, when the two pieces of character string information are adjacent to each other, a new character string is generated by concatenating the character strings indicated by the two pieces of character string information and the search is executed using character string information having two or more characters, including information indicating the new character string.