JP5229448B2

JP5229448B2 - Reading imparting device and program

Info

Publication number: JP5229448B2
Application number: JP2007233825A
Authority: JP
Inventors: 英一郎隅田; 史昭菅谷
Original assignee: ATR Advanced Telecommunications Research Institute International; KDDI Corp
Current assignee: ATR Advanced Telecommunications Research Institute International; KDDI Corp
Priority date: 2007-09-10
Filing date: 2007-09-10
Publication date: 2013-07-03
Anticipated expiration: 2027-09-10
Also published as: JP2009064383A

Description

The present invention relates to a reading imparting device and the like that outputs information on how a keyword is read.

Conventionally, there has been a reading (yomigana) imparting device of the following kind. The device has a dictionary in which the pronunciations of words are registered (part-of-speech information is not required) and a language model that gives distinct connection probabilities between words that share the same notation but differ in pronunciation. A hypothesis generation/deletion unit generates hypotheses of reading candidates for Japanese text using the dictionary. A hypothesis calculation unit computes, based on the language model, the product of the connection probabilities between the words in each hypothesis, merges the hypotheses so that only the one with the largest product remains, selects the hypothesis whose product of connection probabilities is largest at the end of the sentence, and outputs its pronunciation sequence (see, for example, Patent Document 1).

In general, word reading information is indispensable for systems such as speech recognition, speech synthesis, and machine translation. Word reading information is usually incorporated into such a system in the form of a reading dictionary.
Japanese Unexamined Patent Publication No. 2003-132052 (page 1, FIG. 1, etc.)

However, conventional reading imparting devices and the like encounter many words that are not listed in the dictionary, such as newly coined keywords and proper nouns, and these words cause failures in processing such as speech recognition, speech synthesis, and machine translation.

The reading imparting device of the first invention comprises: a reception unit that receives a keyword; a reading dictionary search unit that searches a reading dictionary, which stores two or more pieces of term reading information each having a term and reading information indicating the reading of that term, using a target character string that is either the keyword received by the reception unit or each of two or more divided character strings obtained by dividing the keyword, and acquires two or more pieces of reading information; a co-occurrence frequency acquisition unit that searches a document group using each piece of acquisition target character string reading information, which is a pair of the target character string and one of the two or more pieces of reading information acquired by the reading dictionary search unit, and acquires, for each piece of reading information, a co-occurrence frequency, which is the number of documents in which the target character string and the reading information co-occur; and an output unit that uses the co-occurrence frequency of each piece of reading information acquired by the co-occurrence frequency acquisition unit to output at least the reading information with the highest co-occurrence frequency, or the combination of reading information with the highest co-occurrence frequency.

With this configuration, the reading information of a keyword is acquired using a group of documents that people have actually written and that reflect current human activity, such as homepages on the Web, so the reading information of the keyword can be acquired with high accuracy.

In the reading imparting device of the second invention, relative to the first invention, the reading dictionary search unit comprises: reading information acquisition means that searches the reading dictionary, which stores two or more pieces of term reading information each having a term and reading information indicating the reading of that term, using the target character string, and acquires two or more pieces of reading information; and conversion means that, when the keyword includes a katakana character string, converts that katakana character string into hiragana.

With this configuration, the device exploits a characteristic of document groups such as homepages on the Web, in which readings are more often written in hiragana than in katakana, so the reading information of the keyword can be acquired with even higher accuracy.

In the reading imparting device of the third invention, relative to the second invention, the conversion means determines whether the keyword includes a predetermined katakana character and, if it does, does not convert the katakana character string into hiragana.

With this configuration, when the keyword includes a character such as 「ヴ」, which does not exist as hiragana, or a character whose reading is specifically written in katakana, the reading information is acquired in a way that exploits the characteristics of that character, so the reading information of the keyword can be acquired with even higher accuracy.

In the reading imparting device of the fourth invention, relative to any one of the first to third inventions, the reading dictionary search unit comprises: keyword dividing means that divides the keyword received by the reception unit in units of one or more characters and acquires two or more divided character strings; and reading information acquisition means that searches the reading dictionary, for each divided character string, for one or more pieces of reading information paired with a term that matches that divided character string, and acquires two or more pieces of acquisition target character string reading information each having a divided character string and reading information.

With this configuration, reading information can be acquired with high accuracy even for long keywords and keywords made up of compound words.

In the reading imparting device of the fifth invention, relative to the fourth invention, the keyword dividing means acquires the character string length of the keyword received by the reception unit and does not divide the keyword when that length is greater than, or greater than or equal to, a predetermined character string length; and when the keyword is not divided, the reading information acquisition means acquires from the reading dictionary one or more pieces of reading information paired with a term that matches the keyword received by the reception unit.

With this configuration, the amount of processing is kept from becoming enormous, and the reading information of the keyword can be acquired in a reasonable time.

In the reading imparting device of the sixth invention, relative to the fourth invention, the keyword dividing means divides the keyword received by the reception unit in units of one or more characters, starting from the smallest number of cuts N (N is 0 or more) and proceeding in increasing order, acquires one or more divided character strings, and repeats the division of the keyword until it receives an end instruction from the reading information acquisition means. The reading information acquisition means acquires from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term that matches each of the one or more divided character strings acquired by the keyword dividing means. When reading information could not be acquired for all of the one or more divided character strings, it instructs the keyword dividing means to acquire the two or more divided character strings of the next division pattern. When reading information could be acquired for all of the one or more divided character strings, it performs reading information acquisition processing that stores that reading information at least temporarily in a storage medium, and then either passes an end instruction to the keyword dividing means, or instructs the keyword dividing means to acquire the two or more divided character strings of another division pattern with the same number of cuts and passes the end instruction to the keyword dividing means after the acquisition processing for the other division patterns with that number of cuts is complete.

In the reading imparting device of the seventh invention, relative to any one of the first to sixth inventions, the reading dictionary search unit further comprises Arabic numeral processing means that determines whether the target character string includes Arabic numerals and, if it does, generates place-value numeral reading information, which indicates the reading obtained when the Arabic numerals are read as a number with place values, and adds that information as reading information.

With this configuration, reading information can be acquired with high accuracy even for keywords that include Arabic numerals.

In the reading imparting device of the eighth invention, relative to any one of the first to seventh inventions, the reading dictionary search unit further comprises alphabet processing means that determines whether the target character string includes alphabetic characters and, if it does, generates romaji reading information indicating the romaji reading corresponding to those characters and adds that information as reading information.

With this configuration, reading information can be acquired with high accuracy even for keywords that include alphabetic characters.

In the reading imparting device of the ninth invention, relative to any one of the first to eighth inventions, the co-occurrence frequency acquisition unit comprises: co-occurrence frequency acquisition means that searches a document group using each of the two or more pieces of acquisition target character string reading information acquired by the reading dictionary search unit, and acquires, for each piece of acquisition target character string reading information, a co-occurrence frequency, which is the number of documents in which the keyword or divided character string in that information co-occurs with its reading information; and score calculation means that calculates a score for each piece of acquisition target character string reading information, or for each combination of such pieces, using as a parameter the co-occurrence frequency corresponding to the acquisition target character string reading information that has the keyword received by the reception unit or the two or more divided character strings. The output unit uses the scores acquired by the co-occurrence frequency acquisition unit to output at least the reading information, or the combination of reading information, with the highest co-occurrence frequency.

With this configuration, information on appropriate reading candidates can be output.
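The scoring of combinations described for the ninth invention can be sketched as follows. The text above only says that the score takes the co-occurrence frequency as a parameter; multiplying the per-piece frequencies is an assumption made here for illustration, as are the function name `rank_combinations`, the `top_n` parameter, and the toy counts.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple


def rank_combinations(
    piece_counts: List[Dict[str, int]], top_n: int = 3
) -> List[Tuple[str, int]]:
    """Rank combined readings of a split keyword.

    piece_counts[i] maps each candidate reading of the i-th divided
    character string to its co-occurrence frequency.
    ASSUMPTION: the score of a combination is the product of the
    per-piece frequencies; the patent text does not fix a formula.
    """
    ranked = []
    for choice in product(*(counts.items() for counts in piece_counts)):
        reading = "".join(r for r, _ in choice)
        score = prod(n for _, n in choice)
        ranked.append((reading, score))
    # Highest score first; ties keep enumeration order (sort is stable).
    ranked.sort(key=lambda item: -item[1])
    return ranked[:top_n]


# Toy frequencies, invented for this example.
counts_for_pieces = [{"なり": 40, "せい": 5}, {"た": 30, "でん": 2}]
top_two = rank_combinations(counts_for_pieces, top_n=2)
```

Other score functions (a sum, or a sum of logarithms) would fit the same interface; the product is used only because it penalizes a combination in which any one piece is poorly attested.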

According to the reading imparting device of the present invention, the reading information of a keyword can be acquired with high accuracy.

Hereinafter, embodiments of the reading imparting device and the like are described with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same way, so repeated description of them may be omitted.
(Embodiment 1)

This embodiment describes a reading imparting device 1 that, in response to the input of a keyword, searches a reading dictionary to acquire information on reading candidates (referred to as reading information), searches a document group (for example, a set of homepages on the Web) using the keyword and the reading information as keys to acquire the number of co-occurrences of the keyword and each reading candidate, uses those co-occurrence counts to decide which reading information to output, and outputs that reading information.
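The flow described above can be sketched in a few lines. This is a minimal illustration, assuming the document group is an in-memory list of strings and that co-occurrence means both strings appear somewhere in the same document; the function names and the toy documents (including the deliberately wrong candidate 「せいでん」) are invented for the example, and a real system would query a search engine instead.

```python
from typing import Dict, List


def cooccurrence_counts(
    keyword: str, readings: List[str], documents: List[str]
) -> Dict[str, int]:
    """Count, per candidate reading, the documents containing both
    the keyword and that reading."""
    return {
        reading: sum(1 for doc in documents if keyword in doc and reading in doc)
        for reading in readings
    }


def best_reading(keyword: str, readings: List[str], documents: List[str]) -> str:
    """Return the candidate reading with the highest co-occurrence count."""
    counts = cooccurrence_counts(keyword, readings, documents)
    return max(readings, key=lambda r: counts[r])


# Toy document group, invented for this example.
docs = [
    "成田（なりた）空港へのアクセス",
    "成田 なりた 読み方",
    "成田（せいでん）という表記",
]
counts = cooccurrence_counts("成田", ["なりた", "せいでん"], docs)
chosen = best_reading("成田", ["なりた", "せいでん"], docs)
```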

The reading imparting device 1 of this embodiment normally divides the keyword to acquire divided character strings, searches the reading dictionary using those divided character strings, and acquires information on reading candidates (referred to as reading information). It then acquires the number of co-occurrences of each divided character string and its reading candidates, uses those counts to decide which reading information to output, and outputs that reading information. Processing in which the keyword is not divided when it is at least a predetermined length is also described. In addition, the description covers a device that normally processes division patterns in increasing order of the number of cuts and stops at the number of cuts at which the keyword could be read.

In the reading imparting device 1 of this embodiment, reading information is written mainly in hiragana. When the keyword includes a predetermined character (for example, 「ヴ」), the reading information is written in katakana.

For the reading imparting device 1 of this embodiment, processing that generates a place-value reading for Arabic numerals and adds it to the candidates is also described, as is processing that adds a romaji reading for alphabetic characters.

Furthermore, processing is described that ranks the readings for each divided keyword using the number of co-occurrences and uses those ranks to obtain a predetermined number of reading candidates.

FIG. 1 is a conceptual diagram of a system that includes the reading imparting device 1 of this embodiment. The system includes a reading dictionary device 2 and a server device 3. The reading dictionary device 2 receives a term and outputs the reading information corresponding to that term. Two or more pieces of reading information may be output, and there may also be cases in which no reading information is output. The reading dictionary device 2 stores a reading dictionary, which is a dictionary that stores two or more pieces of term reading information each having a term and reading information indicating the reading of that term. The server device 3 stores a document group, which is a set of one or more documents. The document group is, for example, homepages on the Web or a collection of newspaper articles. The documents in the document group may be written in a markup language such as HTML, XML, or SGML, may be plain text data, or may be tables or records in a database. In other words, the data structure and format of the documents do not matter. The reading imparting device 1, the reading dictionary device 2, and the server device 3 are connected by a network 4 such as the Internet.

In FIG. 1 the reading imparting device 1 does not include the reading dictionary device 2 or the server device 3, but it may include them.

FIG. 2 is a block diagram of the reading imparting device 1 of this embodiment. The reading imparting device 1 comprises a reception unit 11, a reading dictionary search unit 12, a co-occurrence frequency acquisition unit 13, and an output unit 14.

The reading dictionary search unit 12 comprises keyword dividing means 121, reading information acquisition means 122, conversion means 123, Arabic numeral processing means 124, and alphabet processing means 125.

The co-occurrence frequency acquisition unit 13 comprises co-occurrence frequency acquisition means 131 and score calculation means 132.

The reception unit 11 receives a keyword. Here a keyword simply means a character string. The keyword may be a word, a compound word made up of several words, or a sentence containing words or compound words. When the reception unit 11 receives a sentence, it is preferable to split the sentence into words (including compound words) and then perform the reading information acquisition processing described later. The keyword is usually a noun, but it may be a character string of another part of speech, such as a verb or an adjective. Receiving here covers accepting input from a user, accepting data from another processing unit (module), receiving data from another device, and reading data from a storage medium (a hard disk, DVD, FD, or any other). The keyword may be entered by any means, such as a keyboard, a mouse, or a menu screen. The reception unit 11 can be realized by a device driver for an input means such as a keyboard, by control software for a menu screen, or the like.

The reading dictionary search unit 12 searches the reading dictionary using the target character string and acquires two or more pieces of reading information. The reading dictionary stores two or more pieces of term reading information. Each piece of term reading information has a term and reading information, where the reading information indicates the reading of the corresponding term. For example, a piece of term reading information has the term 「成田」 and the reading information 「なりた」. The reading dictionary may be held by the reading imparting device 1 or by an external device (the reading dictionary device 2). There may be one reading dictionary or two or more kinds. The reading dictionary is, for example, one or more of a general dictionary, a rendaku (sequential voicing) dictionary, and a personal name dictionary. The general dictionary is made up of, for example, a place name dictionary, a Japanese-language dictionary, a kanji dictionary, and a katakana English word dictionary. The target character string is the keyword received by the reception unit 11 or each of two or more divided character strings. A divided character string is a character string obtained by dividing the keyword received by the reception unit 11. The divided character strings are acquired (that is, the division is performed) by the keyword dividing means 121 described later. The search performed by the reading dictionary search unit 12 is, for example, processing that passes the target character string to a search engine held by the reading dictionary device 2 and obtains reading information from the reading dictionary device 2. The search engine may belong to the reading dictionary search unit 12, to another device, or to another processing unit (not shown).

The reading dictionary search unit 12 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and that software is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The keyword dividing means 121 divides the keyword received by the reception unit 11 in units of one or more characters and acquires two or more divided character strings. It is preferable that the keyword dividing means 121 acquire the character string length of the keyword received by the reception unit 11 and not divide the keyword when that length is greater than, or greater than or equal to, a predetermined character string length.

It is also preferable that the keyword dividing means 121 divide the keyword received by the reception unit 11 in units of one or more characters in increasing order of the number of cuts N (N is 0 or more; that is, 0, 1, 2, and so on), acquire one or more divided character strings, and repeat the division until it receives an end instruction from the reading information acquisition means 122. It passes the divided character strings obtained by dividing the keyword to the reading information acquisition means 122. When the number of cuts N is 0, the keyword dividing means 121 does not divide the keyword.
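Enumerating division patterns in increasing order of the number of cuts can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the order of patterns within one cut count (left-most cut first) is a choice made here, since the text does not specify it.

```python
from itertools import combinations
from typing import Iterator, List


def split_patterns(keyword: str, n_cuts: int) -> Iterator[List[str]]:
    """Yield every way to cut `keyword` at exactly `n_cuts` positions.

    With n_cuts == 0 the keyword is yielded undivided.
    """
    for cuts in combinations(range(1, len(keyword)), n_cuts):
        bounds = (0, *cuts, len(keyword))
        yield [keyword[a:b] for a, b in zip(bounds, bounds[1:])]


def patterns_by_cut_count(keyword: str) -> Iterator[List[str]]:
    """Yield all division patterns, fewest cuts first (N = 0, 1, 2, ...)."""
    for n_cuts in range(len(keyword)):
        yield from split_patterns(keyword, n_cuts)


one_cut = list(split_patterns("成田空港", 1))
all_patterns = list(patterns_by_cut_count("成田空港"))
```

A keyword of n characters has 2^(n-1) division patterns in total (8 for the four-character example), which is why skipping division for keywords above a length threshold keeps the amount of processing bounded.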

The keyword dividing means 121 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and that software is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

The reading information acquisition means 122 searches the reading dictionary using the target character string and acquires two or more pieces of reading information. The reading dictionary stores two or more pieces of term reading information each having a term and reading information indicating the reading of that term. The reading information acquisition means 122 also searches the reading dictionary, for each divided character string, for one or more pieces of reading information paired with a term that matches that divided character string, and acquires two or more pieces of acquisition target character string reading information each having a divided character string and reading information. When the keyword is not divided, the reading information acquisition means 122 acquires from the reading dictionary one or more pieces of reading information paired with a term that matches the keyword received by the reception unit 11.

The reading information acquisition means 122 attempts to acquire from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term that matches each of the one or more divided character strings acquired by the keyword dividing means 121. When it could not acquire reading information for all of the one or more divided character strings, it instructs the keyword dividing means 121 to acquire the two or more divided character strings of the next division pattern. When it could acquire reading information for all of the one or more divided character strings, it performs reading information acquisition processing that stores that reading information at least temporarily in a storage medium. After performing the acquisition processing, the reading information acquisition means 122 may pass an end instruction to the keyword dividing means 121 immediately. Alternatively, it may instruct the keyword dividing means 121 to acquire the two or more divided character strings of another division pattern with the same number of cuts as the pattern for which all reading information was acquired, and pass the end instruction to the keyword dividing means 121 after the acquisition processing for the other division patterns with that number of cuts is complete. That is, the above processing can end in at least two ways. It is also possible to construct every conceivable division pattern and perform the reading information acquisition processing on each.
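The interaction between division and dictionary lookup described above can be sketched as one loop. This is a sketch under stated assumptions: the reading dictionary is modeled as a plain `dict` from term to a list of readings, the function name and the `stop_at_first` flag are hypothetical, and the toy dictionary entries are invented for the example. The flag selects between the two ending behaviors: stop at the first fully readable pattern, or finish the remaining patterns with the same number of cuts first.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PieceReadings = Tuple[str, List[str]]


def find_readable_patterns(
    keyword: str, dictionary: Dict[str, List[str]], stop_at_first: bool = False
) -> List[List[PieceReadings]]:
    """Try division patterns with 0, 1, 2, ... cuts and return the
    patterns whose every piece has at least one dictionary reading."""
    for n_cuts in range(len(keyword)):
        found: List[List[PieceReadings]] = []
        for cuts in combinations(range(1, len(keyword)), n_cuts):
            bounds = (0, *cuts, len(keyword))
            pieces = [keyword[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(piece in dictionary for piece in pieces):
                found.append([(piece, dictionary[piece]) for piece in pieces])
                if stop_at_first:
                    return found  # end instruction issued immediately
        if found:
            return found  # end after finishing this cut count
    return []  # no pattern was fully readable


# Toy reading dictionary, invented for this example.
toy_dictionary = {
    "成田": ["なりた"],
    "空港": ["くうこう"],
    "成": ["せい", "なり"],
    "田": ["た", "でん"],
}
result = find_readable_patterns("成田空港", toy_dictionary)
```

Here the undivided keyword is not in the toy dictionary, so the search moves on to one cut, where only 「成田」+「空港」 is fully readable, and stops there without trying two or more cuts.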

The reading information acquisition means 122 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and that software is recorded on a recording medium such as a ROM. It may, however, be implemented in hardware (a dedicated circuit).

When the target character string includes a katakana character string, the conversion means 123 converts that katakana character string into hiragana. Preferably, the conversion means 123 also determines whether the katakana character string includes a predetermined katakana character (for example, 「ヴ」) and, if it does, does not convert the katakana into hiragana.
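The conversion performed by the conversion means 123 can be sketched as follows. The sketch assumes Unicode text, in which katakana U+30A1 to U+30F6 lie exactly 0x60 above the corresponding hiragana; the function name `katakana_to_hiragana` and the choice to return `None` when the conversion is suppressed are hypothetical.

```python
EXCLUDED = {"ヴ"}  # predetermined katakana that suppress conversion (example from the text)


def katakana_to_hiragana(text, excluded=EXCLUDED):
    """Return text with katakana mapped to hiragana, or None if an excluded character occurs."""
    if any(ch in excluded for ch in text):
        return None  # do not convert at all
    # Katakana U+30A1..U+30F6 sit 0x60 above their hiragana counterparts;
    # other characters, such as the long-vowel mark, are left unchanged.
    return "".join(chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch for ch in text)
```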

The conversion means 123 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).

The Arabic numeral processing means 124 determines whether the target character string includes Arabic numerals. If it does (for example, 「１２３」), the means generates place-value number reading information (for example, 「ひゃくにじゅうさん」, hyaku-nijū-san), which gives the reading of the Arabic numerals as a number with place values, and adds it as reading information. Whether the target character string includes Arabic numerals can be determined by checking whether it contains any of the characters 「０」 through 「９」. Generating place-value number reading information from Arabic numerals is a known technique. That is, the reading is generated using reading rules tied to the digit position (for example, the third digit is read 「ひゃく」 and the second digit 「じゅう」), information on how each numeral is read (「１」 is 「いち」, 「２」 is 「に」, and so on), and specific rules (for example, a 「１」 in the second or a higher digit is not added to the reading). The Arabic numeral processing means 124 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).
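The rule-based generation described for the Arabic numeral processing means 124 can be illustrated as follows. This is only a sketch of the known technique, not the implementation of the embodiment: the function name `number_reading`, the restriction to values from 1 through 9999, and the table of euphonic exceptions (for example, 「さんびゃく」 for 300) are assumptions added for illustration.

```python
DIGITS = {1: "いち", 2: "に", 3: "さん", 4: "よん", 5: "ご",
          6: "ろく", 7: "なな", 8: "はち", 9: "きゅう"}
UNITS = ["", "じゅう", "ひゃく", "せん"]  # ones, tens, hundreds, thousands
# Euphonic exceptions keyed by (digit, place); an assumption beyond the rules in the text.
IRREGULAR = {(3, 2): "さんびゃく", (6, 2): "ろっぴゃく", (8, 2): "はっぴゃく",
             (3, 3): "さんぜん", (8, 3): "はっせん"}


def number_reading(text):
    """Return the place-value reading of a digit string for values 1..9999."""
    if not text.isdecimal():
        raise ValueError("expected only digits")
    value = int(text)  # int() also accepts full-width digits such as "１２３"
    if not 0 < value < 10000:
        raise ValueError("this sketch only covers 1 through 9999")
    digits = str(value)
    parts = []
    for index, ch in enumerate(digits):
        digit, place = int(ch), len(digits) - 1 - index
        if digit == 0:
            continue  # zeros contribute nothing to the reading
        if (digit, place) in IRREGULAR:
            parts.append(IRREGULAR[(digit, place)])
        elif digit == 1 and place > 0:
            parts.append(UNITS[place])  # a "1" in the second or higher digit is not read
        else:
            parts.append(DIGITS[digit] + UNITS[place])
    return "".join(parts)
```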

The alphabet processing means 125 determines whether the target character string includes alphabetic characters. If it does, the means generates romaji reading information, which indicates the romaji-style reading corresponding to those characters, and adds it as reading information. Determining whether the target character string includes alphabetic characters and generating a romaji reading from alphabetic characters are both known techniques. Preferably, when the target character string includes alphabetic characters, the alphabet processing means 125 also generates character-unit reading information, which is the reading obtained when the characters are read aloud one at a time, and adds it as reading information. That is, when the target character string is 「ＡＢＣ」, the alphabet processing means 125 preferably generates the character-unit reading information 「えーびーしー」 (ē-bī-shī) and adds it as reading information. The alphabet processing means 125 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).
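The character-unit reading can be sketched as a simple table lookup. The table `LETTER_READINGS` below uses common Japanese letter names and is an assumption, as is the function name `spell_out`; the source only gives the example 「ＡＢＣ」 to 「えーびーしー」.

```python
import unicodedata

# Common Japanese names for the Latin letters (an illustrative assumption).
LETTER_READINGS = {
    "A": "えー", "B": "びー", "C": "しー", "D": "でぃー", "E": "いー",
    "F": "えふ", "G": "じー", "H": "えいち", "I": "あい", "J": "じぇー",
    "K": "けー", "L": "える", "M": "えむ", "N": "えぬ", "O": "おー",
    "P": "ぴー", "Q": "きゅー", "R": "あーる", "S": "えす", "T": "てぃー",
    "U": "ゆー", "V": "ぶい", "W": "だぶりゅー", "X": "えっくす",
    "Y": "わい", "Z": "ぜっと",
}


def spell_out(text):
    """Read the Latin letters in text one at a time; other characters are skipped."""
    # NFKC folds full-width letters such as "Ａ" to their ASCII forms.
    normalized = unicodedata.normalize("NFKC", text).upper()
    return "".join(LETTER_READINGS[ch] for ch in normalized if "A" <= ch <= "Z")
```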

The co-occurrence frequency acquisition unit 13 searches a document group using the target character string and the reading information acquired by the reading dictionary search unit 12, and acquires, for each piece of reading information, the co-occurrence frequency, which is the number of documents in which the target character string and the reading information co-occur. A pair consisting of a target character string and its reading information is also referred to as acquisition target character string reading information. In other words, the co-occurrence frequency acquisition unit 13 searches the document group using each piece of acquisition target character string reading information, that is, each pair of the target character string and one of the two or more pieces of reading information acquired by the reading dictionary search unit 12, and acquires, for each piece of reading information, the number of documents in which the target character string and the reading information of that pair co-occur. The document group may exist locally or on a network such as the Web (externally). A document group is a set of documents. A document may be anything, for example a web page, a record in a database, or an e-mail. The document examined for co-occurrence is usually a single web page (one file), but may be a plurality of files, such as an entire site. The acquisition target character string reading information is either a keyword and its reading information or a divided character string and its reading information.
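For a locally held document group, the co-occurrence frequency can be sketched as a simple count. The sketch assumes each document is a plain string and that co-occurrence means both strings appear somewhere in the same document; the function names are hypothetical. With a Web search engine, the hit count returned for a query containing both strings would take the place of this count.

```python
def cooccurrence_frequency(documents, target, reading):
    """Count the documents that contain both the target string and the reading."""
    return sum(1 for doc in documents if target in doc and reading in doc)


def frequencies_per_reading(documents, target, readings):
    """Return {reading: co-occurrence frequency} for one target string."""
    return {r: cooccurrence_frequency(documents, target, r) for r in readings}
```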

The co-occurrence frequency acquisition unit 13 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).

The co-occurrence frequency acquisition means 131 searches the document group using each of the two or more pieces of acquisition target character string reading information acquired by the reading dictionary search unit 12, and acquires, for each piece of acquisition target character string reading information (that is, for each piece of reading information), the co-occurrence frequency, which is the number of documents in which the keyword or divided character string of that piece co-occurs with its reading information.

The co-occurrence frequency acquisition means 131 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).

The score calculation means 132 calculates a score for each piece of acquisition target character string reading information, or for each combination of such pieces, using as a parameter the co-occurrence frequency corresponding to the acquisition target character string reading information that contains the keyword received by the reception unit 11 or the two or more divided character strings.

Score calculation is usually performed with an increasing function that takes the co-occurrence frequency as a parameter. For example, the score formula may compute the sum of the co-occurrence frequencies corresponding to each of the two or more divided character strings, the average of those frequencies, or their median. The score can also be described as a numerical value corresponding to the certainty, or likelihood, that the string is read that way.
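The three example formulas can be sketched as follows, using the standard `statistics` module. The function name `score` and the `method` parameter are hypothetical; `frequencies` is assumed to hold one co-occurrence frequency per divided character string.

```python
from statistics import mean, median

SCORERS = {"sum": sum, "mean": mean, "median": median}


def score(frequencies, method="sum"):
    """Combine the co-occurrence frequencies of one reading combination into a score."""
    return SCORERS[method](frequencies)
```

Each of these functions does not decrease when any single co-occurrence frequency rises, so a higher score indicates a more likely reading.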

The score calculation means 132 can usually be implemented by an MPU, memory, and the like. Its processing procedure is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).

Using the co-occurrence frequency acquired for each piece of reading information by the co-occurrence frequency acquisition unit 13, the output unit 14 outputs at least the reading information with the highest co-occurrence frequency, or the combination of reading information with the highest co-occurrence frequency. The output unit 14 may likewise make this selection using the scores acquired by the co-occurrence frequency acquisition unit 13. Preferably, the output unit 14 outputs a predetermined number (for example, 「３」) of pieces of reading information in descending order of co-occurrence frequency. Even in that case, it is further preferable that the output unit 14 also outputs any reading information ranked fourth or lower that has the same score as the third-ranked reading information. Here, output is a concept that includes display on a display, projection by a projector, printing by a printer, sound output, transmission to an external device, storage on a recording medium, and delivery of the processing result to another processing device or another program. The output unit 14 may or may not be regarded as including an output device such as a display or a speaker. The output unit 14 can be implemented by driver software for an output device, or by such driver software together with the output device.
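The selection of the top readings, including readings tied with the last selected rank, can be sketched as follows. The function name `top_readings` and the dictionary input are hypothetical, and `n` is assumed to be at least 1.

```python
def top_readings(freq_by_reading, n=3):
    """Return the n best (reading, frequency) pairs plus any tied with the n-th place."""
    ranked = sorted(freq_by_reading.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # frequency of the n-th ranked reading
    return [item for item in ranked if item[1] >= cutoff]
```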

Next, the operation of the reading imparting device is described with reference to the flowchart of FIG. 3.

(Step S301) The reception unit 11 determines whether a keyword has been received. If a keyword has been received, the process goes to step S302; otherwise, it returns to step S301.

(Step S302) The reading dictionary search unit 12 performs reading candidate generation processing using the keyword received in step S301. The reading candidate generation processing is described with reference to the flowchart of FIG. 4.

(Step S303) The co-occurrence frequency acquisition means 131 of the co-occurrence frequency acquisition unit 13 determines whether the reading information (information on candidate readings) generated in step S302 is empty. If it is empty, the process goes to step S301; if it is not empty, the process goes to step S304. When the process goes to step S301, it is preferable to output an error message indicating that reading information cannot be output.

(Step S304) The co-occurrence frequency acquisition means 131 searches the document group using the keyword received in step S301 and the reading information acquired in step S302, and acquires the co-occurrence frequency of the keyword and the reading information. If two or more pieces of reading information were acquired in step S302, the co-occurrence frequency acquisition means 131 acquires the co-occurrence frequency of the keyword and the reading information for each piece of reading information. If reading information for the keyword as a whole (reading information for a cut count of 0) could not be acquired in step S302, this step and step S305 may be skipped and the process may go directly to step S306.

(Step S305) The co-occurrence frequency acquisition unit 13 determines whether all of the one or more co-occurrence frequencies acquired in step S304 are 「０」. If all of them are 「０」, the process goes to step S306; if any of them is not 「０」, the process goes to step S307.

(Step S306) The co-occurrence frequency acquisition unit 13 searches the document group using the reading information for the divided character strings acquired in step S302, and acquires the co-occurrence frequencies corresponding to the divided character strings. This processing is called divided character string search processing and is described with reference to the flowchart of FIG. 5.

(Step S307) The score calculation means 132 calculates a score for each piece of reading information of the divided character strings, using the co-occurrence frequencies acquired in step S306. This processing is called score calculation processing and is described with reference to the flowchart of FIG. 6.

(Step S308) The output unit 14 composes the information to be output. This information contains the reading information, or the combination of reading information, with the highest co-occurrence frequency. It is, for example, the N pieces of reading information with the highest scores (N is predetermined), or reading information made up of the N highest-scoring combinations of reading information.

(Step S309) The output unit 14 outputs the information composed in step S308. The process returns to step S301.

In the flowchart of FIG. 3, processing ends when the power is turned off or when a processing-termination interrupt occurs.

Next, the reading candidate generation processing of step S302 is described with reference to the flowchart of FIG. 4.

(Step S401) The keyword dividing means 121 acquires the number of characters in the keyword.

(Step S402) The keyword dividing means 121 compares the number of characters acquired in step S401 with a predetermined threshold (for example, 「１５」) and determines whether a predetermined relationship is satisfied. If the relationship is satisfied, the process goes to step S403; otherwise, it goes to step S404. The relationship is satisfied when, for example, the number of characters is greater than the threshold, or is greater than or equal to the threshold.

(Step S403) The reading information acquisition means 122 searches the reading dictionary using the keyword. That is, the reading information acquisition means 122 passes the keyword to, for example, a search processing unit (search engine), not shown, and obtains the reading information corresponding to a term that matches the keyword. The number of pieces of reading information obtained is an integer of 0 or more. The reading information acquisition means 122 then stores the obtained reading information at least temporarily in a storage medium. The process returns to the higher-level process.

(Step S404) The keyword dividing means 121 performs initialization processing. The initialization processing assigns 0 to counter i, assigns 1 to counter j, and sets the end flag to OFF (for example, 「０」).

(Step S405) The keyword dividing means 121 determines whether a division pattern of the keyword with i cuts exists. This determination is made, for example, by checking whether "number of characters in the keyword <= i" holds; when it holds, no division pattern with i cuts exists. If a division pattern with i cuts exists, the process goes to step S406; otherwise, it goes to step S417.

(Step S406) The keyword dividing means 121 determines whether a j-th division pattern of the keyword with i cuts exists. For example, for a five-character keyword with a cut count of 1, there are four division patterns. For a five-character keyword with a cut count of 2, there are six division patterns.

(Step S407) The keyword dividing means 121 acquires all of the divided character strings in the j-th division pattern with i cuts and places them in memory.

(Step S408) The keyword dividing means 121 assigns 1 to counter k.

(Step S409) The keyword dividing means 121 determines whether a k-th divided character string exists among the two or more divided character strings acquired in step S407. If it exists, the process goes to step S410; otherwise, it goes to step S415.

(Step S410) The reading information acquisition means 122 searches the reading dictionary using the k-th divided character string and obtains zero or more pieces of reading information. Here, when the reading dictionary consists of a single dictionary, for example, the keyword dividing means 121 simply uses the divided character string as a key and obtains zero or more pieces of reading information paired with a term that matches the divided character string. When the reading dictionary consists of, for example, a general dictionary (「一般辞書」), a rendaku (sequential voicing) dictionary (「連濁辞書」), and a personal name dictionary (「人名辞書」), the reading information acquisition means 122 may search the general dictionary first, search the rendaku dictionary if no reading information is obtained, and search the personal name dictionary if reading information is still not obtained. Alternatively, with the same three dictionaries, the reading information acquisition means 122 may search all three, acquire all of the corresponding reading information, and place it in memory. The rendaku dictionary is used only when the divided character string is not the first one.

(Step S411) The reading information acquisition means 122 determines whether the reading information acquired in step S410 is empty (「０」). If it is empty, the process goes to step S412; if it is not empty, the process goes to step S413.

(Step S412) The keyword dividing means 121 increments counter j by 1 and returns to step S406.

(Step S413) The reading dictionary search unit 12 performs exception processing. The exception processing is described with reference to the flowchart of FIG. 7.

(Step S414) The reading information acquisition means 122 increments counter k by 1 and returns to step S409.

(Step S415) The keyword dividing means 121 stores, at least temporarily, the j-th division pattern with i cuts together with the one or more pieces of reading information for each divided character string.

(Step S416) The keyword dividing means 121 sets the end flag to ON (for example, 「１」). The process goes to step S412.

(Step S417) The keyword dividing means 121 determines whether the end flag is ON. If it is ON, the process returns to the higher-level process; otherwise, it goes to step S418.

(Step S418) The keyword dividing means 121 increments counter i by 1 and returns to step S405.

In the flowchart of FIG. 4, steps such as S401 and S413 are not essential.
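The two lookup strategies mentioned for step S410 can be sketched as follows. The sketch assumes that each constituent dictionary is a Python `dict` from term to a list of readings and that the dictionaries are supplied as an ordered list of `(name, entries)` pairs; the names `lookup_cascade`, `lookup_all`, and the label `"rendaku"` are hypothetical.

```python
def usable(dictionaries, is_first_piece):
    """Skip the rendaku dictionary for the first divided character string."""
    return [(name, entries) for name, entries in dictionaries
            if not (name == "rendaku" and is_first_piece)]


def lookup_cascade(piece, dictionaries, is_first_piece):
    """Search the dictionaries in order and stop at the first one that has readings."""
    for _, entries in usable(dictionaries, is_first_piece):
        readings = entries.get(piece, [])
        if readings:
            return list(readings)
    return []


def lookup_all(piece, dictionaries, is_first_piece):
    """Search every dictionary and collect all distinct readings in order."""
    collected = []
    for _, entries in usable(dictionaries, is_first_piece):
        for reading in entries.get(piece, []):
            if reading not in collected:
                collected.append(reading)
    return collected
```

The cascade variant issues fewer lookups, while the exhaustive variant leaves the choice among readings to the later co-occurrence search.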

Next, the divided character string search processing of step S306 is described with reference to the flowchart of FIG. 5.

(Step S501) The co-occurrence frequency acquisition means 131 assigns 1 to counter i.

(Step S502) The co-occurrence frequency acquisition means 131 determines whether an i-th division pattern exists. If it exists, the process goes to step S503; otherwise, the process returns to the higher-level process. A division pattern here is one that corresponds to two or more divided character strings whose reading information was temporarily stored in step S415 of the flowchart of FIG. 4. A division pattern has, for example, two or more divided character strings.

(Step S503) The co-occurrence frequency acquisition means 131 assigns 1 to counter j.

(Step S504) The co-occurrence frequency acquisition means 131 determines whether j-th acquisition target character string reading information exists in the i-th division pattern. Here, the acquisition target character string reading information has a target character string and reading information. If the j-th acquisition target character string reading information exists, the process goes to step S505; otherwise, it goes to step S509.

(Step S505) The co-occurrence frequency acquisition means 131 searches the document group using, as keys, the target character string and the reading information of the j-th acquisition target character string reading information. Searching the document group here means passing the target character string and the reading information to a so-called Web search engine and starting that search engine (or instructing it to start).

(Step S506) The co-occurrence frequency acquisition means 131 acquires, from the result of the search in step S505, the co-occurrence frequency corresponding to the j-th acquisition target character string reading information.

(Step S507) The co-occurrence frequency acquisition means 131 stores the co-occurrence frequency acquired in step S506, at least temporarily, in memory or the like in association with the j-th acquisition target character string reading information.

(Step S508) The co-occurrence frequency acquisition means 131 increments counter j by 1 and returns to step S504.

(Step S509) The co-occurrence frequency acquisition means 131 increments counter i by 1 and returns to step S502.

Next, the score calculation processing of step S307 is described with reference to the flowchart of FIG. 6.

(Step S601) The score calculation means 132 assigns 1 to counter i.

(Step S602) The score calculation means 132 determines whether an i-th division pattern exists. If it exists, the process goes to step S603; otherwise, the process returns to the higher-level process.

(Step S603) The score calculation means 132 assigns 1 to counter j.

(Step S604) The score calculation means 132 determines whether a j-th combination of reading information exists in the i-th division pattern. If it exists, the process goes to step S605; otherwise, it goes to step S609.

(Step S605) The score calculation means 132 acquires all of the co-occurrence frequencies corresponding to the pieces of reading information in the j-th combination of reading information.

(Step S606) The score calculation means 132 calculates a score using the two or more co-occurrence frequencies acquired in step S605. The score is, for example, the sum of the ranks (numerical values) that the readings hold, by co-occurrence frequency, among the readings of the same divided character string. In this case, a smaller score takes priority.

(Step S607) The score calculation means 132 stores the score calculated in step S606, at least temporarily, in association with the j-th combination of reading information of the i-th division pattern.

(Step S608) The score calculation means 132 increments counter j by 1 and returns to step S604.

(Step S609) The score calculation means 132 increments counter i by 1 and returns to step S602.
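The rank-sum example given for step S606 can be sketched as follows for a single division pattern. The sketch is one interpretation under stated assumptions: each divided character string comes with a `dict` from reading to co-occurrence frequency, rank 1 is the most frequent reading of that string, and tied frequencies share a rank. The function names are hypothetical.

```python
from itertools import product


def rank_readings(freqs):
    """Map each reading to its rank (1 = highest frequency); ties share a rank."""
    ordered = sorted(set(freqs.values()), reverse=True)
    return {reading: ordered.index(freq) + 1 for reading, freq in freqs.items()}


def rank_sum_scores(freqs_per_piece):
    """Return {combination of readings: sum of per-piece ranks}; smaller is better."""
    ranks = [rank_readings(freqs) for freqs in freqs_per_piece]
    return {
        combo: sum(rank[reading] for rank, reading in zip(ranks, combo))
        for combo in product(*ranks)  # one reading per divided character string
    }
```

Because the score is built from ranks rather than raw frequencies, the preferred combination is the one with the minimum score, for example `min(scores, key=scores.get)`.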

次に、ステップＳ４１３（図４）の例外処理について図７のフローチャートを用いて説明する。 Next, the exception process in step S413 (FIG. 4) will be described with reference to the flowchart of FIG.

(Step S701) The conversion means 123 determines whether the divided character string contains katakana. If it does, the process goes to step S702; if not, it goes to step S704.

(Step S702) The conversion means 123 determines whether the divided character string contains a predetermined katakana character. If it does, the process returns to the higher-level processing; if not, it goes to step S703.

(Step S703) The conversion means 123 converts the divided character string into hiragana and uses the converted hiragana string as the reading information of the divided character string. The process returns to the higher-level processing.

(Step S704) The alphabet processing means 125 determines whether the divided character string contains alphabetic characters. If it does, the process goes to step S705; if not, it goes to step S707.

(Step S705) The alphabet processing means 125 extracts the alphabetic string from the divided character string, generates romaji reading information indicating the reading of that string as romanized Japanese, and adds it as reading information.

(Step S706) When the divided character string contains alphabetic characters, character-unit reading information, which is the reading obtained when the letters are read out one at a time, is generated and added as reading information. The process returns to the higher-level processing.

(Step S707) The Arabic numeral processing means 124 determines whether the divided character string contains Arabic numerals. If it does, the process goes to step S708; if not, it returns to the higher-level processing.

(Step S708) The Arabic numeral processing means 124 generates place-value numeral reading information, which indicates the reading of the Arabic numerals as a number with place values, and adds it as reading information. The process returns to the higher-level processing.
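The katakana branch (steps S701 to S703) and the letter-by-letter branch (step S706) of this exception processing can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names, the set `NO_CONVERT` (holding “ヴ”, the predetermined character used in the example of this embodiment), and the table `LETTER_READINGS` with its three sample entries are hypothetical. The romaji reading of step S705 and the numeral reading of step S708 are omitted.

```python
# Hypothetical names throughout; only the control flow follows steps S701-S703 and S706.
NO_CONVERT = {"ヴ"}  # assumed set of predetermined katakana that block conversion

# Assumed sample of letter-by-letter readings; a real table would cover A to Z.
LETTER_READINGS = {"A": "えー", "T": "てぃー", "R": "あーる"}


def is_katakana(ch: str) -> bool:
    """True for katakana letters U+30A1..U+30F6 (the long-vowel mark is excluded)."""
    return "\u30a1" <= ch <= "\u30f6"


def katakana_to_hiragana(text: str) -> str:
    """Shift each katakana letter down by 0x60 to its hiragana counterpart."""
    return "".join(chr(ord(ch) - 0x60) if is_katakana(ch) else ch for ch in text)


def katakana_reading(segment: str):
    """Steps S701-S703: return a hiragana reading, or None if none is produced."""
    if not any(is_katakana(ch) for ch in segment):
        return None  # S701: no katakana, so this branch does not apply
    if any(ch in NO_CONVERT for ch in segment):
        return None  # S702: a predetermined character is present, so do not convert
    return katakana_to_hiragana(segment)  # S703


def letterwise_reading(segment: str):
    """Step S706: read ASCII letters one at a time, or None if that is not possible."""
    letters = [ch.upper() for ch in segment if ch.isascii() and ch.isalpha()]
    if not letters or any(ch not in LETTER_READINGS for ch in letters):
        return None
    return "".join(LETTER_READINGS[ch] for ch in letters)
```

With these definitions, `katakana_reading("リムジンバス")` returns `"りむじんばす"`, while a string containing “ヴ” yields no reading from this branch.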

The specific operation of the reading imparting device in the present embodiment will now be described. FIG. 1 is a conceptual diagram of a system including the reading imparting device 1.

Assume that the dictionaries of FIGS. 8 to 10 are given as the reading dictionary. The dictionary of FIG. 8 is a general dictionary created from, for example, a place-name dictionary, a Japanese-language dictionary, a kanji dictionary, and a katakana English-word dictionary. The dictionary of FIG. 9 is a rendaku (sequential voicing) dictionary, created for those words in the vocabulary of the general dictionary of FIG. 8 that can undergo rendaku. FIG. 10 is a personal-name dictionary. For a given character string, the reading dictionary search unit 12 first consults the general dictionary and acquires the reading information paired with the term that matches the character string. Next, if the character string is not at the beginning of the keyword, it consults the rendaku dictionary, acquires the reading information paired with the matching term, and adds it to the reading information already acquired. If the character string is found in neither dictionary, it consults the personal-name dictionary and acquires the reading information paired with the matching term. Reading information for a character string is obtained by this algorithm. The operation of the reading imparting device 1 is described below using two specific examples.
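The lookup order described above can be sketched as follows. This is an illustrative sketch only: the function name `lookup_readings`, the representation of each dictionary as a Python `dict` mapping a term to a list of readings, and the sample entries other than “成田” are assumptions. Falling back to the personal-name dictionary whenever no reading has been collected so far is one reading of "found in neither dictionary".

```python
def lookup_readings(text, at_keyword_start, general, rendaku, names):
    """Collect readings for text in the order: general, rendaku, personal names."""
    # 1. The general dictionary is always consulted first.
    readings = list(general.get(text, []))
    # 2. The rendaku dictionary is consulted only for non-initial strings,
    #    and its readings are added to those already found.
    if not at_keyword_start:
        for reading in rendaku.get(text, []):
            if reading not in readings:
                readings.append(reading)
    # 3. The personal-name dictionary is a fallback when nothing was found.
    if not readings:
        readings = list(names.get(text, []))
    return readings


# Assumed sample entries (only the readings of 成田 come from the example below).
GENERAL = {"成田": ["なりた", "なるだ", "なるた"], "橋": ["はし"]}
RENDAKU = {"橋": ["ばし"]}
NAMES = {"隅田": ["すみだ"]}
```

For instance, `lookup_readings("橋", False, GENERAL, RENDAKU, NAMES)` gives both the plain and the voiced reading, whereas the same call with `at_keyword_start=True` gives only the plain one.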
(Specific example 1)

First, assume that the keyword “成田” (Narita) is given to the reading imparting device 1. The reception unit 11 receives the keyword “成田”.

Next, the reading dictionary search unit 12 performs reading candidate generation as follows. First, the keyword dividing means 121 acquires the number of characters, 2, of the keyword “成田”. It then compares this with the predetermined threshold (15) and determines that the acquired number of characters, 2, does not satisfy the predetermined relationship (for example, “threshold < number of characters”).

Next, the keyword dividing means 121 acquires the division pattern “成田”, which has zero cuts. The reading information acquisition means 122 then searches the reading dictionaries of FIGS. 8 to 10 using the target character string “成田” as the key. Here, it searches the reading dictionary of FIG. 8 with “成田” as the key and obtains “なりた” (narita), “なるだ” (naruda), and “なるた” (naruta).

Next, the exception processing (described with FIG. 7) is attempted, but the string is determined not to be subject to exception processing.

The reading information acquisition means 122 then obtains the reading candidate management table shown in FIG. 11. The table has the fields “division pattern”, “divided character string”, and “reading information”. “Division pattern” is information indicating a division pattern; here it is the keyword with “｜” inserted at each cut position (division position), such as “成田” (no division) or “成田｜リムジンバス｜乗り場”. “Divided character string” is synonymous here with the target character string. “Reading information” indicates the reading of the corresponding divided character string.

Next, the co-occurrence frequency acquisition means 131 of the co-occurrence frequency acquisition unit 13 determines that the generated reading information (the information on reading candidates) is not empty (see FIG. 11).

The co-occurrence frequency acquisition means 131 then searches the document group using the received keyword “成田” and the first acquired reading “なりた”, and acquires the co-occurrence frequency of the keyword and the reading. Specifically, it passes “成田，なりた” to a Web search engine and has the search executed. Assume that it acquires from the search engine the co-occurrence frequency of the strings “成田” and “なりた” (here, 116000). It then temporarily stores the strings “成田” and “なりた” in memory, paired with the co-occurrence frequency 116000.

Next, the co-occurrence frequency acquisition means 131 searches the document group using the received keyword “成田” and the second acquired reading “なるだ”, and acquires the co-occurrence frequency of the keyword and the reading. Assume that it acquires from the search engine the co-occurrence frequency of the strings “成田” and “なるだ” (here, 38400). It then temporarily stores the strings “成田” and “なるだ” in memory, paired with the co-occurrence frequency 38400.

Further, the co-occurrence frequency acquisition means 131 searches the document group using the received keyword “成田” and the third acquired reading “なるた”, and acquires the co-occurrence frequency of the keyword and the reading. Assume that it acquires from the search engine the co-occurrence frequency of the strings “成田” and “なるた” (here, 1010). It then temporarily stores the strings “成田” and “なるた” in memory, paired with the co-occurrence frequency 1010.

Next, the score calculation means 132 uses the acquired co-occurrence frequencies to calculate a score for each reading of the divided character string. Here, the score is the rank. That is, the score calculation means 132 obtains the score 1 for the pair “成田” and “なりた”, the score 2 for the pair “成田” and “なるだ”, and the score 3 for the pair “成田” and “なるた”. Here, the lower the score, the higher the priority as a reading candidate.

Next, the output unit 14 constructs the information to be output as “1. なりた, 2. なるだ, 3. なるた”. It then outputs the two highest-priority reading candidates for “成田”, that is, “1. なりた, 2. なるだ”. Here, the output unit 14 is assumed to store, for example, the number of readings to output, 2.
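The ranking and output steps of specific example 1 can be sketched as follows, using the co-occurrence frequencies stated above. The function names are hypothetical, and the handling of equal frequencies (they keep their input order) is an assumption, since the text does not specify it.

```python
def rank_readings(frequencies):
    """Return [(rank, reading)] ordered by descending co-occurrence frequency."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [(rank, reading) for rank, (reading, _) in enumerate(ordered, start=1)]


def top_readings(frequencies, limit):
    """Return the limit highest-priority readings (rank 1 first)."""
    return [reading for _, reading in rank_readings(frequencies)[:limit]]


# Co-occurrence frequencies for the keyword 成田, as given in specific example 1.
NARITA = {"なりた": 116000, "なるだ": 38400, "なるた": 1010}

print(rank_readings(NARITA))    # [(1, 'なりた'), (2, 'なるだ'), (3, 'なるた')]
print(top_readings(NARITA, 2))  # ['なりた', 'なるだ']
```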
(Specific example 2)

Next, assume that the keyword “成田リムジンバス乗り場” (Narita limousine bus stop) is given to the reading imparting device 1. The reception unit 11 receives the keyword “成田リムジンバス乗り場”.

Next, the reading dictionary search unit 12 performs reading candidate generation as follows. First, the keyword dividing means 121 acquires the number of characters, 13, of the keyword “成田リムジンバス乗り場”. It then compares this with the predetermined threshold (15) and determines that the acquired number of characters, 13, does not satisfy the predetermined relationship (for example, “threshold < number of characters”).

Next, the keyword dividing means 121 acquires the zero-cut division pattern “成田リムジンバス乗り場” (see division pattern ID = 1 in FIG. 12). The reading information acquisition means 122 then searches the reading dictionaries of FIGS. 8 to 10 using the target character string “成田リムジンバス乗り場” as the key. Here, it searches the reading dictionary of FIG. 8 with that string as the key, but cannot acquire reading information corresponding to “成田リムジンバス乗り場”.

Next, the keyword dividing means 121 acquires the division patterns with one cut. First, it acquires the first one-cut division pattern, “成｜田リムジンバス乗り場” (see division pattern ID = 2 in FIG. 12). “成｜田リムジンバス乗り場” denotes the divided character strings “成” and “田リムジンバス乗り場”.

The reading information acquisition means 122 then searches the reading dictionaries of FIGS. 8 to 10 using the divided character string “成”, but obtains no reading information.

Next, the keyword dividing means 121 learns from the processing result of the reading information acquisition means 122 that no reading information is obtained for the divided character string “成”, and acquires the divided character strings “成田” and “リムジンバス乗り場” (see division pattern ID = 3 in FIG. 12).

The reading information acquisition means 122 then searches the reading dictionaries of FIGS. 8 to 10 using the divided character string “成田” and obtains the readings “なりた”, “なるだ”, and “なるた” from the reading dictionary of FIG. 8.

Next, the reading information acquisition means 122 searches the reading dictionaries of FIGS. 8 to 10 using the divided character string “リムジンバス乗り場”, but obtains no reading information.

In the same way, the keyword dividing means 121 acquires the divided character strings of division pattern IDs 4 to 11 in FIG. 12 in order. The reading information acquisition means 122 searches the reading dictionary with each divided character string acquired by the keyword dividing means 121 in turn, but obtains no reading information.

Next, the keyword dividing means 121 acquires the division patterns with two cuts. First, it acquires the first two-cut division pattern, “成田｜リ｜ムジンバス乗り場” (see division pattern ID = 12 in FIG. 12).

Next, the reading information acquisition means 122 obtains the readings “なりた”, “なるだ”, and “なるた” for the divided character string “成田”. However, when it searches the reading dictionaries of FIGS. 8 to 10 for the divided character string “リ”, it obtains no reading information.

The above processing is repeated for division pattern IDs 13 to 16 in FIG. 12, but for none of these patterns is reading information obtained for all of the divided character strings.

The keyword dividing means 121 then acquires the division pattern “成田｜リムジンバス｜乗り場” (see division pattern ID = 17 in FIG. 12).

Next, the reading information acquisition means 122 obtains the readings “なりた”, “なるだ”, and “なるた” for the divided character string “成田”. For the divided character string “リムジンバス” it obtains the reading “りむじんばす” (rimujinbasu). For the divided character string “乗り場” it obtains “のりば” (noriba) and “のりじょう” (norijo). That is, the reading information acquisition means 122 acquires the reading information management table shown in FIG. 13 and places it in memory. The reading information management table manages reading candidates and has the fields “divided character string” and “reading information”. The divided character strings “成田” and “乗り場” are not subject to exception processing. The conversion means 123 determines that the divided character string “リムジンバス” does not contain the predetermined character “ヴ” and converts “リムジンバス” into the hiragana “りむじんばす”. This conversion may be skipped, because the reading “りむじんばす” has already been acquired. If the reading dictionary of FIG. 8 had no entry “リムジンバス，りむじんばす”, the exception processing that converts katakana into hiragana would supply the reading.

Next, the keyword dividing means 121 acquires the other two-cut division patterns in order, for example “成田｜リムジンバス乗｜り場”, “成田｜リムジンバス乗り｜場”, “成田リ｜ム｜ジンバス乗り場”, “成田リム｜ジ｜ンバス乗り場”, “成田リムジ｜ン｜バス乗り場”, and “成田リムジン｜バ｜ス乗り場”. The reading information acquisition means 122 determines, for each division pattern, whether reading information can be acquired for all of its divided character strings, and determines that no division pattern other than “成田｜リムジンバス｜乗り場” allows this.
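The search for a division pattern whose divided character strings all have readings can be sketched as a plain enumeration by cut count. This is a simplified sketch, not the procedure of the embodiment: the function names are hypothetical, the dictionary holds only the three entries used in this example, and the sketch does not skip patterns early when a leading divided character string has no reading, so its enumeration order need not match the pattern IDs of FIG. 12.

```python
from itertools import combinations


def division_patterns(keyword, cuts):
    """Yield every way to split keyword into cuts + 1 non-empty divided strings."""
    for positions in combinations(range(1, len(keyword)), cuts):
        bounds = (0, *positions, len(keyword))
        yield [keyword[a:b] for a, b in zip(bounds, bounds[1:])]


def fully_readable_patterns(keyword, dictionary, max_cuts):
    """Return the patterns, up to max_cuts cuts, whose strings are all in dictionary."""
    found = []
    for cuts in range(max_cuts + 1):
        for pattern in division_patterns(keyword, cuts):
            if all(segment in dictionary for segment in pattern):
                found.append(pattern)
    return found


# Readings taken from this example; the dict layout itself is an assumption.
READINGS = {
    "成田": ["なりた", "なるだ", "なるた"],
    "リムジンバス": ["りむじんばす"],
    "乗り場": ["のりば", "のりじょう"],
}

print(fully_readable_patterns("成田リムジンバス乗り場", READINGS, 2))
# [['成田', 'リムジンバス', '乗り場']]
```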

Next, the co-occurrence frequency acquisition means 131 acquires the reading “なりた” of the first target character string “成田” in the first division pattern “成田｜リムジンバス｜乗り場”.

Next, the co-occurrence frequency acquisition means 131 passes the target character string “成田” and the reading “なりた”, which make up the first item of acquisition target character string reading information, to a search engine (any search engine may be used) and runs it to search the document group on the Web (here, web pages). The co-occurrence frequency acquisition means 131 gives “成田” and “なりた” as arguments to the API of the search engine (for example, search()) and executes it (for example, it executes search(成田, なりた)). It then acquires the co-occurrence frequency 116000 of “成田” and “なりた” in the document group on the Web and places “成田，なりた，116000” in memory.
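To make the quantity being acquired concrete, the following sketch counts co-occurrence over a small in-memory list of documents instead of calling a Web search engine. The substitution is an assumption made so that the example is self-contained; the function name and the four sample documents are invented for illustration. The count follows the definition used in this specification: the number of documents in which the target character string and the reading both occur.

```python
def cooccurrence_frequency(documents, target, reading):
    """Count the documents that contain both the target string and the reading."""
    return sum(1 for text in documents if target in text and reading in text)


# Invented sample documents standing in for the document group on the Web.
DOCUMENTS = [
    "成田（なりた）空港へのアクセス",
    "成田山（なりたさん）新勝寺",
    "成田さんの日記",
    "なりたい職業ランキング",
]

print(cooccurrence_frequency(DOCUMENTS, "成田", "なりた"))  # 2
print(cooccurrence_frequency(DOCUMENTS, "成田", "なるだ"))  # 0
```

The third document contains only the keyword and the fourth only the reading, so neither contributes to the count.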

Next, the co-occurrence frequency acquisition means 131 acquires the second reading “なるだ” of the first target character string “成田” in the first division pattern “成田｜リムジンバス｜乗り場”. It passes the terms “成田” and “なるだ” to the search engine and runs it to search the document group on the Web (here, web pages). It then acquires the co-occurrence frequency 38400 of “成田” and “なるだ” in the document group on the Web and places “成田，なるだ，38400” in memory.

Next, the co-occurrence frequency acquisition means 131 acquires the third reading “なるた” of the first target character string “成田” in the first division pattern “成田｜リムジンバス｜乗り場”. It passes the terms “成田” and “なるた” to the search engine and runs it to search the document group on the Web (here, web pages). It then acquires the co-occurrence frequency 1010 of “成田” and “なるた” in the document group on the Web and places “成田，なるた，1010” in memory.

Next, the co-occurrence frequency acquisition means 131 acquires the reading “りむじんばす” of the second target character string “リムジンバス” in the first division pattern “成田｜リムジンバス｜乗り場”. It passes the terms “リムジンバス” and “りむじんばす” to the search engine and runs it to search the document group on the Web (here, web pages). It then acquires the co-occurrence frequency 61900 of “リムジンバス” and “りむじんばす” in the document group on the Web and places “リムジンバス，りむじんばす，61900” in memory.

Next, the co-occurrence frequency acquisition means 131 acquires the first reading “のりば” of the third target character string “乗り場” in the first division pattern “成田｜リムジンバス｜乗り場”. It passes the terms “乗り場” and “のりば” to the search engine and runs it to search the document group on the Web (here, web pages). It then acquires the co-occurrence frequency 62500 of “乗り場” and “のりば” in the document group on the Web and places “乗り場，のりば，62500” in memory.

Next, the co-occurrence frequency acquisition means 131 acquires the second reading “のりじょう” of the third target character string “乗り場” in the first division pattern “成田｜リムジンバス｜乗り場”. It passes the terms “乗り場” and “のりじょう” to the search engine and runs it to search the document group on the Web (here, web pages). It then acquires the co-occurrence frequency 1 of “乗り場” and “のりじょう” in the document group on the Web and places “乗り場，のりじょう，1” in memory.

Next, the score calculation means 132 calculates scores, for example, as follows. It obtains the ranking of the readings of the term “成田” from their co-occurrence frequencies 116000, 38400, and 1010. That is, it obtains “1: なりた”, “2: なるだ”, and “3: なるた”.

Similarly, the score calculation means 132 obtains the ranking information “1: りむじんばす” for the readings of the term “リムジンバス”.

Further, the score calculation means 132 obtains the ranking information “1: のりば” and “2: のりじょう” for the readings of the term “乗り場”.

Next, the score calculation means 132 calculates a score for every combination of readings. Here, the score is the sum of the ranks of the individual readings. The score calculation means 132 thus obtains the score management table shown in FIG. 14. Each score in the table is the sum of the scores of the three readings that make up a reading of “成田リムジンバス乗り場”. In FIG. 14, a smaller score indicates a higher priority as a reading candidate.
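The sum-of-ranks scoring over all combinations of readings can be sketched as follows, using the co-occurrence frequencies stated in this example. The function names and the list-of-dicts input layout are assumptions; the contents of FIG. 14 are not reproduced here, so the sketch only illustrates the rule described in the text.

```python
from itertools import product


def ranks(frequencies):
    """Map each reading to its rank (1 = highest co-occurrence frequency)."""
    ordered = sorted(frequencies, key=frequencies.get, reverse=True)
    return {reading: rank for rank, reading in enumerate(ordered, start=1)}


def score_combinations(segment_frequencies):
    """Return [(score, combined reading)], smallest (best) score first."""
    per_segment = [ranks(freqs) for freqs in segment_frequencies]
    scored = []
    for combo in product(*(r.items() for r in per_segment)):
        reading = "".join(part for part, _ in combo)
        score = sum(rank for _, rank in combo)  # sum of the per-string ranks
        scored.append((score, reading))
    return sorted(scored, key=lambda pair: pair[0])


# Frequencies for 成田 | リムジンバス | 乗り場, as given in specific example 2.
FREQUENCIES = [
    {"なりた": 116000, "なるだ": 38400, "なるた": 1010},
    {"りむじんばす": 61900},
    {"のりば": 62500, "のりじょう": 1},
]

print(score_combinations(FREQUENCIES)[0])  # (3, 'なりたりむじんばすのりば')
```

The six combinations receive the scores 3, 4, 4, 5, 5, and 6, so the combination built from the top-ranked reading of every divided character string comes first.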

Next, the output unit 14 constructs the information to be output. Here, it acquires the single highest-priority reading, “なりたりむじんばすのりば”, and builds it into the data structure to be output. It then outputs the constructed information “なりたりむじんばすのりば”.
(Experimental result)

The experimental results for the reading imparting device 1 are described below. In this experiment, a large number of landmarks listed in guidebooks were selected as keywords. FIG. 15 shows the results obtained with the method of the reading imparting device 1. In FIG. 15, the accuracy rate is 98.53%, which is the sum of the “ratio” values enclosed in ellipses (23.98% + 69.17% + 5.38%). The reading imparting device 1 can thus acquire reading information with a very high accuracy rate. The first value enclosed in an ellipse, 23.98%, is presumed to be the effect of the division processing described above. The second, 69.17%, is presumed to be the cases in which the correct reading was obtained when two or more reading candidates existed. The third, 5.38%, is presumed to be the cases in which the correct reading was obtained thanks to the large-vocabulary dictionary.

As described above, according to the present embodiment, the reading information of a keyword can be acquired with very high accuracy.

In the present embodiment, the algorithm for dividing a keyword to obtain two or more divided character strings is not limited to the one described above. The divided character strings may be obtained by any procedure.

Furthermore, the processing in the present embodiment may be implemented in software. The software may be distributed by software download or the like, or recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that implements the information processing device of the present embodiment is the following program. That is, the program causes a computer to function as: a reception unit that receives a keyword; a reading dictionary search unit that searches a reading dictionary, which stores two or more items of term reading information each having a term and reading information indicating the reading of that term, using as the target character string either the keyword received by the reception unit or each of two or more divided character strings obtained by dividing that keyword, and acquires two or more items of reading information; a co-occurrence frequency acquisition unit that searches a document group using each item of acquisition target character string reading information, which is a pair of the target character string and one of the two or more items of reading information acquired by the reading dictionary search unit, and acquires, for each item of reading information, the co-occurrence frequency, which is the number of documents in which the target character string and the reading information of that item co-occur; and an output unit that, using the co-occurrence frequencies acquired for each item of reading information by the co-occurrence frequency acquisition unit, outputs at least the reading information, or the combination of reading information, with the highest co-occurrence frequency.

また、上記プログラムにおいて、前記読み辞書検索部は、用語と、当該用語の読みを示す読み情報を有する用語読み情報を２以上格納している読み辞書に対して、前記対象文字列を用いて検索し、２以上の読み情報を取得する読み情報取得手段と、前記対象文字列がカタカナの文字列を含む場合、当該カタカナの文字列をひらがなに変換する変換手段を具備するものとして、コンピュータを機能させるためのプログラムであることは好適である。 In the above program, the reading dictionary search unit searches the reading dictionary storing two or more term reading information having a term and reading information indicating the reading of the term using the target character string. A computer having a reading information acquisition means for acquiring two or more reading information and a conversion means for converting the katakana character string into hiragana when the target character string includes a katakana character string; It is preferable that the program is for causing the program to occur.

In the above program, it is preferable that the conversion means determines whether the target character string includes a predetermined katakana and, if it does, does not convert the katakana character string into hiragana.
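
The conversion means and its exception can be sketched as follows. This relies on the fact that, in Unicode, each katakana in the range U+30A1 to U+30F6 sits exactly 0x60 above its hiragana counterpart. The set of "predetermined katakana" that suppresses conversion is not listed in this passage, so the long-vowel mark used below is purely a placeholder assumption.

```python
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6  # ァ .. ヶ
KANA_OFFSET = 0x60                               # katakana minus hiragana

# Assumption: placeholder for the "predetermined katakana" of the text.
BLOCKING_KATAKANA = frozenset("ー")


def katakana_to_hiragana(text, blocking=BLOCKING_KATAKANA):
    """Convert katakana in `text` to hiragana unless a blocking character occurs."""
    if any(ch in blocking for ch in text):
        return text  # predetermined katakana present: leave the string as is
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST
        else ch
        for ch in text
    )
```

Characters outside the katakana range (kanji, Latin letters, digits) pass through unchanged, so the function can be applied to mixed strings.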

In the above program, it is preferable that the reading dictionary search unit comprises: keyword dividing means for dividing the keyword accepted by the reception unit into units of one or more characters and acquiring two or more divided character strings; and reading information acquisition means for retrieving from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the two or more divided character strings, and acquiring two or more pieces of acquisition-target character string reading information each having a divided character string and reading information.

In the above program, it is preferable that the keyword dividing means acquires the character string length of the keyword accepted by the reception unit and does not divide the keyword when that length is greater than, or greater than or equal to, a predetermined character string length; and that the reading information acquisition means, when the keyword is not divided, acquires from the reading dictionary one or more pieces of reading information paired with a term matching the keyword accepted by the reception unit.

In the above program, it is preferable that the keyword dividing means divides the keyword accepted by the reception unit into units of one or more characters, in ascending order of the number of cuts N (N is 0 or more), acquires one or more divided character strings, and repeats the division of the keyword until it receives an end instruction from the reading information acquisition means; and that the reading information acquisition means acquires from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the one or more divided character strings acquired by the keyword dividing means. When reading information corresponding to all of the one or more divided character strings cannot be acquired, the reading information acquisition means instructs the keyword dividing means to acquire the two or more divided character strings of the next division pattern. When reading information corresponding to all of the one or more divided character strings can be acquired, the reading information acquisition means performs reading information acquisition processing that at least temporarily stores that reading information in a storage medium, and then either passes an end instruction to the keyword dividing means, or instructs the keyword dividing means to acquire the two or more divided character strings of another division pattern with the same number of cuts and passes the end instruction after the reading information acquisition processing for the other division patterns with that number of cuts is complete.
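
The division loop described above can be sketched as a search over cut positions in ascending order of the number of cuts N. The names `split_patterns`, `find_readable_patterns`, and `TOY_DICTIONARY` are hypothetical, and the boolean flag selects between the two stopping rules in the text (stop at the first fully readable pattern, or finish all patterns with the same number of cuts first).

```python
from itertools import combinations

# Hypothetical toy reading dictionary: term -> list of readings.
TOY_DICTIONARY = {
    "東京": ["とうきょう"],
    "都庁": ["とちょう"],
    "東": ["ひがし", "とう"],
    "京都": ["きょうと"],
    "庁": ["ちょう"],
}


def split_patterns(keyword, n_cuts):
    """Yield every division of `keyword` that uses exactly `n_cuts` cuts."""
    for cuts in combinations(range(1, len(keyword)), n_cuts):
        bounds = (0, *cuts, len(keyword))
        yield [keyword[a:b] for a, b in zip(bounds, bounds[1:])]


def find_readable_patterns(keyword, reading_dictionary, finish_cut_count=True):
    """Try N = 0, 1, 2, ... cuts; return patterns whose parts all have readings."""
    for n_cuts in range(len(keyword)):
        found = []
        for pattern in split_patterns(keyword, n_cuts):
            if all(part in reading_dictionary for part in pattern):
                found.append([(part, reading_dictionary[part]) for part in pattern])
                if not finish_cut_count:
                    return found  # end instruction right after the first hit
        if found:
            return found  # end instruction after this cut count is exhausted
    return []
```

With the toy dictionary, "東京都庁" is not found as a whole (N = 0), and the only fully readable pattern with one cut is 東京 / 都庁, so the search never reaches the two-cut pattern 東 / 京都 / 庁.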

In the above program, it is preferable that the reading dictionary search unit further comprises Arabic numeral processing means that determines whether the target character string includes Arabic numerals and, if it does, generates place-value numeral reading information indicating the reading of the Arabic numerals as a number with place values, and adds that place-value numeral reading information as reading information.
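
As an illustration of what a place-value reading looks like, the following sketch reads a run of Arabic digits as a number (for example, 123 as ひゃくにじゅうさん rather than digit by digit). It is an assumption about how such readings could be generated, not the procedure of the Arabic numeral processing means; it covers only 0 to 9999 and picks one common reading per digit.

```python
DIGIT_READINGS = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
PLACE_READINGS = ["せん", "ひゃく", "じゅう", ""]  # thousands, hundreds, tens, units

# Sound changes in standard Japanese number readings.
IRREGULAR = {
    (3, "ひゃく"): "さんびゃく",
    (6, "ひゃく"): "ろっぴゃく",
    (8, "ひゃく"): "はっぴゃく",
    (3, "せん"): "さんぜん",
    (8, "せん"): "はっせん",
}


def place_value_reading(numeral):
    """Return a hiragana reading of a digit string, read as a number (0..9999)."""
    if not numeral.isdecimal():
        raise ValueError("expected Arabic digits only")
    value = int(numeral)
    if value > 9999:
        raise ValueError("this sketch covers 0 to 9999 only")
    if value == 0:
        return "ぜろ"
    parts = []
    for digit_char, place in zip(f"{value:04d}", PLACE_READINGS):
        digit = int(digit_char)
        if digit == 0:
            continue  # empty place: nothing is pronounced
        if (digit, place) in IRREGULAR:
            parts.append(IRREGULAR[(digit, place)])
        elif digit == 1 and place:
            parts.append(place)  # 10, 100, 1000 are read without a leading いち
        else:
            parts.append(DIGIT_READINGS[digit] + place)
    return "".join(parts)
```

A reading generated this way would be added alongside the dictionary-derived candidates, and the co-occurrence search then decides which candidate is used.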

In the above program, it is preferable that the reading dictionary search unit further comprises alphabet processing means that determines whether the target character string includes alphabetic characters and, if it does, generates romaji reading information indicating the romaji reading corresponding to those characters, and adds that romaji reading information as reading information.

In the above program, it is preferable that the co-occurrence frequency acquisition unit comprises: co-occurrence frequency acquisition means for searching a document group using each of the two or more pieces of acquisition-target character string reading information acquired by the reading dictionary search unit, and acquiring, for each piece of acquisition-target character string reading information, a co-occurrence frequency that is the number of documents in which the keyword or divided character string and the reading information of that piece co-occur; and score calculation means for calculating a score, for each piece of acquisition-target character string reading information or each combination of such pieces, using as a parameter the co-occurrence frequency corresponding to the acquisition-target character string reading information having the keyword accepted by the reception unit or the two or more divided character strings; and that the output unit outputs at least the reading information, or the combination of reading information, having the highest co-occurrence frequency, using the score acquired by the co-occurrence frequency acquisition unit.
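
The score calculation means can be pictured with the sketch below, which enumerates every combination of one reading per divided character string and scores it. This passage only says that the co-occurrence frequency is used as a parameter, so taking the product of the per-string frequencies is an assumption of this sketch, and the frequencies shown are invented for illustration.

```python
from itertools import product
from math import prod


def score_combinations(frequencies):
    """Score every combination of readings, best first.

    `frequencies` holds one dict per divided character string, mapping each
    candidate reading to its co-occurrence frequency (number of documents).
    """
    scored = []
    for combo in product(*(table.items() for table in frequencies)):
        readings = tuple(reading for reading, _ in combo)
        score = prod(count for _, count in combo)  # assumed scoring rule
        scored.append((readings, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented frequencies for the divided character strings 東京 and 都庁.
EXAMPLE_FREQUENCIES = [
    {"とうきょう": 120, "ひがしきょう": 1},
    {"とちょう": 80, "みやこちょう": 2},
]
```

The first element of the returned list corresponds to what the output unit would emit; other scoring rules (a sum, or a length-normalised product) would slot into the same place.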
(Embodiment 2)

In the present embodiment, application examples of the reading imparting device 1 described in Embodiment 1 are described.

First, the reading imparting device 1 can be used to build dictionaries for speech synthesizers, speech recognition devices, and machine translation devices. That is, the reading imparting device 1 is useful when creating a speech synthesis dictionary that holds multiple pieces of read-speech information, each pairing the "reading information" and "speech" required by a speech synthesizer. It is likewise useful when creating a speech recognition dictionary that holds multiple pieces of speech recognition information, each associating the "reading information", "term", and "speech" required by a speech recognition device. It is also useful when composing the word information required by a machine translation device, for example pairs of a proper noun such as a personal name or place name and an "English word" written in Roman letters. In other words, the reading information serves as parallel translation information, and the reading imparting device 1 can be used to build a translation dictionary. For example, the reading imparting device 1 acquires "せんそうじ" from "浅草寺", obtains the Roman-letter form "Sensoji" from the hiragana "せんそうじ" by known means (not shown), and composes the parallel translation information "(Japanese) 浅草寺, (English) Sensoji".

A speech synthesizer can use the reading imparting device 1 as follows. The speech synthesizer holds a speech database, which stores the speech corresponding to each hiragana notation. Preferably, the speech database also stores the speech corresponding to each katakana notation. The speech synthesis unit of the speech synthesizer reads from the speech database the speech corresponding to the reading information (kana notation) output by the reading imparting device 1, synthesizes speech, and obtains an analog speech signal. The speech output unit of the speech synthesizer then outputs the analog speech signal obtained by the speech synthesis unit.

In the case of a speech translation device, the reading imparting device 1 can be used as follows. A speech translation device usually comprises a speech recognition unit and a translation unit. Reading information is obtained from the speech recognition unit of the speech translation device. The translation unit then uses that reading information to determine the sense of the word. For example, "工夫 (こうふ)" is translated as "worker", and "工夫 (くふう)" is translated as "device". Likewise, "佐原 (さわら)" is translated as "sawara", and "佐原 (さはら)" is translated as "sahara". Further, "方 (かた)" is translated as "person", and "方 (ほう)" is translated as "direction".
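
The sense selection in the translation unit amounts to a lookup keyed on both the written form and its reading. A minimal sketch, using only the word pairs given in the text; the table layout and the function name `translate_word` are assumptions.

```python
# (written form, reading) -> target-language word, taken from the examples above.
SENSE_TABLE = {
    ("工夫", "こうふ"): "worker",
    ("工夫", "くふう"): "device",
    ("佐原", "さわら"): "sawara",
    ("佐原", "さはら"): "sahara",
    ("方", "かた"): "person",
    ("方", "ほう"): "direction",
}


def translate_word(surface, reading, table=SENSE_TABLE):
    """Return the translation selected by the reading, or None if unknown."""
    return table.get((surface, reading))
```

Keying on the pair rather than on the written form alone is what lets the same kanji string map to different target words.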

Furthermore, a machine translation device can use the reading imparting device 1 as follows. The machine translation device holds a translation dictionary. The translation dictionary has pairs of the reading information of a Japanese word and a word in the target language (for example, English or Chinese). Preferably, each entry of the translation dictionary associates a Japanese word, its reading information, and the target-language word. An input reception unit accepts a Japanese sentence to be translated. The morphological analysis unit of the machine translation device then performs morphological analysis on the accepted sentence and breaks it down into morphemes. The reading imparting device 1 takes a word in the output of the morphological analysis unit as its input keyword and acquires the reading information of that keyword. Next, the word search unit of the machine translation device uses the reading information acquired by the reading imparting device 1 to acquire from the translation dictionary the target-language word paired with that reading information. The translated sentence composition unit of the machine translation device then obtains a translation of the accepted sentence using the result of the morphological analysis unit and the target-language words acquired by the word search unit. Finally, the translated sentence output unit of the machine translation device outputs the translation obtained by the translated sentence composition unit.

As described above, the reading imparting device 1 can be used in a variety of devices.

In each of the above embodiments, each process (each function) may be realized by centralized processing in a single device (system), or by distributed processing across a plurality of devices.

FIG. 16 shows the external appearance of a computer that executes the programs described in this specification to realize the reading imparting device and the like of the embodiments described above. The embodiments described above can be realized by computer hardware and a computer program executed on it. FIG. 16 is an overview of this computer system 340, and FIG. 17 is a block diagram of the computer system 340.

In FIG. 16, the computer system 340 includes a computer 341 having an FD (Flexible Disk) drive and a CD-ROM (Compact Disk Read Only Memory) drive, a keyboard 342, a mouse 343, a monitor 344, and a microphone 345.

In FIG. 17, the computer 341 includes, in addition to the FD drive 3411 and the CD-ROM drive 3412: a CPU (Central Processing Unit) 3413; a bus 3414 connected to the CD-ROM drive 3412 and the FD drive 3411; a ROM (Read-Only Memory) 3415 for storing programs such as a boot-up program; a RAM (Random Access Memory) 3416, connected to the CPU 3413, for temporarily storing the instructions of application programs and providing temporary storage space; and a hard disk 3417 for storing application programs, system programs, and data. Although not shown, the computer 341 may further include a network card that provides a connection to a LAN.

A program that causes the computer system 340 to execute the functions of the reading imparting device of the embodiments described above may be stored on a CD-ROM 3501 or an FD 3502, inserted into the CD-ROM drive 3412 or the FD drive 3411, and then transferred to the hard disk 3417. Alternatively, the program may be transmitted to the computer 341 via a network (not shown) and stored on the hard disk 3417. The program is loaded into the RAM 3416 at execution time. The program may also be loaded directly from the CD-ROM 3501, the FD 3502, or the network.

The program need not include an operating system (OS), third-party programs, or the like that cause the computer 341 to execute the functions of the reading imparting device of the embodiments described above. The program need only include those instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the computer system 340 operates is well known, and a detailed description is omitted.

The program may be executed by a single computer or by a plurality of computers. That is, centralized processing or distributed processing may be performed.

The present invention is not limited to the above embodiments; various modifications are possible, and it goes without saying that these are also included within the scope of the present invention.

As described above, the reading imparting device according to the present invention has the effect of acquiring the reading information of a keyword with high accuracy, and is useful as a speech synthesizer or the like.

FIG. 1: Conceptual diagram of a system including the reading imparting device according to Embodiment 1
FIG. 2: Block diagram of the reading imparting device
FIG. 3: Flowchart explaining the operation of the reading imparting device
FIG. 4: Flowchart explaining the operation of the candidate generation process
FIG. 5: Flowchart explaining the operation of the divided character string search process
FIG. 6: Flowchart explaining the operation of the score calculation process
FIG. 7: Flowchart explaining the operation of the exception process
FIG. 8: Diagram showing the reading dictionary
FIG. 9: Diagram showing the reading dictionary
FIG. 10: Diagram showing the reading dictionary
FIG. 11: Diagram showing the reading candidate management table
FIG. 12: Diagram showing the division patterns
FIG. 13: Diagram showing the reading information management table
FIG. 14: Diagram showing the score management table
FIG. 15: Diagram showing the experimental results
FIG. 16: Overview of the computer system
FIG. 17: Block diagram of the computer system

Explanation of symbols

1 Reading imparting device
2 Dictionary device
3 Server device
11 Reception unit
12 Dictionary search unit
13 Co-occurrence frequency acquisition unit
14 Output unit
121 Keyword dividing means
122 Information acquisition means
123 Conversion means
124 Arabic numeral processing means
125 Alphabet processing means
131 Co-occurrence frequency acquisition means
132 Score calculation means

Claims

A reading imparting device comprising:
a reception unit that accepts a keyword;
a reading dictionary search unit that divides the keyword accepted by the reception unit according to two or more patterns to acquire two or more division patterns, each being a set of two or more divided character strings, and that searches, using the two or more division patterns, a reading dictionary storing two or more pieces of term reading information each having a term and reading information indicating the reading of the term, to acquire two or more pieces of reading information for each division pattern;
a co-occurrence frequency acquisition unit that, only for division patterns for which the reading dictionary search unit has acquired reading information corresponding to all of the two or more divided character strings constituting the division pattern, searches a document group using each piece of acquisition-target character string reading information, which is a pair of a divided character string constituting the division pattern and one of the two or more pieces of reading information acquired by the reading dictionary search unit, acquires, for each piece of reading information, a co-occurrence frequency that is the number of documents in which the divided character string and the reading information of that piece co-occur, and calculates a score for each division pattern using the co-occurrence frequency for each piece of reading information; and
an output unit that, using the score acquired by the co-occurrence frequency acquisition unit, outputs at least the combination of reading information having the highest co-occurrence frequency.

The reading imparting device according to claim 1, wherein the reading dictionary search unit comprises:
keyword dividing means for dividing the keyword accepted by the reception unit into units of one or more characters and acquiring two or more divided character strings; and
reading information acquisition means for retrieving from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the two or more divided character strings, and acquiring two or more pieces of acquisition-target character string reading information each having a divided character string and reading information,
wherein the keyword dividing means
divides the keyword accepted by the reception unit into units of one or more characters, in ascending order of the number of cuts N (N is 0 or more), acquires one or more divided character strings, and repeats the division of the keyword until it receives an end instruction from the reading information acquisition means, and
wherein the reading information acquisition means
acquires from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the one or more divided character strings acquired by the keyword dividing means; when reading information corresponding to all of the one or more divided character strings cannot be acquired, instructs the keyword dividing means to acquire the two or more divided character strings of the next division pattern; and when reading information corresponding to all of the one or more divided character strings can be acquired, performs reading information acquisition processing that at least temporarily stores that reading information in a storage medium, and either passes an end instruction to the keyword dividing means, or instructs the keyword dividing means to acquire the two or more divided character strings of another division pattern with the same number of cuts and passes the end instruction to the keyword dividing means after the reading information acquisition processing for the other division patterns with that number of cuts is complete.

The reading imparting device according to claim 1 or claim 2, wherein the reading dictionary search unit
comprises conversion means for converting, when the keyword includes a katakana character string, that katakana character string into hiragana.

The reading imparting device according to claim 3, wherein the conversion means
determines whether the keyword includes a predetermined katakana and, if it does, does not convert the katakana character string into hiragana.

The reading imparting device according to any one of claims 2 to 4, wherein the keyword dividing means
acquires the character string length of the keyword accepted by the reception unit and does not divide the keyword when that length is greater than, or greater than or equal to, a predetermined character string length, and
the reading information acquisition means,
when the keyword is not divided, acquires from the reading dictionary one or more pieces of reading information paired with a term matching the keyword accepted by the reception unit.

The reading imparting device according to any one of claims 1 to 5, wherein the reading dictionary search unit
further comprises Arabic numeral processing means that determines whether the target character string includes Arabic numerals and, if it does, generates place-value numeral reading information indicating the reading of the Arabic numerals as a number with place values, and adds that place-value numeral reading information as reading information.

The reading imparting device according to any one of claims 1 to 6, wherein the reading dictionary search unit
further comprises alphabet processing means that determines whether the target character string includes alphabetic characters and, if it does, generates romaji reading information indicating the romaji reading corresponding to those characters, and adds that romaji reading information as reading information.

The reading imparting device according to any one of claims 1 to 7, wherein the co-occurrence frequency acquisition unit comprises:
co-occurrence frequency acquisition means for searching a document group using each of the two or more pieces of acquisition-target character string reading information acquired by the reading dictionary search unit, and acquiring, for each division pattern and for each piece of acquisition-target character string reading information, a co-occurrence frequency that is the number of documents in which the divided character string and the reading information of that piece co-occur; and
score calculation means for calculating a score for each division pattern, using as a parameter the co-occurrence frequency corresponding to the acquisition-target character string reading information having the two or more divided character strings,
and wherein the output unit
outputs at least the combination of reading information having the highest co-occurrence frequency, using the score acquired by the co-occurrence frequency acquisition unit.

A program for causing a computer to function as:
a reception unit that accepts a keyword;
a reading dictionary search unit that divides the keyword accepted by the reception unit according to two or more patterns to acquire two or more division patterns, each being a set of two or more divided character strings, and that searches, using the two or more division patterns, a reading dictionary storing two or more pieces of term reading information each having a term and reading information indicating the reading of the term, to acquire two or more pieces of reading information for each division pattern;
a co-occurrence frequency acquisition unit that, only for division patterns for which the reading dictionary search unit has acquired reading information corresponding to all of the two or more divided character strings constituting the division pattern, searches a document group using each piece of acquisition-target character string reading information, which is a pair of a divided character string constituting the division pattern and one of the two or more pieces of reading information acquired by the reading dictionary search unit, acquires, for each piece of reading information, a co-occurrence frequency that is the number of documents in which the divided character string and the reading information of that piece co-occur, and calculates a score for each division pattern using the co-occurrence frequency for each piece of reading information; and
an output unit that, using the score acquired by the co-occurrence frequency acquisition unit, outputs at least the combination of reading information having the highest co-occurrence frequency.

The program according to claim 9, for causing the computer to function such that the reading dictionary search unit comprises:
keyword dividing means for dividing the keyword accepted by the reception unit into units of one or more characters and acquiring two or more divided character strings; and
reading information acquisition means for retrieving from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the two or more divided character strings, and acquiring two or more pieces of acquisition-target character string reading information each having a divided character string and reading information,
wherein the keyword dividing means
divides the keyword accepted by the reception unit into units of one or more characters, in ascending order of the number of cuts N (N is 0 or more), acquires one or more divided character strings, and repeats the division of the keyword until it receives an end instruction from the reading information acquisition means, and
wherein the reading information acquisition means
acquires from the reading dictionary, for each divided character string, one or more pieces of reading information paired with a term matching each of the one or more divided character strings acquired by the keyword dividing means; when reading information corresponding to all of the one or more divided character strings cannot be acquired, instructs the keyword dividing means to acquire the two or more divided character strings of the next division pattern; and when reading information corresponding to all of the one or more divided character strings can be acquired, performs reading information acquisition processing that at least temporarily stores that reading information in a storage medium, and either passes an end instruction to the keyword dividing means, or instructs the keyword dividing means to acquire the two or more divided character strings of another division pattern with the same number of cuts and passes the end instruction to the keyword dividing means after the reading information acquisition processing for the other division patterns with that number of cuts is complete.