JP6020093B2

JP6020093B2 - Alphabet reading estimation device

Info

Publication number: JP6020093B2
Application number: JP2012260940A
Authority: JP
Inventors: 貴弘大塚; 啓吾川島; 訓古田; 正山浦
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2012-11-29
Filing date: 2012-11-29
Publication date: 2016-11-02
Anticipated expiration: 2032-11-29
Also published as: JP2014106857A

Description

The present invention relates to a device that estimates the reading of an alphabetic character string.

A conventional alphabet reading estimation device uses a large amount of data in which alphabet character strings are associated with their Japanese readings. From this data it calculates n-gram (chain of multiple elements) frequencies over pairs consisting of a substring of the decomposed input alphabet character string and its Japanese reading (for example, the frequency of the chain "tio, ショ" followed by "n, ン"). It then estimates the Japanese reading of the input alphabetic word based on a transliteration model that uses these n-gram frequencies (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 4084515

When an alphabet character string whose Japanese reading is to be estimated is input, the alphabet reading estimation device disclosed in Patent Document 1 decomposes it into partial alphabet character strings in predetermined character string units, looks up the n-gram frequencies of the decomposed partial alphabet character strings in the large data set, and estimates the Japanese reading from the looked-up data based on the transliteration model.

In general, an alphabet reading estimation device can estimate a reading more accurately if the word dictionary contains longer character strings that correspond to the input alphabet character string. However, holding many longer character strings requires storing a large amount of data in the word dictionary, which requires additional memory and raises the hardware cost. Conversely, when the n-gram frequency table is built from a small amount of data, the data lacks reliability.

An object of the alphabet reading estimation device according to the present invention is to make it possible to estimate, with high reliability and from a small amount of data, the reading of an alphabet character string in a specific language such as Japanese, even when the word dictionary does not contain a large amount of data.

The present invention is an alphabet reading estimation device that decomposes an input alphabet character string into a plurality of partial alphabet character strings, refers to a dictionary in which each partial alphabet character string is registered in association with a plurality of partial readings that are candidates for its reading, and estimates a partial reading for each partial alphabet character string. The device comprises: a partial score dictionary that associates dictionary characteristic alphabet character strings with the plurality of partial readings and expresses, as a score, the frequency with which each reading of a dictionary characteristic alphabet character string is used as the partial reading, the dictionary characteristic alphabet character strings being created based on those dictionary partial alphabet character strings, among the plurality of dictionary partial alphabet character strings constituting an alphabet character string, for which a plurality of partial readings are estimated; a characteristic alphabet character string creating unit that creates a plurality of characteristic alphabet character strings based on those partial alphabet character strings, among the plurality of partial alphabet character strings constituting the input alphabet character string, for which a plurality of partial readings are estimated; a characteristic alphabet character string score calculation unit that refers to the partial score dictionary and calculates, for each of the plurality of partial readings, the score of the characteristic alphabet character strings that match the dictionary characteristic alphabet character strings; and a characteristic alphabet character string reading determination unit that estimates the partial reading based on the scores calculated for each of the plurality of partial readings.

The alphabet reading estimation device according to the present invention decomposes the input alphabet character string into partial alphabet character strings, extracts characteristic alphabet character strings from each partial alphabet character string, scores them according to how frequently each candidate is used as the partial reading, and estimates the partial reading based on those scores. It can therefore provide a highly reliable reading in a specific language even from a small amount of data.

FIG. 1 is a diagram showing the device configuration of the alphabet reading estimation device according to Embodiment 1.
FIG. 2 is a diagram showing an example of the heading dictionary according to Embodiment 1.
FIG. 3 is a diagram showing an example of the partial score dictionary according to Embodiment 1.
FIG. 4 is an operation flowchart of the alphabet reading estimation device according to Embodiment 1.
FIG. 5 is a diagram showing an example of the heading sequence according to Embodiment 1.
FIG. 6 is a diagram showing an example of the characteristic alphabet character strings according to Embodiment 1.
FIG. 7 is a diagram showing an example of the total scores of partial alphabet character strings according to Embodiment 1.
FIG. 8 is a diagram showing an example of the partial score dictionary according to Embodiment 2.
FIG. 9 is a diagram showing an example of the total scores of partial alphabet character strings according to Embodiment 2.

Embodiment 1
Embodiment 1 of the present invention is described below with reference to FIGS. 1 to 7. FIG. 1 is a diagram showing the device configuration of the alphabet reading estimation device according to Embodiment 1. FIG. 2 is a diagram showing an example of the heading dictionary according to Embodiment 1. FIG. 3 is a diagram showing an example of the partial score dictionary according to Embodiment 1. FIG. 4 is an operation flowchart of the alphabet reading estimation device according to Embodiment 1. FIG. 5 is a diagram showing an example of the heading sequence according to Embodiment 1. FIG. 6 is a diagram showing an example of the characteristic alphabet character strings according to Embodiment 1. FIG. 7 is a diagram showing an example of the total scores of partial alphabet character strings according to Embodiment 1.

The configuration of the alphabet reading estimation device 1 according to Embodiment 1 is described below with reference to FIG. 1.

The alphabet reading estimation device 1 according to Embodiment 1 comprises an input device 10, an information processing unit 20, a storage unit 30, and an output device 40. An alphabet character string whose Japanese reading is to be estimated (hereinafter, "input alphabet character string") is input from the input device 10 to the information processing unit 20, which estimates its Japanese reading; the result is output from the output device 40. For example, when the input alphabet character string "reversal" is input from the input device 10, the information processing unit 20 creates the Japanese reading "リバーサル" and outputs it from the output device 40.

The input device 10 is a device that inputs the input alphabet character string to the information processing unit 20. Examples include a keyboard and a touch panel.

The storage unit 30 is a device that stores the data used to estimate the Japanese reading of the input alphabet character string (the heading dictionary 31 and the partial score dictionary 32). A hard disk drive is one example.

The heading dictionary 31 is data that associates the pieces obtained by decomposing alphabet character strings into multiple character strings (hereinafter, "dictionary partial alphabet character strings") with their Japanese readings (hereinafter, "partial readings"). The heading dictionary 31 is created in advance by assigning a Japanese reading to each alphabet character string and then aligning the alphabet character string with the sounds of that Japanese reading. FIG. 2 shows an example of the heading dictionary 31. This heading dictionary 31 was created from the alphabet character strings "conversation", "disposal", "anniversary", and "recognize". Assigning Japanese readings to these alphabet character strings and splitting them by sound gives "co/n/ver/sa/tio/n" = "カ/ン/バ/セー/ショ/ン", "di/s/po/sa/l" = "ディ/ス/ポー/ザ/ル", "a/nni/ver/sa/ry" = "ア/ニ/バー/サ/リー", and "re/co/g/ni/ze" = "リ/コ/グ/ナイ/ズ". For example, the row numbered 1 associates the dictionary partial alphabet character string "a" with the partial reading "ア". This entry of the heading dictionary 31 was created from the first character of "a/nni/ver/sa/ry" = "ア/ニ/バー/サ/リー". The dictionary partial alphabet character string "sa" has several corresponding partial readings ("セー", "ザ", and "サ"), so multiple partial readings are associated with it.
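As a rough illustration of how a heading dictionary like the one in FIG. 2 could be assembled from the four aligned words above, the following Python sketch splits each spelling and each reading on "/" and collects the candidate readings per segment. The names `ALIGNED_WORDS` and `build_heading_dictionary` are hypothetical, and the slash-delimited input format is an assumption made for this example; the patent does not prescribe a data format.

```python
from collections import defaultdict

# Aligned spellings and katakana readings, as listed in the description.
ALIGNED_WORDS = [
    ("co/n/ver/sa/tio/n", "カ/ン/バ/セー/ショ/ン"),   # conversation
    ("di/s/po/sa/l", "ディ/ス/ポー/ザ/ル"),            # disposal
    ("a/nni/ver/sa/ry", "ア/ニ/バー/サ/リー"),         # anniversary
    ("re/co/g/ni/ze", "リ/コ/グ/ナイ/ズ"),             # recognize
]


def build_heading_dictionary(aligned_words):
    """Map each spelling segment to its distinct readings, in first-seen order."""
    dictionary = defaultdict(list)
    for spelling, reading in aligned_words:
        segments = spelling.split("/")
        kana = reading.split("/")
        if len(segments) != len(kana):
            raise ValueError(f"misaligned entry: {spelling!r} vs {reading!r}")
        for segment, partial_reading in zip(segments, kana):
            if partial_reading not in dictionary[segment]:
                dictionary[segment].append(partial_reading)
    return dict(dictionary)


heading_dictionary = build_heading_dictionary(ALIGNED_WORDS)
print(heading_dictionary["sa"])
print(heading_dictionary["ver"])
```

With this input, "sa" ends up with three candidate readings and "ver" with two, which are the ambiguous segments that the later scoring steps have to resolve.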

The partial score dictionary 32 is score information made up of partial scores 33, each of which represents how frequently a partial reading is used for a dictionary characteristic alphabet character string extracted from a dictionary partial alphabet character string. Here, a dictionary characteristic alphabet character string is a character string extracted from a dictionary partial alphabet character string based on the alphabet character strings used to create the heading dictionary 31. Specifically, it is one of: the dictionary partial alphabet character string itself; a character string extracted by decomposing the dictionary partial alphabet character string; or a character string extracted around the dictionary partial alphabet character string together with the preceding or following characters. The frequency of use means how often a specific partial reading is given to one partial alphabet character string. FIG. 3 shows an example of the partial score dictionary 32 according to Embodiment 1. This partial score dictionary 32 was created from the alphabet character strings "conversation", "disposal", "anniversary", and "recognize", the partial alphabet character string "sa" obtained from them, and its partial readings. For the dictionary partial alphabet character string "sa", the dictionary partial alphabet character string itself is "sa"; the character strings extracted by decomposing it are "s" and "a"; and the character strings extracted together with the preceding or following characters are "o-sa", "r-sa", "sa+l", "sa+t", "po-sa", and "er-sa". In FIG. 3, if the "sa" part of the dictionary characteristic alphabet character string "o-sa" is frequently read as "サ" (the partial reading), then in the column for the dictionary characteristic alphabet character string "o-sa", the partial score 33 in the row for the partial reading "サ" is set high. The symbols "-" and "+" in a dictionary characteristic alphabet character string indicate how the dictionary partial alphabet character string is connected to the preceding or following characters: "-" indicates that the preceding characters are joined to the dictionary partial alphabet character string, and "+" indicates that the following characters are joined to it. The partial score dictionary 32 is created by assigning Japanese readings to alphabet character strings and analyzing the correspondence of sounds. The partial scores 33 are set based on the designer's knowledge and the results of performance tests on a large amount of data.

The information processing unit 20 is a device that estimates the Japanese reading of the input alphabet character string. It comprises an input alphabet character string decomposition unit 21, a heading sequence creation unit 22, an estimation target determination unit 23, a characteristic alphabet character string creating unit 24, a partial alphabet character string score calculation unit 25, and a partial alphabet character string reading determination unit 26.

The input alphabet character string decomposition unit 21 is a device that receives the input alphabet character string from the input device 10 and decomposes it into a plurality of partial alphabet character strings.

The heading sequence creation unit 22 is a device that refers to the heading dictionary 31 and creates a heading sequence 220, in which partial readings are assigned to the partial alphabet character strings obtained by decomposing the input alphabet character string.

The estimation target determination unit 23 is a device that selects, from among the plurality of partial alphabet character strings, the partial alphabet character string whose partial reading is to be estimated.

The characteristic alphabet character string creating unit 24 is a device that extracts characteristic alphabet character strings from a partial alphabet character string. A characteristic alphabet character string is a character string extracted from a partial alphabet character string. Specifically, it is one of: the partial alphabet character string itself; a character string extracted by decomposing the partial alphabet character string; or a character string extracted around the partial alphabet character string together with the preceding or following characters. For example, for the partial alphabet character string "sa" of the input alphabet character string "reversal", the partial alphabet character string itself is "sa"; the character strings extracted by decomposing it are "s" and "a"; and the character strings extracted together with the preceding or following characters are "r-sa", "sa+l", "er-sa", and "sa+l$". The "$" mark means that there is no character at that position. A character string extracted by decomposing the partial alphabet character string is not created when the partial alphabet character string is a single character, because it cannot be decomposed.
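The feature extraction described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `make_feature_strings`, its parameters, and the `context` width are hypothetical. Two points are assumptions that go beyond the text: decomposition is taken to mean splitting into single characters (the text only shows the two-character case "sa" giving "s" and "a"), and missing characters before the start of the word are padded with "$" in the same way the text pads the end of the word.

```python
def make_feature_strings(word, start, end, context=2):
    """Return feature strings for the segment word[start:end].

    Notation follows the description: "x-seg" joins preceding characters,
    "seg+x" joins following characters, and "$" marks a missing character.
    """
    center = word[start:end]
    features = [center]

    # Decomposed pieces; skipped for a one-character segment, as in the text.
    # Assumption: "decomposing" means splitting into single characters.
    if len(center) > 1:
        features.extend(center)

    # Preceding context of 1..context characters, padded with "$" (assumption).
    for width in range(1, context + 1):
        before = word[max(0, start - width):start]
        before = "$" * (width - len(before)) + before
        features.append(f"{before}-{center}")

    # Following context of 1..context characters, padded with "$".
    for width in range(1, context + 1):
        after = word[end:end + width]
        after = after + "$" * (width - len(after))
        features.append(f"{center}+{after}")

    return features


# "sa" occupies positions 5..6 of "reversal".
print(make_feature_strings("reversal", 5, 7))
```

For the segment "sa" of "reversal" this yields the seven strings named in the text: "sa", "s", "a", "r-sa", "er-sa", "sa+l", and "sa+l$".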

The partial alphabet character string score calculation unit 25 is a device that refers to the partial score dictionary 32, assigns the scores of the characteristic alphabet character strings to each partial reading, and sums them to calculate the total score 34.

The partial alphabet character string reading determination unit 26 is a device that estimates the partial reading of a partial alphabet character string based on the scores summed by the partial alphabet character string score calculation unit 25.

The output device 40 is a device that displays the Japanese reading of the input alphabet character string estimated by the information processing unit 20. Examples include a PC monitor and the display of a car navigation system.

Next, the operation of the alphabet reading estimation device 1 according to Embodiment 1 is described with reference to FIG. 4. The description uses an example in which the input alphabet character string "reversal" is input from the input device 10 to the information processing unit 20 and the Japanese reading "リバーサル" is output to the output device 40. In this example the Japanese readings and partial readings are written in katakana, but hiragana may be used instead.

In step 1, the input device 10 inputs the input alphabet character string "reversal" to the information processing unit 20.

In step 2, the input alphabet character string decomposition unit 21 decomposes the input alphabet character string "reversal" into a plurality of partial alphabet character strings. It splits the input alphabet character string at the boundaries of the dictionary partial alphabet character strings registered in the heading dictionary 31. In this example, the heading dictionary 31 contains the dictionary partial alphabet character strings "re", "ver", "sa", and "l", so the input alphabet character string decomposition unit 21 decomposes the input alphabet character string into "re, ver, sa, l". The heading dictionary 31 also contains "s" and "a" in addition to "sa", but this example adopts the method of splitting on the longer character string "sa".
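One simple way to realize the "prefer the longer character string" rule of step 2 is a greedy longest-match scan from the left, sketched below in Python. The greedy left-to-right strategy is an assumption; the text only states that the longer entry "sa" is preferred over "s" and "a". The names `HEADINGS` and `segment_longest_match` are hypothetical, and the heading set is the one implied by the four example words of the heading dictionary.

```python
# Dictionary partial alphabet character strings from the four example words.
HEADINGS = {
    "co", "n", "ver", "sa", "tio",      # conversation
    "di", "s", "po", "l",               # disposal
    "a", "nni", "ry",                   # anniversary
    "re", "g", "ni", "ze",              # recognize
}


def segment_longest_match(word, headings):
    """Split word into headings, always taking the longest match at each position."""
    max_len = max(len(h) for h in headings)
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(word) - pos), 0, -1):
            candidate = word[pos:pos + length]
            if candidate in headings:
                parts.append(candidate)
                pos += length
                break
        else:
            # No heading matches here; how the device handles this is not described.
            raise ValueError(f"no heading matches {word[pos:]!r}")
    return parts


print(segment_longest_match("reversal", HEADINGS))
```

A greedy scan can fail on words where an early long match blocks a valid later split; a dynamic-programming search over all splits would avoid that, at the cost of more code.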

In step 3, the heading sequence creation unit 22 creates the heading sequence 220 by referring to the heading dictionary 31. It searches the heading dictionary 31 for the dictionary partial alphabet character strings that match the decomposed partial alphabet character strings, and assigns the partial readings of each matching dictionary partial alphabet character string to the corresponding partial alphabet character string. FIG. 5 shows an example of the heading sequence according to Embodiment 1. In this example, the heading sequence creation unit 22 assigns the partial readings "リ", "バ/バー", "セー/ザ/サ", and "ル" to the partial alphabet character strings "re", "ver", "sa", and "l" of the input alphabet character string "re, ver, sa, l", respectively.
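The heading sequence of step 3 can be pictured as a list of (segment, candidate readings) pairs. The short Python sketch below builds it for "re, ver, sa, l" and also shows the branch of step 5, which only sends segments with more than one candidate to the scoring steps. The names `HEADING_DICTIONARY` and `make_heading_sequence` are hypothetical, and the dictionary is reduced to the four entries this example needs.

```python
# Subset of the heading dictionary needed for "reversal".
HEADING_DICTIONARY = {
    "re": ["リ"],
    "ver": ["バ", "バー"],
    "sa": ["セー", "ザ", "サ"],
    "l": ["ル"],
}


def make_heading_sequence(parts, heading_dictionary):
    """Pair each segment with a copy of its candidate readings."""
    return [(part, list(heading_dictionary[part])) for part in parts]


sequence = make_heading_sequence(["re", "ver", "sa", "l"], HEADING_DICTIONARY)

for part, readings in sequence:
    # Step 5: only segments with several candidates need feature scoring.
    status = "needs scoring" if len(readings) > 1 else "unambiguous"
    print(part, "/".join(readings), status)
```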

In step 4, the estimation target determination unit 23 selects, from the partial alphabet character strings of the heading sequence 220 created in step 3, the partial alphabet character string whose partial reading is to be estimated. In this example it selects "sa".

In step 5, if multiple partial readings are associated with the partial alphabet character string selected in step 4, the process proceeds to step 6. If the selected partial alphabet character string has only one partial reading, the process proceeds to step 8.

In step 6, the characteristic alphabet character string creating unit 24 creates characteristic alphabet character strings for the partial alphabet character string selected by the estimation target determination unit 23. FIG. 6 shows an example of the characteristic alphabet character strings according to Embodiment 1. The input alphabet character string entered in step 1 is "reversal" and the partial alphabet character string selected in step 4 is "sa", so the partial alphabet character string itself (labeled "whole center" in FIG. 6) is "sa". The character strings extracted by decomposing the partial alphabet character string (labeled "center 1" and "center 2" in FIG. 6) are "s" and "a", respectively. The character strings extracted together with the preceding or following characters are "r-sa", "er-sa", "sa+l", and "sa+l$", respectively; in FIG. 6, the string that includes one preceding character is labeled "previous", the string that includes up to two preceding characters is labeled "previous-previous", the string that includes one following character is labeled "next", and the string that includes up to two following characters is labeled "next-next".

In step 7, the partial alphabet character string score calculation unit 25 refers to the partial score dictionary 32 and calculates the total score 34 of the partial alphabet character string. FIG. 7(a) shows an example of the total score 34 of the partial alphabet character string "sa" according to Embodiment 1. The partial alphabet character string score calculation unit 25 refers to the partial score dictionary 32 and, for each dictionary characteristic alphabet character string that matches a characteristic alphabet character string of the partial alphabet character string selected in step 4, assigns its score to the corresponding partial reading. In this example, the partial alphabet character string selected in the heading sequence 220 is "sa" and its partial readings are "セー/ザ/サ", so the partial alphabet character string score calculation unit 25 assigns the scores of the characteristic alphabet character strings of "sa" ("s", "a", "sa", "r-sa", "er-sa", "sa+l", "sa+l$") to each of the partial readings "セー", "ザ", and "サ". It then sums the partial scores 33 of all the characteristic alphabet character strings for each partial reading to obtain the total score 34. The characteristic alphabet character string "sa+l$" is not listed in the partial score dictionary 32, so it does not affect the total score 34 of the partial alphabet character string.

In step 8, the partial alphabet character string reading determination unit 26 refers to the total scores 34 of the partial alphabet character string and estimates that the partial reading with the highest total score 34 is the correct one. In this example, among the total scores 34 of the partial scores 33 of the characteristic alphabet character strings "s", "a", "r-sa", "er-sa", and "sa+l" of the partial alphabet character string "sa", the row for "サ" is highest at 1.3 points. The partial alphabet character string reading determination unit 26 therefore estimates that the partial reading of "sa" is "サ". If step 5 determined that the selected partial alphabet character string has only one partial reading, step 8 takes that partial reading as the correct reading.
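Steps 7 and 8 amount to a per-reading sum followed by an argmax, which the Python sketch below illustrates for the segment "sa". All score values in `PARTIAL_SCORES` are invented for illustration and are not taken from FIG. 3; they were chosen only so that "サ" comes out highest with a total of about 1.3, matching the outcome stated in the text. The names `PARTIAL_SCORES`, `total_scores`, and `pick_reading` are hypothetical.

```python
# Feature strings of "sa" in "reversal", as listed in step 7.
FEATURES = ["s", "a", "sa", "r-sa", "er-sa", "sa+l", "sa+l$"]

# HYPOTHETICAL partial scores: feature string -> {partial reading: score}.
# These numbers are illustrative assumptions, not the values of FIG. 3.
PARTIAL_SCORES = {
    "s":     {"セー": 0.1, "ザ": 0.1, "サ": 0.1},
    "a":     {"セー": 0.1, "ザ": 0.1, "サ": 0.1},
    "sa":    {"セー": 0.2, "ザ": 0.2, "サ": 0.2},
    "r-sa":  {"セー": 0.2, "サ": 0.4},
    "er-sa": {"セー": 0.2, "サ": 0.4},
    "sa+l":  {"ザ": 0.4, "サ": 0.1},
    # "sa+l$" is deliberately absent, so it contributes nothing.
}


def total_scores(features, candidates, partial_scores):
    """Step 7: sum, per candidate reading, the scores of all matching features."""
    totals = {reading: 0.0 for reading in candidates}
    for feature in features:
        for reading, score in partial_scores.get(feature, {}).items():
            if reading in totals:
                totals[reading] += score
    return totals


def pick_reading(totals):
    """Step 8: choose the reading with the highest total score."""
    return max(totals, key=totals.get)


totals = total_scores(FEATURES, ["セー", "ザ", "サ"], PARTIAL_SCORES)
print({reading: round(score, 2) for reading, score in totals.items()})
print(pick_reading(totals))
```

Because features missing from the dictionary are simply skipped, unseen contexts such as "sa+l$" neither help nor hurt any candidate, which is how the text describes their effect on the total.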

In step 9, if Japanese readings have been estimated for all partial alphabet character strings in the heading sequence 220, the process proceeds to step 10. If there is still a partial alphabet character string whose Japanese reading has not been estimated, the process returns to step 4 and repeats the same operation. That is, Japanese readings are also estimated for the partial alphabet character strings other than "sa", namely "re", "ver", and "l". For example, FIG. 7(b) shows an example of the total score 34 of the partial alphabet character string "ver" according to Embodiment 1. As with the partial alphabet character string "sa", the partial reading "バー", which has the higher total score 34, is estimated to be the correct partial reading.

In step 10, once all partial readings have been estimated in step 8, the Japanese reading "リ, バー, サ, ル" of the input alphabet character string "re, ver, sa, l" is output to the output device 40.

This completes the operation of the alphabet reading estimation device 1 according to Embodiment 1.
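Putting steps 2 to 10 together, the whole flow for "reversal" can be sketched in Python as below. This is a compact illustration under several assumptions, not the patented implementation: segmentation is greedy longest-match, decomposition splits a segment into single characters, word boundaries are padded with "$", and every value in `SCORES` is invented (the real partial scores of FIG. 3 and FIG. 7 are not reproduced in the text). All names are hypothetical.

```python
# Heading dictionary entries needed for this example.
HEADINGS = {
    "re": ["リ"], "ver": ["バ", "バー"], "sa": ["セー", "ザ", "サ"],
    "s": ["ス"], "a": ["ア"], "l": ["ル"],
}

# HYPOTHETICAL partial scores (illustrative values only).
SCORES = {
    "sa":    {"セー": 0.2, "ザ": 0.2, "サ": 0.2},
    "r-sa":  {"セー": 0.2, "サ": 0.4},
    "sa+l":  {"ザ": 0.4, "サ": 0.1},
    "ver":   {"バ": 0.2, "バー": 0.2},
    "ver+s": {"バ": 0.1, "バー": 0.4},
}


def segment(word):
    """Step 2: greedy longest-match split on heading dictionary entries."""
    parts, pos = [], 0
    longest = max(len(h) for h in HEADINGS)
    while pos < len(word):
        for n in range(min(longest, len(word) - pos), 0, -1):
            if word[pos:pos + n] in HEADINGS:
                parts.append(word[pos:pos + n])
                pos += n
                break
        else:
            raise ValueError(f"no heading matches {word[pos:]!r}")
    return parts


def features(word, start, end, context=2):
    """Step 6: the segment, its characters, and padded left/right contexts."""
    center = word[start:end]
    out = [center] + (list(center) if len(center) > 1 else [])
    for k in range(1, context + 1):
        before = word[max(0, start - k):start]
        out.append("$" * (k - len(before)) + before + "-" + center)
        after = word[end:end + k]
        out.append(center + "+" + after + "$" * (k - len(after)))
    return out


def estimate_reading(word):
    """Steps 3 to 10: pick one partial reading per segment."""
    readings, pos = [], 0
    for part in segment(word):
        candidates = HEADINGS[part]                      # step 3
        if len(candidates) == 1:                         # step 5
            readings.append(candidates[0])
        else:
            totals = {c: 0.0 for c in candidates}        # step 7
            for feat in features(word, pos, pos + len(part)):
                for reading, score in SCORES.get(feat, {}).items():
                    if reading in totals:
                        totals[reading] += score
            readings.append(max(candidates, key=totals.get))  # step 8
        pos += len(part)
    return readings


print("".join(estimate_reading("reversal")))
```

The loop visits segments from left to right, which is one of the orders the description allows; since each segment is scored independently in this sketch, the order does not change the result.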

In the description of the alphabet reading estimation device 1 according to Embodiment 1, the input alphabet character string was "reversal", but this is only one example; any unknown input alphabet character string may be used.

The description of the estimation target determination unit 23 according to Embodiment 1 showed an example in which the partial alphabet character string "sa" is selected first from the heading sequence 220. The order in which partial alphabet character strings are selected is not limited to this; the unit may be configured to select them from the front in the order "re", "ver", "sa", "l".

The example used an English input alphabet character string, but the input is not limited to English; any language written with the alphabet, such as French or German, may be used. Likewise, the output partial readings were Japanese readings, but they are not limited to Japanese and may be in another language such as Chinese.

For the characteristic alphabet character strings created by the characteristic alphabet character string creating unit 24, up to two characters before and after the partial alphabet character string were extracted as features. This is not limiting; any context of one or more characters before and after may be used.

As described above, in the alphabet reading estimation device 1 according to Embodiment 1, the characteristic alphabet character string creating unit 24 creates the following as characteristic alphabet character strings for the partial alphabet character string whose reading is to be estimated: character strings obtained by dividing the partial alphabet character string, the partial alphabet character string itself, and the partial alphabet character string combined with the character strings before and after it. The reading is estimated based on these multiple characteristic alphabet character strings, so it can be estimated more accurately even when little data is registered in the heading dictionary 31.
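The following sketch illustrates one way such characteristic alphabet character strings could be generated. It is an assumption-laden example: the function name and parameters are hypothetical, the division into single characters is only one possible way of dividing the partial string, and the "sa-l" notation for following context is assumed by analogy with the "r-sa" notation that the description uses for preceding context.

```python
def make_feature_strings(word, start, end, max_context=2):
    """Build hypothetical characteristic strings for the partial string word[start:end]."""
    target = word[start:end]
    features = [target]                       # the partial string itself
    if len(target) > 1:
        features.extend(target)               # assumed division: single characters
    for n in range(1, max_context + 1):       # 1..max_context preceding characters
        if start - n >= 0:
            features.append(f"{word[start - n:start]}-{target}")
    for n in range(1, max_context + 1):       # 1..max_context following characters
        if end + n <= len(word):
            features.append(f"{target}-{word[end:end + n]}")
    return features


# "sa" occupies positions 5..6 of "reversal".
features = make_feature_strings("reversal", 5, 7)
```

Here `features` contains "sa", its single characters, "r-sa", "er-sa", and "sa-l"; the two-character following context is skipped because only one character follows "sa" in "reversal".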

Embodiment 2
Embodiment 2 is described below with reference to FIGS. 8 and 9. FIG. 8 shows an example of the partial score dictionary according to Embodiment 2. FIG. 9 shows an example of the total score of a partial alphabet character string according to Embodiment 2. The configuration of the alphabet reading estimation device 1 according to Embodiment 2 is the same as that of the device according to Embodiment 1 described with reference to FIG. 1, so its description is omitted. In Embodiment 1, the estimation target determination unit 23 could estimate partial readings in any order. In Embodiment 2, by contrast, the estimation target determination unit 23 selects the partial readings to estimate in order from the front of the input alphabet character string.

The partial score dictionary 32 according to Embodiment 1 was score information consisting of a dictionary characteristic alphabet character string, a partial reading, and a partial score 33. The partial score dictionary 32 according to Embodiment 2 uses a preceding partial reading in place of the partial reading. A preceding partial reading consists of the partial reading of the dictionary partial alphabet character string whose reading is being estimated together with the immediately preceding (already estimated) partial reading. FIG. 8 shows an example of the partial score dictionary according to Embodiment 2. For the dictionary characteristic alphabet character string "r-sa", the partial score 33 for the preceding partial reading "バー・サ" is 0.6, whereas the partial score 33 for the preceding partial reading "バー・セー" is 0.4. This indicates that, when the reading corresponding to the dictionary partial alphabet character string preceding "sa" is "バー", the "sa" in "r-sa" is read "サ" more often than "セー". As with the partial score dictionary 32 according to Embodiment 1, the partial score dictionary 32 according to Embodiment 2 is set based on the designer's knowledge and the results of performance tests on large amounts of data.

The partial alphabet character string score calculation unit 25 according to Embodiment 1 extracted the characteristic alphabet character strings of the partial alphabet character string selected from the heading series 220 by the estimation target determination unit 23, together with the corresponding partial readings, and assigned scores according to the partial score dictionary 32. The partial alphabet character string score calculation unit 25 according to Embodiment 2 extracts preceding partial readings instead of partial readings. For example, FIG. 9 shows an example of the total score of a partial alphabet character string according to Embodiment 2. In this example, the partial alphabet character string selected by the estimation target determination unit 23 is "sa", and partial readings have already been estimated for the partial alphabet character strings "re" and "ver". The partial alphabet character string score calculation unit 25 takes the partial reading "バー" of the immediately preceding partial alphabet character string, already estimated by the partial alphabet character string reading determination unit 26, and the candidate readings "セー／ザ／サ" of the selected partial alphabet character string "sa", and combines them into preceding partial readings.

In this example, the leading partial alphabet character string "re" obtained by decomposing the input alphabet character string "reversal" has no preceding partial reading, so its partial reading is estimated in the same way as in Embodiment 1.
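A lookup keyed on the preceding partial reading might be sketched as below. The two scores for "r-sa" (0.6 and 0.4) and the candidates "セー／ザ／サ" are taken from the description. The dictionary layout, the function name, the summation of partial scores into a total, and the fallback for a leading partial string are illustrative assumptions.

```python
# Assumed layout: (characteristic string, preceding partial reading) -> partial score.
PARTIAL_SCORES = {
    ("r-sa", "バー・サ"): 0.6,
    ("r-sa", "バー・セー"): 0.4,
}


def total_scores(features, candidates, previous_reading=None, scores=PARTIAL_SCORES):
    """Sum the partial scores of all matching characteristic strings per candidate."""
    totals = {}
    for cand in candidates:
        # A leading partial string has no predecessor, so the plain reading is the key.
        key = cand if previous_reading is None else f"{previous_reading}・{cand}"
        totals[cand] = sum(scores.get((feat, key), 0.0) for feat in features)
    return totals


totals = total_scores(["r-sa"], ["セー", "ザ", "サ"], previous_reading="バー")
best = max(totals, key=totals.get)
```

With only the two entries above, "サ" receives the highest total after the preceding reading "バー", which matches the tendency the description attributes to FIG. 8.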

As described above, the alphabet reading estimation device 1 according to Embodiment 2 is configured so that the heading dictionary 31 and the partial alphabet character string score calculation unit 25 use preceding partial readings. The total score 34 for the partial reading of a partial alphabet character string can therefore take into account the partial reading of the preceding partial alphabet character string, which allows the Japanese reading to be estimated more accurately.

Embodiment 3
Embodiment 3 is described below. Descriptions of configurations and operations that are the same as in the alphabet reading estimation device 1 according to Embodiment 1 or Embodiment 2 are omitted.

In the alphabet reading estimation device 1 according to Embodiments 1 and 2, the partial readings in the heading dictionary 31 were character strings of Japanese (katakana) characters. The alphabet reading estimation device 1 according to Embodiment 3 is characterized in that the partial readings in the heading dictionary 31 are composed of speech synthesis symbols and katakana characters.

A speech synthesis symbol is a notation composed of katakana characters, a character indicating a devoiced syllable (a sound produced without vocal cord vibration), and characters indicating accent (pitch height). For example, the speech synthesis symbols for the alphabet character string "restaurant" are written "レＨス％ＬトＬラＬンＬ". Here, Ｈ indicates high pitch, Ｌ indicates low pitch, and ％ indicates a devoiced syllable. For example, "ス％Ｌ" indicates that the syllable written with the katakana character ス is devoiced and low-pitched. The correspondence between the alphabet character string and the speech synthesis symbols is expressed as "re/s/tau/ra/nt" = "レＨ/ス％Ｌ/トＬ/ラＬ/ンＬ".
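The notation can be illustrated with a small parser. This is a sketch under assumptions: the function name and the tuple layout are hypothetical, and the full-width Ｈ, Ｌ, and ％ are normalized to ASCII with NFKC before matching, which is a convenience of this example rather than something the description specifies.

```python
import re
import unicodedata

# One segment: katakana, an optional devoicing mark, then a pitch mark.
_SEGMENT = re.compile(r"^(?P<kana>[^%HL]+)(?P<devoiced>%?)(?P<pitch>[HL])$")


def parse_symbols(symbols):
    """Split a '/'-separated symbol string into (kana, devoiced, pitch) tuples."""
    parsed = []
    for seg in unicodedata.normalize("NFKC", symbols).split("/"):
        m = _SEGMENT.match(seg)
        if m is None:
            raise ValueError(f"unrecognized segment: {seg!r}")
        parsed.append((m["kana"], m["devoiced"] == "%", m["pitch"]))
    return parsed


parsed = parse_symbols("レＨ/ス％Ｌ/トＬ/ラＬ/ンＬ")
# Pair each alphabet segment with its parsed symbols.
aligned = list(zip("re/s/tau/ra/nt".split("/"), parsed))
```

Each tuple carries the katakana, whether the syllable is devoiced, and its pitch, so the plain katakana reading and the accent pattern can both be recovered from one estimated string.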

As described above, because the partial readings in the heading dictionary 31 are composed of speech synthesis symbols and katakana characters, the alphabet reading estimation device 1 according to Embodiment 3 can estimate the Japanese reading and the accent of an input alphabet character string at the same time.

Embodiment 4
Embodiment 4 is described below. Descriptions of configurations and operations that are the same as in the alphabet reading estimation device 1 according to Embodiments 1 to 3 are omitted.

In the alphabet reading estimation device 1 according to Embodiments 1 to 3, the partial scores 33 in the partial score dictionary 32 were set based on the designer's knowledge and the results of performance tests on large amounts of data. The alphabet reading estimation device 1 according to Embodiment 4 is characterized in that the partial scores 33 in the partial score dictionary 32 are set from the model parameters of a conditional random field (CRF) model.

The conditional random field model is expressed by equations (1) to (3). In equations (1) to (3), w is the model parameter expressed as a vector, x^(i) is the i-th input alphabet character string, y^(i) is the i-th partial reading, P(y^(i) | x^(i)) is the probability that y^(i) occurs given x^(i) (a conditional probability), and C is an experimentally determined constant. |·| denotes the magnitude of a vector.

Here, φ(x^(i), y^(i)) is a function that returns a vector. Equation (4) shows an example of the k-th element of the vector φ(x^(i), y^(i)).

The model parameter w that maximizes L(w) is found using, for example, the steepest gradient method. The resulting model parameter w is used as the partial scores 33.
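Equations (1) to (4) themselves are not reproduced in this text. Purely as an assumption, a commonly used L2-regularized conditional random field formulation that is consistent with the symbols defined above (w, φ, C, |·|, and L(w)) can be written as follows. The normalizer Z, the summation variable y', and the step size η are introduced here for illustration and do not appear in the description.

```latex
% Assumed regularized log-likelihood to be maximized over w
L(w) = \sum_{i} \log P\bigl(y^{(i)} \mid x^{(i)}\bigr) - C\,\lvert w \rvert^{2}

% Assumed log-linear conditional probability
P\bigl(y^{(i)} \mid x^{(i)}\bigr)
  = \frac{\exp\bigl(w \cdot \phi(x^{(i)}, y^{(i)})\bigr)}{Z\bigl(x^{(i)}\bigr)}

% Assumed normalizer over all candidate readings y'
Z\bigl(x^{(i)}\bigr) = \sum_{y'} \exp\bigl(w \cdot \phi(x^{(i)}, y')\bigr)

% One steepest-ascent update with step size \eta
w \leftarrow w + \eta\,\nabla_{w} L(w)
```

Under this reading, each component of w weights one element of φ, which is how a learned component could serve as the partial score 33 for the corresponding pairing of a characteristic alphabet character string and a reading.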

Because the model parameters of the conditional random field model are used as the partial scores 33, the partial scores 33 for the multiple characteristic alphabet character strings can be set appropriately and automatically, which shortens the time needed to build the system.

DESCRIPTION OF SYMBOLS: 1 alphabet reading estimation device, 10 input device, 20 information processing unit, 30 storage unit, 40 output device, 21 input alphabet character string decomposition unit, 22 heading series creation unit, 23 estimation target determination unit, 24 characteristic alphabet character string creating unit, 25 partial alphabet character string score calculation unit, 26 partial alphabet character string reading determination unit, 31 heading dictionary, 32 partial score dictionary

Claims

An alphabet reading estimation device that decomposes an input alphabet character string into a plurality of partial alphabet character strings and estimates a partial reading for each partial alphabet character string by referring to a dictionary in which partial alphabet character strings and a plurality of partial readings that are reading candidates are registered in association with each other, the device comprising:
a partial score dictionary that represents, for a dictionary characteristic alphabet character string created based on a dictionary partial alphabet character string for which a plurality of partial readings are estimated, among the plurality of dictionary partial alphabet character strings constituting an alphabet character string, the frequency with which each of the plurality of partial readings is used as the reading of the dictionary characteristic alphabet character string;
a characteristic alphabet character string creating unit that creates a plurality of characteristic alphabet character strings based on a partial alphabet character string for which a plurality of partial readings are estimated, among the plurality of partial alphabet character strings constituting the input alphabet character string;
a characteristic alphabet character string score calculation unit that refers to the partial score dictionary and calculates a score for each characteristic alphabet character string that matches a dictionary characteristic alphabet character string for which the plurality of partial readings are estimated; and
a characteristic alphabet character string reading determination unit that estimates the partial reading based on the scores calculated for each of the plurality of partial readings.

The alphabet reading estimation device according to claim 1, wherein the characteristic alphabet character string creating unit creates the plurality of characteristic alphabet character strings by dividing the partial alphabet character string for which the plurality of partial readings are estimated.

The alphabet reading estimation device according to claim 1 or 2, wherein the characteristic alphabet character string creating unit creates the plurality of characteristic alphabet character strings based on the partial alphabet character string for which the plurality of partial readings are estimated and a partial alphabet character string adjacent to that partial alphabet character string.

The alphabet reading estimation device according to claim 3, wherein the plurality of characteristic alphabet character strings are created by adding, to the partial alphabet character string for which the plurality of partial readings are estimated, the character string of one or two characters immediately preceding that partial alphabet character string.

The alphabet reading estimation device according to claim 3, wherein the plurality of characteristic alphabet character strings are created by adding, to the partial alphabet character string for which the plurality of partial readings are estimated, the character string of one or two characters immediately following that partial alphabet character string.

The alphabet reading estimation device according to claim 3, wherein the plurality of characteristic alphabet character strings are the partial alphabet character string for which the plurality of partial readings are estimated.

The alphabet reading estimation device according to claim 3, wherein the plurality of characteristic alphabet character strings comprise:
a character string obtained by adding, to the partial alphabet character string for which the plurality of partial readings are estimated, the character string of one or two characters immediately preceding that partial alphabet character string;
a character string obtained by adding, to the partial alphabet character string for which the plurality of partial readings are estimated, the character string of one or two characters immediately following that partial alphabet character string; and
the partial alphabet character string for which the plurality of partial readings are estimated.

The alphabet reading estimation device according to claim 7, wherein the plurality of partial readings in the partial score dictionary each consist of the partial reading corresponding to the dictionary partial alphabet character string immediately preceding the dictionary partial alphabet character string for which the plurality of partial readings are estimated, and a partial reading of that dictionary partial alphabet character string, and
the plurality of partial readings used by the characteristic alphabet character string creating unit include the partial reading of the partial alphabet character string immediately preceding the partial alphabet character string for which the plurality of partial readings are estimated.

The alphabet reading estimation device according to claim 1, wherein, in the partial score dictionary, the plurality of partial readings are composed of readings and speech synthesis symbols.

The alphabet reading estimation device according to claim 1, wherein the scores in the partial score dictionary are determined using a conditional random field model.