JP2003242145A

JP2003242145A - Language processing program, language processor and method therefor, and recording medium

Info

Publication number: JP2003242145A
Application number: JP2002024741A
Authority: JP
Inventors: Ryoji Sato; 良治佐藤; Kumi Suzuki; 久美鈴木; Jianfeng Gao; ジャンフェンガオ
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2002-01-31
Filing date: 2002-01-31
Publication date: 2003-08-29

Abstract

PROBLEM TO BE SOLVED: To provide a language processing program, a language processor, a method therefor, and a recording medium that improve analysis accuracy in language processing.

SOLUTION: A probability table recording combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the language processor. The word strings are sequences of words and sequences of independent words that semantically affect the connection with other words. To obtain the occurrence probabilities, the language processing program has a step of selecting words, and a step of selecting independent words that semantically affect the connection with other words, as the words to be combined.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a language processing program, a language processing device, a method therefor, and a recording medium. It is suited to tasks such as converting an input reading into kanji, or outputting a speech recognition result in the form of kanji.

[0002]

2. Description of the Related Art
Language processing techniques are conventionally known that convert the reading (pronunciation) of a word into the kanji corresponding to that reading, where the reading is input as characters or symbols. Such techniques are used, for example, when entering kanji into a personal computer or when outputting a speech recognition result in the form of kanji. Kana-kanji conversion technology of this kind is used not only in personal computers but also in a variety of other devices.

[0003] A kanji dictionary is used to convert readings into kanji; an example is shown in FIG. 5. The kanji dictionary records, in the form of a database or table:

- character strings representing readings;
- the words corresponding to those readings, i.e., conversion candidates;
- parts of speech;
- other attribute information.

The example of FIG. 5 shows that the reading 「あかさか」 (akasaka) has several conversion candidates, such as 「赤坂」, 「赤阪」, and so on. In Japanese, some readings have a plurality of conversion candidates (homophones). Therefore, when the user inputs a reading and the kanji dictionary contains several homophones for it, the user selects the desired conversion candidate from among them.

[0004] Finding the one desired candidate among many takes a great deal of effort. It has therefore become necessary either to determine conversion candidates automatically, or to rank them by the probability that each is the word the user wants.

[0005] To meet this need, language processing techniques that use a language model have been proposed. This technique is briefly described below.

[0006] A language model outputs the word sequence that matches the input character string with the highest probability. Typical language models are the bigram and the trigram. The bigram predicts the next word y from the single preceding word x:

$$P(y \mid x) = \frac{\mathrm{Count}(xy)}{\mathrm{Count}(x)}$$

Here $P(y \mid x)$ is the probability that word y follows word x. $\mathrm{Count}(xy)$ is the number of occurrences of the word sequence xy, and $\mathrm{Count}(x)$ is the number of occurrences of word x.
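The bigram formula can be sketched as a maximum-likelihood count ratio. This is a minimal illustration, assuming the corpus is already segmented into words (the source does not say how segmentation is done). The function name `bigram_probability` is hypothetical.

```python
from collections import Counter


def bigram_probability(corpus, x, y):
    """Maximum-likelihood bigram estimate P(y | x) = Count(xy) / Count(x).

    corpus: list of sentences, each a list of already-segmented words
    (an assumption; segmentation is outside the scope of the source).
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        unigram_counts.update(sentence)
        # adjacent pairs (w_i, w_{i+1}) within the sentence
        bigram_counts.update(zip(sentence, sentence[1:]))
    if unigram_counts[x] == 0:
        return 0.0  # x never observed: treat the estimate as zero
    return bigram_counts[(x, y)] / unigram_counts[x]
```

For example, in a corpus where 「お湯」 occurs three times and is followed by 「が」 twice, the estimate P(が | お湯) is 2/3. No smoothing is applied, so unseen pairs get probability zero.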

[0007] The trigram predicts the next word z from the two preceding words x and y:

$$P(z \mid xy) = \frac{\mathrm{Count}(xyz)}{\mathrm{Count}(xy)}$$

Here $P(z \mid xy)$ is the probability that word z follows the word sequence xy. $\mathrm{Count}(xyz)$ is the number of occurrences of the word sequence xyz, and $\mathrm{Count}(xy)$ is the number of occurrences of the word sequence xy.

[0008] Each such word combination is assigned the probability that the combination occurs. These probabilities are determined by statistical processing. For example, three words are combined at random, and the frequency with which sentences containing those three words appear in articles is counted; the higher the frequency, the higher the assigned probability. In this way, combinations of three words and their probability values are listed in a table in advance (these values are hereinafter sometimes simply called probabilities or occurrence probabilities). FIG. 6 shows an example of a trigram combined word string-probability table (hereinafter simply called the probability table).
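A trigram probability table like the one in FIG. 6 can be sketched as a dictionary keyed by word triples, filled with the estimate Count(xyz) / Count(xy) from [0007]. This is a minimal sketch, not the table construction the patent uses. It assumes a pre-segmented corpus and no smoothing, and the name `build_trigram_table` is hypothetical.

```python
from collections import Counter


def build_trigram_table(corpus):
    """Build a trigram probability table {(x, y, z): P(z | xy)}.

    Uses the maximum-likelihood estimate Count(xyz) / Count(xy).
    corpus: list of sentences, each a list of segmented words (assumed).
    Count(xy) is the raw pair count, so when xy ends a sentence the
    probabilities for that history may sum to less than 1.
    Unseen triples are simply absent from the table.
    """
    pair_counts = Counter()
    triple_counts = Counter()
    for sentence in corpus:
        pair_counts.update(zip(sentence, sentence[1:]))
        triple_counts.update(zip(sentence, sentence[1:], sentence[2:]))
    return {
        (x, y, z): count / pair_counts[(x, y)]
        for (x, y, z), count in triple_counts.items()
    }
```

A dictionary keyed by tuples gives constant-time lookup for the three-word combinations used in [0010].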

[0009] The process of determining a kanji conversion candidate with such a probability table is described next.

[0010] Suppose the reading 「じめんからおゆがいきなりわいた」 (jimen kara oyu ga ikinari waita, "hot water suddenly gushed up from the ground") is input to the language processing device. Suppose also that the text has already been converted up to 「地面からお湯がいきなり」 ("from the ground, hot water suddenly").

1. For 「わ」 (wa), the language processing device consults the kanji conversion dictionary (see FIG. 5) and obtains conversion candidates such as 「沸」 (boil) and 「湧」 (gush, spring up).
2. The device forms the three-word combination 「が」 (ga), 「いきなり」 (ikinari, suddenly), 「沸」. It obtains the probability of this combination from the probability table and temporarily stores it in memory.
3. The device forms the combination 「が」, 「いきなり」, 「湧」, obtains its probability from the table, and likewise stores it.
4. In this way, the device builds a three-word combination for each of the conversion candidates (see FIG. 5) and obtains the corresponding probabilities.

The third word of the combination with the highest probability, here 「沸」, is then automatically chosen as the conversion candidate for the reading 「わ」. When the user selects from among several candidates, the candidates are displayed on the device's screen in descending order of probability.
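The conventional ranking step can be sketched as follows. For each candidate, the two preceding words plus that candidate form a key, which is looked up in a trigram table; candidates are then sorted by probability. This is an illustrative sketch, not the patent's implementation. The function name, the 0.0 default for unseen triples, and the probabilities in the usage note are hypothetical.

```python
def rank_candidates(table, history, candidates):
    """Rank conversion candidates by trigram probability, highest first.

    table:      {(w1, w2, w3): probability}
    history:    already-converted words preceding the current reading
    candidates: dictionary candidates for the current reading
    Triples missing from the table score 0.0 (an assumption; the source
    does not say how unseen combinations are handled).
    """
    w1, w2 = history[-2:]  # the two immediately preceding words
    scored = [(table.get((w1, w2, c), 0.0), c) for c in candidates]
    # Python's sort is stable, so ties keep the dictionary order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]
```

With a table in which ("が", "いきなり", "沸") scores higher than ("が", "いきなり", "湧"), this returns 「沸」 first. That reproduces the misconversion discussed in [0012].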

[0011] With the language processing described above, words that statistically have a high probability of occurrence given the preceding words are ranked highly. This makes it possible to determine conversion candidates automatically or to display them in order of probability.

[0012]

Problems to be Solved by the Invention
The prior art predicts the next word from statistics on consecutive words. As the example above shows, however, it cannot capture the semantic relationship between the separated concept words 「地面」 (ground) and 「涌」 (gush). As a result, it produces the erroneous conversion 「沸」 (boil).

Words fall into two classes:

- independent (content) words, which carry concepts, such as 「地面」 (ground), 「お湯」 (hot water), 「いきなり」 (suddenly), and 「沸く」 (boil);
- ancillary (function) words, such as 「から」 (kara, from) and 「が」 (ga), which accompany a concept-bearing word to make its concept more specific or to show how it relates to other concepts.

In addition to the statistics on consecutive words used in the prior art, the present invention also uses sequences of separated independent words. For example, statistics on the sequence 「地面」「お湯」「湧」 capture the semantic relationship between separated words. This enables the correct conversion and improves the conversion accuracy of a kana-kanji conversion program.

[0013] An object of the present invention is to provide a language processing program, a language processing device, a method therefor, and a recording medium that improve the accuracy of analysis in language processing.

[0014]

Means for Solving the Problems
To achieve this object, the present invention is realized by a language processing program executed on a language processing device that converts a feature of at least one character string into a character string having that feature. The program works as follows:

- For feature information representing a feature of a character string, there exist a plurality of sets of conversion candidates, i.e., character strings similar to that feature information.
- The program combines each set of conversion candidates with past conversion results.
- It determines the priority order of the conversion candidates from the occurrence probabilities of the resulting combinations.

A probability table listing combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the language processing device. The plurality of word strings are sequences of words, and independent words that semantically affect the connection with other words. The language processing program is characterized by a step of selecting the words to be combined to obtain the occurrence probability. These are words, and independent words that semantically affect the connection with other words.

Consider sentences such as 「地面からお湯がこんこんと涌く」 ("hot water wells up copiously from the ground") and 「地面からお湯がいきなり涌く」 ("hot water suddenly wells up from the ground"). Words such as 「こんこんと」 (konkon to, copiously) and 「いきなり」 (ikinari, suddenly) are not essential to the semantic relationship between the separated words. By ignoring them, the present invention accurately captures the relationships between concepts.
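Selecting independent words for the lookup key can be sketched as a filter over part-of-speech-tagged words. The source says only that the dictionary stores each word's part of speech. The tag names in `ANCILLARY_POS`, the caller-supplied `ignore` set used to drop non-essential words such as 「いきなり」, and the function name are all hypothetical; the source does not specify how non-essential words are identified.

```python
# Hypothetical part-of-speech tags treated as ancillary (function) words.
ANCILLARY_POS = {"particle", "auxiliary_verb"}


def independent_word_key(tagged_history, candidate, ignore=frozenset()):
    """Build the independent-word sequence used to look up a probability.

    tagged_history: already-converted words as (word, part_of_speech) pairs
    candidate:      the conversion candidate being scored
    ignore:         words treated as non-essential to the relationship
                    (how they are chosen is an assumption, not from the source)
    Returns the last two independent words followed by the candidate.
    """
    content = [
        word
        for word, pos in tagged_history
        if pos not in ANCILLARY_POS and word not in ignore
    ]
    return tuple(content[-2:]) + (candidate,)
```

Dropping 「から」 and 「が」 by part of speech, and 「いきなり」 via `ignore`, turns 「地面からお湯がいきなり」 + 「湧」 into the key ("地面", "お湯", "湧").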

[0015] The present invention can also be realized by a language processing program executed on a language processing device that converts a feature of at least one character string into a character string having that feature. As in [0014], for feature information representing a feature of a character string, there exist a plurality of sets of conversion candidates similar to that feature information. Each set is combined with past conversion results, and the priority order of the conversion candidates is determined from the occurrence probabilities of the combinations. A probability table listing combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the device. The program comprises the following steps:

1. creating a character string that combines the character string of the past conversion result with the character string of one conversion candidate;
2. searching the probability table with the created character string;
3. if no search result is obtained, creating a new character string in which the order of the character strings within the created string is rearranged;
4. searching the probability table with the newly created character string to obtain the occurrence probability for the conversion candidate.

Both 「お湯が地面から沸く」 ("hot water boils up from the ground", with "hot water" first) and 「地面からお湯が沸く」 ("from the ground, hot water boils up") should be handled correctly. To that end, when the probability of the concept-word sequence 「お湯、地面、沸」 (hot water, ground, boil) is known, the invention uses it for the sequence with the word order swapped, 「地面、お湯、沸」 (ground, hot water, boil).
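The reordering fallback can be sketched as a two-step lookup. Following [0019], only the first and second strings of the three-string combination are swapped, and the candidate stays last. This is a minimal sketch with a hypothetical function name; returning `None` on a double miss is an assumption.

```python
def lookup_with_reordering(table, key):
    """Look up a three-word key; if absent, retry with the first two swapped.

    table: {(w1, w2, candidate): probability}
    key:   (w1, w2, candidate) built from past results and one candidate
    Returns None when neither word order is in the table.
    """
    if key in table:
        return table[key]
    first, second, candidate = key
    # the candidate keeps its position; only the preceding words trade places
    return table.get((second, first, candidate))
```

Keeping the candidate fixed means only one extra lookup is needed per miss.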

[0016] The present invention is further realized by a language processing program that converts a feature of at least one character string into a character string having that feature. As before, for feature information representing a feature of a character string, there exist a plurality of sets of conversion candidates similar to that feature information. Each set is combined with past conversion results, and the language processing device determines the priority order of the conversion candidates from the occurrence probabilities of the combinations.

Two tables are stored in advance in the device:

- a probability table listing combinations of a plurality of word strings and their occurrence probabilities;
- a classification table listing, for each class, the character strings that belong to that class.

The program comprises the following steps:

1. creating a combined character string from the character string of the past conversion result and the character string of one conversion candidate;
2. searching the probability table with the created combined string;
3. if no search result is obtained, retrieving from the classification table another character string in the same class as the conversion candidate;
4. creating a combined character string that uses the retrieved string in place of the conversion candidate;
5. searching the probability table with the newly created string to obtain the occurrence probability for the conversion candidate.

The invention also uses a word classification in which 「水」 (water), 「泥水」 (muddy water), and 「お湯」 (hot water) are all liquids. When the probability of 「地面、お湯、沸」 (ground, hot water, boil) is known, the probability of this same-class word sequence can be used even for 「地面から水が沸く」 ("water boils up from the ground") and 「地面から泥水が沸く」 ("muddy water boils up from the ground").
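The classification fallback can be sketched with a classification table keyed by class number, as in FIG. 7. The steps above substitute the conversion candidate, while the water/hot-water example substitutes a preceding word. This sketch therefore tries a same-class substitute at every position of the key; that generalization is an assumption. The class numbers, table contents, and function name are hypothetical.

```python
def lookup_with_classes(table, classes, key):
    """Look up a key; on a miss, retry with same-class substitutes.

    table:   {(w1, w2, w3): probability}
    classes: {class_number: [word, ...]}, a classification table as in FIG. 7
    key:     tuple of words
    Tries replacing one word at a time with another member of its class and
    returns the first probability found, or None if nothing matches.
    """
    if key in table:
        return table[key]
    word_to_class = {w: c for c, members in classes.items() for w in members}
    for i, word in enumerate(key):
        cls = word_to_class.get(word)
        if cls is None:
            continue  # word is not in any class, so nothing to substitute
        for substitute in classes[cls]:
            if substitute == word:
                continue
            alternative = key[:i] + (substitute,) + key[i + 1:]
            if alternative in table:
                return table[alternative]
    return None
```

Substituting one word at a time keeps the number of extra lookups proportional to the class sizes rather than to their product.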

[0017] According to the present invention, the feature information may be a reading.

[0018] According to the present invention, each of the plurality of combinations may be a combination of three character strings.

[0019] According to the present invention, the first and second character strings of a three-string combination may be interchanged.

[0020] According to the present invention, the feature information may be a phoneme sequence obtained in speech recognition.

[0021] According to the present invention, the feature information may be obtained in handwritten character recognition.

[0022] When the above language processing program is installed in any of various information processing devices, that device functions as the language processing device of the present invention. It then executes the processing steps defined by the language processing method of the present invention. Any recording medium on which the above language processing program is recorded constitutes the recording medium of the present invention.

[0023]

DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings.

[0024] FIG. 1 shows the system configuration of a language processing device to which the present invention is applied. Any program-executable information processing device, such as a personal computer, can serve as its hardware, so the hardware configuration is described only briefly.

[0025] In FIG. 1, the following are connected to a bus: a CPU 101, a ROM 102, a RAM 103, a CD-ROM drive 104, an input device 105, a display device 106, a hard disk (abbreviated HD) 107, and other devices 108. The interfaces connecting these devices to the bus are well known and are not shown.

- The CPU 101 executes programs loaded into the RAM 103.
- The ROM 102 stores a boot program executed at power-on and data required for system-related control.
- The RAM 103 temporarily stores input data to the CPU 101 and output data from the CPU 101.
- The CD-ROM drive 104 accepts a CD-ROM and, under control of the CPU 101, reads out the programs and data recorded on it.

[0026] The input device 105 inputs information to the CPU 101. In this embodiment, a keyboard and a pointing device such as a mouse are used as the input device 105.

[0027] The HD 107 stores programs and various data. The other devices 108 include a floppy (registered trademark) disk device, a modem, and the like.

[0028] The structure of the program according to the present invention is described with reference to FIG. 2. The programs and data described below are installed on the HD 107 in advance from a CD-ROM or floppy (registered trademark) disk.

[0029] The operating system 201 (hereinafter abbreviated OS) is a program that controls the devices of FIG. 1; for example, Microsoft Windows (registered trademark) can be used. Character information and operation instructions entered from the keyboard (input device 105) arrive as key code signals. The OS 201 converts these into character code signals (hereinafter simply character codes) and into identification signals representing operation instructions.

[0030] A kana-kanji conversion program 202 is described as an example of the language processing program. A kana-kanji conversion program is also called a front-end processor (FEP). It converts readings into kanji based on the character codes and operation instructions passed from the OS 201. Note that "kanji" after conversion, as used here, also includes hiragana, katakana, kanji mixed with kana, numerals, and the like.

[0031] Hiragana, katakana, or romaji are conventionally used as the characters representing a reading. This embodiment uses hiragana as the reading input.

[0032] The kana-kanji conversion program 202 uses the kana-kanji conversion dictionary 203 to convert readings into kanji. When a reading has a plurality of conversion candidates, the program 202 refers to the probability table 204 and ranks the candidates in priority order using the language processing method according to the present invention.

[0033] If a conversion candidate is not listed in the probability table 204, a substitute word is obtained from the word classification table 205.

[0034] The conversion candidates, ranked by the language processing method described later, are sent via the OS 201 to the display device 106 and displayed. When the user designates the desired candidate with the input device 105, that candidate is passed via the OS 201 to the kana-kanji conversion program 202. The program 202 then passes the designated candidate to an application 206 as the conversion result for the reading. The application 206 is a program that uses the converted kanji as input. Well-known examples include document processing programs (word processors), spreadsheet software, and various other programs.

[0035] The kana-kanji conversion dictionary 203 and the probability table 204 were described in the related art, so their detailed description is omitted. The word classification table 205 is described next.

[0036] In this embodiment, kanji words belonging to the same class are grouped into one set, and each set is given a class number. As shown in FIG. 7, a table lists each class number together with the words in its set; this table is the classification table. The program developer may use any desired classification method. In this embodiment, a probability table is created in advance and installed on the HD 107. It is classified, for example, by the method described in US application No. 09/565608 (filed 5/4/2000) or by another conventionally known information classification method.

[0037] The processing performed by the kana-kanji conversion program 202 of FIG. 2 is shown in FIGS. 3 and 4. FIG. 3 shows the main processing procedure, and FIG. 4 shows the processing procedure according to the present invention.

[0038] In FIG. 3, step S10 receives a reading input from the input device 105 (keyboard) and stores it in the RAM 103.

[0039] Step S20 is the conversion-candidate confirmation step. It displays the kanji conversion candidates for the input reading in order of priority and accepts the user's selection.

[0040] Step S30 outputs the selected (confirmed) kanji conversion candidate to another application.
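The three-step main flow of FIG. 3 can be sketched as a function that receives its dependencies as callables. This is only a structural illustration. The function name and every callable (`lookup_candidates`, `rank`, `choose`, `deliver`) are hypothetical stand-ins for the dictionary lookup, the ranking of FIG. 4, the user's selection, and the hand-off to the application.

```python
from typing import Callable, List


def kana_kanji_main(
    reading: str,
    lookup_candidates: Callable[[str], List[str]],
    rank: Callable[[List[str]], List[str]],
    choose: Callable[[List[str]], str],
    deliver: Callable[[str], None],
) -> str:
    """Sketch of the FIG. 3 main flow; all callables are hypothetical."""
    # S10: receive the reading (the source stores it in RAM 103)
    stored_reading = reading
    # S20: rank the candidates and accept the user's selection
    ranked = rank(lookup_candidates(stored_reading))
    confirmed = choose(ranked)
    # S30: pass the confirmed candidate to another application
    deliver(confirmed)
    return confirmed
```

Passing the steps in as callables keeps the control flow of S10 to S30 separate from the ranking logic that FIG. 4 details.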

[0041] The detailed procedure of step S20 is shown in FIG. 4 and is described below with a concrete example.

[0042] (Embodiment 1) Take 「じめんからおゆがわいた」 (jimen kara oyu ga waita, "hot water gushed up from the ground") as an example.

[0043] In FIG. 2, the probability table 204 actually consists of two tables:

- probability table A, which holds probabilities for sequences of all words;
- probability table B, which holds probabilities for sequences of concept-bearing independent words.

[0044] Suppose the text has been converted up to 「地面からお湯が」 ("from the ground, hot water"), and 「わいた」 (waita) is then input to the kana-kanji conversion program 202.

[0045] The kana-kanji conversion program 202 first searches the kana-kanji conversion dictionary. It records that the conversion candidates for 「わいた」 include 「沸いた」 (boiled) and 「湧いた」 (gushed).

[0046] In the prior art, the next step is step S100 of FIG. 4, which obtains conversion candidates. There the kana-kanji conversion program 202 searches probability table A for probabilities of three-word sequences, each formed from the two preceding words 「お湯」 (hot water) and 「が」 (ga) plus one candidate for the currently input 「わ」:

- 「お湯」「が」 plus the candidate 「沸」;
- 「お湯」「が」 plus the candidate 「湧」;
- and so on for the remaining candidates.

It then selects the candidate with the highest probability. In this case, 「沸いた」 becomes the conversion result.

[0047] In the present invention, step S100 of FIG. 4 proceeds in two stages. The kana-kanji conversion program 202 first uses probability table A to narrow the conversion candidates down to the top-ranked ones. It then uses probability table B to select the conversion result.

The kana-kanji conversion dictionary 203 holds the part of speech of each word. The program 202 can therefore tell from a word's part of speech whether it is an independent word or an ancillary word. It obtains the parts of speech of the words in the input character string and of the words in the conversion candidates for the string being converted, and keeps only the independent words. In this example, these sequences are 「地面」「お湯」「沸」, 「地面」「お湯」「湧」, and so on. The program 202 then searches probability table B and finds that 「地面」「お湯」「湧」 has the higher probability. It can therefore return 「湧いた」 as the conversion result.
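The two-stage selection can be sketched as follows. Candidates are first sorted by their table-A score (two preceding words of any kind), the top few are kept, and the final choice is made by table-B score (two preceding independent words). This is a sketch under stated assumptions, not the patent's code: `top_k`, the 0.0 default for unseen sequences, and the function name are hypothetical, since the source says only "the top candidates".

```python
def two_stage_select(table_a, table_b, word_context, content_context,
                     candidates, top_k=2):
    """Narrow candidates with table A, then pick the best one with table B.

    word_context:    the two preceding words of any kind, e.g. ("お湯", "が")
    content_context: the two preceding independent words, e.g. ("地面", "お湯")
    top_k:           number of table-A survivors (hypothetical value)
    Unseen sequences score 0.0 (an assumption). Raises ValueError if
    candidates is empty.
    """
    shortlist = sorted(
        candidates,
        key=lambda c: table_a.get((*word_context, c), 0.0),
        reverse=True,
    )[:top_k]
    # max() keeps the first of equal scores, i.e. the better table-A candidate
    return max(shortlist, key=lambda c: table_b.get((*content_context, c), 0.0))
```

Running table A first keeps the table-B lookups to a short list. As [0048] notes, the two lookups could also be combined into a single pass.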

[0048] In the description above, the step that refers to probability table A narrows down the top candidates before the step that refers to probability table B runs. The two steps may, however, also be executed simultaneously.

[0049] (First Improvement of Embodiment 1) Consider creating probability table B for sequences of concept-bearing words such as 「地面」「お湯」「沸」. The 「から」 (kara, from) in 「地面から」 ("from the ground") indicates the relationship between 「地面」 and 「沸く」. Likewise, the 「が」 in 「お湯が」 indicates the relationship between 「お湯」 and 「沸く」. In the expression 「地面からさえ」 ("even from the ground"), 「から」 indicates the relationship and 「さえ」 (sae, even) supplements it.

[0050] In 赤かった帽子 ("the hat that was red") and 赤い帽子 ("the red hat"), the attributive (rentaikei) form of 赤い ("red") is essential. It is this form that expresses the relationship between 帽子 ("hat") and 赤い.

[0051] As described above, an independent word that represents a concept may have a semantic relationship with another independent word. In that case, it may carry accompanying information that indicates the relationship. Examples are particles such as から and が, and the inflected form of the inflecting word at the end of a phrase (bunsetsu).

[0052] When probability table B is built for the sequences of concept-bearing independent words described above, each independent word is stored with this relational information attached, and the probability belongs to that annotated sequence. For example, probability table B holds a probability for the three-word sequence 地面 (+から), お湯 (+が), 沸. In the probability retrieval step, a probability is used only when this accompanying information also matches.
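The matching rule in [0052] can be illustrated with a small sketch that keys table B on (word, relation marker) pairs. The following are assumptions, not details from the patent:
- the token format;
- attaching only the first following particle as the marker, so that さえ in 地面からさえ is ignored as in [0049];
- ignoring inflectional endings;
- the names `annotate` and `lookup`.

```python
from typing import Dict, List, Optional, Tuple

# (independent word, relation marker such as から or が, or None); hypothetical format.
Annotated = Tuple[str, Optional[str]]


def annotate(tokens: List[Tuple[str, str]]) -> List[Annotated]:
    """Attach to each independent word the first particle that follows it."""
    out: List[Annotated] = []
    for surface, pos in tokens:
        if pos == "particle":
            # Only the first particle marks the relation; a later one such as
            # さえ in 地面からさえ merely supplements it and is ignored here.
            if out and out[-1][1] is None:
                out[-1] = (out[-1][0], surface)
        elif pos != "ending":  # inflectional endings are not stored in this sketch
            out.append((surface, None))
    return out


def lookup(table_b: Dict[Tuple[Annotated, ...], float],
           tokens: List[Tuple[str, str]], n: int = 3) -> Optional[float]:
    """Return a probability only if both the words and their markers match."""
    return table_b.get(tuple(annotate(tokens)[-n:]))


table_b = {(("地面", "から"), ("お湯", "が"), ("沸", None)): 0.004}  # invented value

plain = [("地面", "noun"), ("から", "particle"), ("お湯", "noun"),
         ("が", "particle"), ("沸", "verb"), ("いた", "ending")]
even = [("地面", "noun"), ("から", "particle"), ("さえ", "particle"),
        ("お湯", "noun"), ("が", "particle"), ("沸", "verb")]
other = [("地面", "noun"), ("を", "particle"), ("お湯", "noun"),
         ("が", "particle"), ("沸", "verb")]

print(lookup(table_b, plain), lookup(table_b, even), lookup(table_b, other))  # 0.004 0.004 None
```

The sequence with を instead of から has the same three words but a different marker, so no probability is returned for it.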

[0053] (Second improvement of Embodiment 1) Suppose, for example, that the input is 地面からお湯が急にわいた ("hot water suddenly welled up from the ground"). As long as the probability of a sequence of three independent words is used, お湯, 急に, 沸 will probably score higher than お湯, 急に, 湧. Yet the independent word 急に ("suddenly") adds only a supplementary meaning to the conceptual relationship "hot water welled up from the ground".

[0054] The present inventors thus discovered that some independent words do not affect the semantic connection with other words (independent words). An improved mode based on this finding is described here. In this mode, the probability table is built only from independent words that affect semantic connection. Linguistically, the independent words that do not affect the connection with other words are:
- adverbs (for example, 急に "suddenly");
- prefixes (御 "go-");
- suffixes (さん "-san");
- conjunctions (そして "and");
- interjections (さて "well");
- adverbial nouns (今日 "today");
- character strings such as symbols.
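The filter described in [0054] can be sketched as a second exclusion set applied on top of the adjunct-word filter. The POS label strings and the name `concept_words` are hypothetical. The word classes themselves follow the list in [0054].

```python
from typing import List, Tuple

# Hypothetical POS tags. Adjunct words are always skipped; the second set lists the
# independent-word classes that [0054] says do not affect semantic connection.
ADJUNCT_POS = {"particle", "ending"}
NON_CONNECTING_POS = {
    "adverb", "prefix", "suffix", "conjunction",
    "interjection", "adverbial_noun", "symbol",
}


def concept_words(tokens: List[Tuple[str, str]], drop_non_connecting: bool = True) -> List[str]:
    """Return the words that would be used as keys for probability table B."""
    skip = ADJUNCT_POS | (NON_CONNECTING_POS if drop_non_connecting else set())
    return [surface for surface, pos in tokens if pos not in skip]


# 地面からお湯が急に湧いた, tokenized by hand for the example.
tokens = [
    ("地面", "noun"), ("から", "particle"), ("お湯", "noun"), ("が", "particle"),
    ("急に", "adverb"), ("湧", "verb"), ("いた", "ending"),
]

print(concept_words(tokens, drop_non_connecting=False)[-3:])  # ['お湯', '急に', '湧']
print(concept_words(tokens)[-3:])                             # ['地面', 'お湯', '湧']
```

Dropping 急に lets the three-word window reach back to 地面, which is the word that disambiguates 湧 from 沸 in [0053].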

[0055] When a probability table B of this kind is used, the language processing device works as follows:
1. From the input reading, it detects the readings of independent words that are semantically connected to other words. It does so using the linguistic word-type information in the kana-kanji conversion dictionary.
2. It acquires one or more conversion candidates for each detected reading.
3. It combines each acquired candidate with the previously fixed character string and obtains the combination's probability from the probability table.
4. As in the embodiment above, it ranks the conversion candidates by the acquired probabilities.

[0056] (Third improvement of Embodiment 1) As a concrete example, take the kanji string 全快まで療養に努める ("strive to recuperate until complete recovery"). Assume the following about probability table 2 of the probability table 204:
- it contains no combination in the order 全快 ("complete recovery"), 療養 ("recuperation"), 努め ("strive");
- it does contain the combination in the order 療養, 全快, 努め.

[0057] The user has already entered the readings for 全快まで ("until complete recovery") and 療養に ("to recuperation"). As in the conventional case, the kana-kanji conversion program has stored the corresponding conversion results in RAM 103. Next, the user enters つとめる, the reading of 努める, from the input device 105. The CPU 101 then refers to the kana-kanji conversion dictionary 203 stored on the HD 107 and acquires kanji conversion candidates for that reading, for example 勤める ("serve, be employed"), 努める ("strive") and 勉める ("exert oneself"). The acquired conversion candidates are stored in RAM 103.

[0058] Following the procedure of FIG. 4, the CPU 101 acquires the first conversion candidate, 勤める (step S100). It combines the candidate with the independent words 全快 and 療養 from the conversion results obtained so far. It then searches probability table 204 (see FIG. 6) with the combined word string 全快 + 療養 + 勤め (step S110). The + sign denotes combination only and is not part of the word string used in the actual search.

[0059] If this combination is listed in probability table 204, the check in step S120 for whether a search result exists yields YES. The procedure then jumps from step S120 to step S200. The CPU 101 acquires the probability of the search result (see FIG. 6). It stores that probability in RAM 103, associated with the conversion candidate, in this case 勤め.

[0060] In this embodiment, the word string built from a conversion candidate may not be listed in probability table 204. In that case, a new word string with its first and second words swapped is created, and probability table 204 is searched with it. The probability table therefore needs to hold only one ordering of each word string; the swapped ordering is unnecessary. As a result, the data size of the probability table can be reduced to half that of the conventional approach.
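The swap fallback of [0060] (steps S110 to S150 of FIG. 4) can be sketched as a lookup that retries once with the first two words exchanged. The function name `lookup_with_swap` and the probability value are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]


def lookup_with_swap(table: Dict[Key, float], w1: str, w2: str, cand: str) -> Optional[float]:
    """Look up (w1, w2, cand); if absent, retry once with w1 and w2 swapped."""
    if (w1, w2, cand) in table:          # steps S110/S120
        return table[(w1, w2, cand)]
    return table.get((w2, w1, cand))     # steps S140/S150; None if still missing


# Only one ordering of the first two words is stored (invented probability).
table = {("療養", "全快", "努め"): 0.05}

print(lookup_with_swap(table, "全快", "療養", "努め"))  # 0.05, found after swapping
print(lookup_with_swap(table, "全快", "療養", "勉め"))  # None
```

Storing a single ordering halves the entries for word pairs that would otherwise be stored in both orders. The trade-off is that the table can no longer give the two orders different probabilities.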

[0061] (Fourth improvement of Embodiment 1) Next, the CPU 101 applies the procedure of FIG. 4 to the second conversion candidate, 努め.

[0062] The CPU 101 combines the first independent word 全快 and the second independent word 療養 with the candidate to create the word string 全快 + 療養 + 努め. It then searches probability table 204 with that string (step S110). This combination is not listed in the probability table, so the check in step S120 for whether a search result exists yields NO.

[0063] The CPU 101 swaps the first independent word 全快 and the second independent word 療養 to create the word string 療養 + 全快 + 努め. It then searches probability table 204 with that string (step S140).

[0064] This word string is listed in probability table 204, so the check in step S150 for whether a search result exists yields YES. The procedure therefore jumps from step S150 to step S200. In step S200, the probability obtained as the search result is stored in RAM 103, associated with the conversion candidate 努め.

[0065] The procedure of FIG. 4 is also applied to the last conversion candidate, 勉め. For this example, assume the following:
- 勉め does not appear anywhere in probability table 204, but it is listed in classification table 205;
- the set that contains 勉め also contains the words 勉強する ("to study") and 努力 ("effort");
- 努力 is listed in probability table 204 in the form 全快 + 療養 + 努力.

[0066] The CPU 101 combines the conversion results obtained so far with the current conversion candidate to create the word string 全快 + 療養 + 勉め. It searches probability table 204 with that string (steps S100 → S110). The check in step S120 yields NO. The word string 療養 + 全快 + 勉め, with the first and second words swapped, is therefore created next, and probability table 204 is searched with it (step S140). The check in step S150 also yields NO, so the procedure proceeds to step S160.

[0067] In this step, the CPU 101 searches the word classification table (see FIG. 7) for the set that contains the third word, 勉め. The words obtained as the search result, in this case 勉強 and 努力, are stored in RAM 103.

[0068] Next, the CPU 101 takes 勉強, the first word obtained from the word classification table, and combines it with the first and second words. The probability of the resulting word string 療養 + 全快 + 勉強 is then acquired from probability table 204 (steps S190 → S200).

[0069] The acquired probability is stored in RAM 103, associated with the conversion candidate 勉め. If, on the other hand, the combination is not listed in probability table 204, the check in step S190 yields NO. The procedure then proceeds from step S190 to step S300.

[0070] In this step, the probability given to the conversion candidate 勉め is set automatically to a value prepared in advance. A word string that is not listed in probability table 204 usually has a word order that is grammatically implausible, so a low value is used. The assigned probability is stored in RAM 103, associated with the conversion candidate 勉め.
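The whole per-candidate procedure of [0058] to [0070] can be put together in one sketch. The steps run in this order:
1. direct lookup;
2. lookup with the first two words swapped;
3. substitution of class-mates from the classification table;
4. a preset low probability.

Several parts are assumptions and not taken from the patent:
- the function name `candidate_probability`;
- trying each substitute in both word orders;
- the default value `1e-6`;
- all table contents.

The FIG. 6 and FIG. 7 values are not reproduced.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


def candidate_probability(
    w1: str,
    w2: str,
    cand: str,
    prob_table: Dict[Key, float],
    class_table: List[List[str]],
    default_p: float = 1e-6,
) -> float:
    """Probability for one conversion candidate, following the fallbacks of FIG. 4."""
    # S110-S150: original order first, then first two words swapped.
    for key in ((w1, w2, cand), (w2, w1, cand)):
        if key in prob_table:
            return prob_table[key]
    # S160: other words in the same class(es) as the candidate.
    substitutes = [w for group in class_table if cand in group for w in group if w != cand]
    # S190/S200: retry with each substitute (trying both orders is an assumption).
    for sub in substitutes:
        for key in ((w1, w2, sub), (w2, w1, sub)):
            if key in prob_table:
                return prob_table[key]
    # S300: preset low probability for strings the table does not cover.
    return default_p


# Invented table contents for illustration.
prob_table = {
    ("全快", "療養", "勤め"): 0.03,
    ("療養", "全快", "努め"): 0.05,
    ("療養", "全快", "勉強"): 0.01,
}
class_table = [["勉め", "勉強", "努力"]]

ranked = sorted(
    ["勤め", "努め", "勉め"],
    key=lambda c: candidate_probability("全快", "療養", c, prob_table, class_table),
    reverse=True,
)
print(ranked)  # ['努め', '勤め', '勉め']
```

Ranking the candidates by the returned probabilities corresponds to the reordering before display described in [0071].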

[0071] In this way, probabilities are obtained that show how likely each of the three conversion candidates 勤める, 努める and 勉める for the reading つとめる is. The CPU 101 then proceeds as in the conventional case. It sorts the conversion candidates in descending order of probability and displays the sorted candidates on the display device 106 so that the user can choose one.

[0072] In this embodiment, a classification table listing groups of semantically similar words is prepared. If a conversion-candidate word is not listed in the probability table, the table is searched with the other words in its group. It is therefore enough to list only a representative word of each class in the probability table.

Each probability-table record consists of three words and a probability. Suppose one set in the classification table contains 10 words. Listing every member directly would then require 3 × 10 words + 10 probabilities. In this embodiment, the probability table and the classification table together need only 3 × 1 words + 1 probability + 10 words. The data size is therefore smaller than in the conventional approach.
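The size comparison in [0072] can be written out for a general class of k words. This assumes, as a simplification, that each stored word and each stored probability counts as one unit; real sizes depend on the encoding.

```latex
\underbrace{3k}_{\text{words}} + \underbrace{k}_{\text{probabilities}} = 4k
\qquad\text{vs.}\qquad
\underbrace{3 \cdot 1}_{\text{words}} + \underbrace{1}_{\text{probability}} + \underbrace{k}_{\text{class members}} = k + 4,
\qquad k = 10:\; 40 \text{ vs. } 14 .
```

Under this counting, the classification-table layout is smaller whenever 4k > k + 4, that is, for any class of two or more words.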

[0073] (Processing when readings of multiple words are input) The embodiment above is an example of language processing under two conditions: the conversion result of a character string containing two independent words has already been fixed, and the reading of a character string containing a third independent word is then input. In kana-kanji conversion, however, the readings of several phrases (bunsetsu) may be input at once. The following explains how the present invention applies to language processing in this case.

[0074] Assume that the reading みなときょうあかさかにいく is input to the language processing device. As in the conventional approach, the device segments the reading into words by referring to the word readings listed in the kana-kanji word dictionary. Marking word boundaries with the ｜ symbol yields several segmentation candidates, such as:
みな｜と｜きょう｜あか｜さか｜に｜いく
みな｜と｜きょう｜あかさか｜に｜いく
みなと｜きょう｜あか｜さか｜に｜いく
...
Each word in these segmentation candidates has several conversion candidates. Writing each word's set of kanji conversion candidates in parentheses, the first segmentation candidate can be represented by these sets of candidates:
(みな, 皆, 美奈, ...) (と, ト, 戸, ...) (あか, 赤, 垢, ...) (さか, 坂, 差か, ...) (に, ２, 似, ...) (いく, 行く, 幾, ...)

[0075] If the conversion candidates of each word are connected in the order in which the readings were input, the result can be regarded as a network whose nodes are the word conversion candidates.
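The network of [0075] can be sketched as one list of candidate sets per segmentation, with every path obtained by taking the Cartesian product of the sets. The helper name `enumerate_paths` is hypothetical. The candidate sets are truncated to the prefix みなときょう for brevity.

```python
from itertools import product
from typing import List

# A segmentation is a list of candidate sets, one set per word.
Segmentation = List[List[str]]


def enumerate_paths(segmentations: List[Segmentation]) -> List[List[str]]:
    """Expand every segmentation into all of its candidate paths."""
    paths: List[List[str]] = []
    for sets in segmentations:
        paths.extend(list(path) for path in product(*sets))
    return paths


segmentations = [
    [["みな", "皆"], ["と", "戸"], ["きょう", "今日"]],  # みな｜と｜きょう
    [["みなと", "港"], ["きょう", "今日"]],              # みなと｜きょう
]
paths = enumerate_paths(segmentations)
print(len(paths))  # 12 (2*2*2 + 2*2)
print(paths[0])    # ['みな', 'と', 'きょう']
```

Exhaustive enumeration grows multiplicatively with sentence length. It is shown here only because it matches the path-by-path description in [0076] to [0078]; larger inputs would normally call for some form of pruning.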

[0076] One path through the network is, for example, みな → と → あか → さか → に → いく. For such a path, the language processing device first selects three kanji conversion candidates: the first candidate on the path, みな, and the two that follow it, と and あか. It then acquires probability P1 for this combination from the probability table.

[0077] Next, the second kanji conversion candidate, と, is combined with the two word conversion candidates that follow it on the path. Probability P2 for this combination is acquired from the probability table. In this way, the device obtains the probabilities P1, P2, ... of each three-word combination (of kanji conversion candidates) along the path. It then multiplies all of P1, P2, ... together and stores the product in internal memory (RAM 103).
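The path score in [0077] is the product of the probabilities of every window of three consecutive candidates. A minimal sketch makes some choices of its own:
- it sums logarithms instead of multiplying, which gives the same value without underflow on long paths;
- it treats paths shorter than three words as a single window;
- it uses a hypothetical default probability for windows missing from the table. [0080] instead recommends the swap and classification fallbacks.

The function name and all table values are invented.

```python
import math
from typing import Dict, Sequence, Tuple


def path_probability(
    path: Sequence[str],
    table: Dict[Tuple[str, ...], float],
    n: int = 3,
    default_p: float = 1e-6,
) -> float:
    """Product of P1, P2, ... over every window of n consecutive candidates."""
    if len(path) < n:  # too short for a full window: score the whole path once
        return table.get(tuple(path), default_p)
    # Summing logarithms instead of multiplying avoids underflow on long paths.
    log_p = sum(
        math.log(table.get(tuple(path[i:i + n]), default_p))
        for i in range(len(path) - n + 1)
    )
    return math.exp(log_p)


table = {
    ("みな", "と", "あか"): 0.1,
    ("と", "あか", "さか"): 0.2,
    ("あか", "さか", "に"): 0.5,
    ("さか", "に", "いく"): 0.4,
    ("さか", "に", "行く"): 0.8,
}
p1 = path_probability(["みな", "と", "あか", "さか", "に", "いく"], table)
p2 = path_probability(["みな", "と", "あか", "さか", "に", "行く"], table)
print(round(p1, 6), round(p2, 6))  # 0.004 0.008
```

With these invented values, the second path of [0078] (ending in 行く) scores higher and would be ranked first, as described in [0079].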

[0078] The language processing device next selects a second path, for example みな → と → あか → さか → に → 行く. It acquires the probability for this path by the same processing as above and stores it internally.

[0079] When all probabilities have been acquired in this way, the language processing device ranks the segmentation candidates in descending order of probability. It displays the highest-ranked candidate so that the user can select words.

[0080] In the processing described above, probability table B may not contain the words being looked up. In that case, the processing of the second and third improvements of Embodiment 1 described above should be applied.

[0081] (Other embodiments) The embodiment above applies the language processing method of the present invention to kana-kanji conversion, but the method can also be applied to speech recognition. In this case, two components change:
- the kana-kanji conversion program 202 of FIG. 2 becomes a speech recognition program;
- the kana-kanji conversion dictionary 203 becomes a phoneme-to-kanji conversion dictionary.

The phoneme-to-kanji conversion dictionary lists identification symbols for the phonemes (for example, vowels and consonants) that make up a word; these symbols are the feature information of the present invention. Each entry pairs such symbols with at least one corresponding character. The dictionary provides several candidate characters for the same feature (the same sequence of phoneme identification symbols).

Speech recognition produces its result in the form of phonemes. The result is therefore converted into characters such as kanji (or sometimes hiragana) by looking up the sequence of recognized phoneme symbols in the phoneme-to-kanji conversion dictionary. The method for ranking the conversion candidates is the same as described above. This makes it possible to rank several kanji words that correspond to the same recognized phoneme sequence.

[0082] The language processing method of the present invention can also be applied to handwritten character recognition. In this case, the kana-kanji conversion program 202 of FIG. 2 becomes a handwritten character recognition program, and the kana-kanji conversion dictionary 203 becomes a pattern-to-character conversion dictionary.

Handwriting recognition usually extracts the lengths and directions of a character's line segments as its features. The pattern-to-character conversion dictionary lists such features together with the corresponding characters, and provides several candidate characters for the same features. For example, the kanji numeral 一 ("one"), the hyphen "−" and the underscore "＿" have nearly identical features, so all three are provided as conversion candidates. The correct recognition result can then be obtained as follows: combine the handwritten characters input so far with each conversion candidate, and compare the probabilities of the resulting character combinations.
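[0081] and [0082] reuse the same ranking step and change only the dictionary that maps feature information to candidate characters. A sketch with a pluggable dictionary makes this explicit. It makes these assumptions:
- the feature encodings (a stroke tuple, a phoneme tuple) and the name `rank_candidates` are hypothetical;
- it uses a two-character combination (bigram), which item 5) of [0083] permits in place of a trigram;
- all probabilities are invented.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def rank_candidates(
    feature: Hashable,
    history: Sequence[str],
    dictionary: Dict[Hashable, List[str]],
    bigram: Dict[Tuple[Optional[str], str], float],
    default_p: float = 1e-6,
) -> List[str]:
    """Rank the characters a feature maps to by P(candidate | previous output)."""
    prev = history[-1] if history else None
    cands = dictionary.get(feature, [])
    # sorted() is stable, so equally scored candidates keep dictionary order.
    return sorted(cands, key=lambda c: bigram.get((prev, c), default_p), reverse=True)


# Handwriting: a single horizontal stroke (hypothetical feature encoding).
pattern_dict = {("stroke", "horizontal"): ["一", "-", "_"]}
hw_ranked = rank_candidates(("stroke", "horizontal"), ["第"], pattern_dict, {("第", "一"): 0.02})
print(hw_ranked)  # ['一', '-', '_']

# Speech: a phoneme sequence (hypothetical feature encoding).
phoneme_dict = {("k", "a", "m", "i"): ["紙", "神", "髪"]}
bigram = {("白い", "紙"): 0.03, ("白い", "髪"): 0.02}
sp_ranked = rank_candidates(("k", "a", "m", "i"), ["白い"], phoneme_dict, bigram)
print(sp_ranked)  # ['紙', '髪', '神']
```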

[0083] 1) The language processing device (or language processing program) may be used on its own or incorporated into an information processing device (or information processing program).
2) Various devices other than those described above can be used as the hardware of the language processing device.
3) The concept of a program in this embodiment covers three kinds of code: programs in machine language that the CPU can execute directly; programs in high-level languages (for example, C or C++) that are converted into machine language; and objects used in HTML (HyperText Markup Language) documents and other scripting languages.
4) In FIG. 2, words are acquired from the classification table after the words in the combined word string are swapped, but these processes may be performed in the reverse order.
5) The embodiment above shows a trigram example in which three words are combined, but combinations of two words, or of more than three words, may also be used.
6) The embodiment above converts feature information into Japanese kanji, but the target characters are not limited to Japanese; the present invention can also be applied to English letters, Chinese characters and other characters.
7) In kana-kanji conversion, the readings of several words may be input at once.

[0084] In this case, as in the conventional approach, the input sequence of readings may be segmented into words. Kanji words for the segmented readings may then be acquired from the kana-kanji conversion dictionary and assembled into three-word combinations. Here, the occurrence probability of the third word can also be regarded as the occurrence probability of the whole three-word combination. Each three-word combination is ranked according to its occurrence probability.

[0085] The embodiments described above are examples that explain the technical idea of the present invention, and the invention is not limited to them. Various modifications other than these embodiments are possible. As long as such modifications are based on the technical idea of the present invention as set out in the claims, they fall within its technical scope.

[0086]

[Effects of the Invention] As described above, the present invention selects the conversion result from the top conversion candidates by computing probabilities over concept-bearing words only. This captures relationships between the concepts of words that are far apart in the input and yields more accurate conversion results.

[0087] According to the present invention, the probability table is first searched with the character strings combined from the conversion candidate. If no result is found, the search is repeated with the order of the combined strings swapped. The strings listed in the probability table therefore need not be stored in every order. This reduces the data size of the probability table and, in turn, shortens language processing time. Relationships between distant words can also be captured for a wider variety of inputs, which improves conversion accuracy.

[0088] Further, according to the present invention, character strings not listed in the probability table can be obtained from the classification table. The probability table therefore needs to list only one representative string for each class, which also reduces its data size. Relationships between distant words can also be captured for a wider variety of inputs, which improves conversion accuracy.

[0089] The present invention applies to various kinds of language processing that convert the features of a character string (including a single character) into a character string. Examples are kana-kanji conversion, speech recognition and handwritten character recognition. When conversion accuracy is low and one character's features match several candidates, the present invention can be used to rank those candidates.

[0090] When three character strings are combined, it is enough to swap the first and second strings and search the probability table with the resulting combination. The strings need to be swapped only once, so language processing time is not affected.

[Brief description of drawings]

FIG. 1 is a block diagram showing the system configuration of the language processing device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the software configuration according to the embodiment of the present invention.

FIG. 3 is a flowchart showing the main procedure of kana-kanji conversion processing.

FIG. 4 is a flowchart showing the language processing according to the present invention.

FIG. 5 is an explanatory diagram showing the contents of the kanji dictionary.

FIG. 6 is an explanatory diagram showing the contents of the probability table.

FIG. 7 is an explanatory diagram showing the contents of the classification table.

[Explanation of symbols]

101 CPU; 103 ROM; 105 input device; 107 HD

Continued from front page
(72) Inventor: Ryoji Sato, c/o Microsoft Chofu Technology Center, Microsoft Co., Ltd., 1-18-1 Chofugaoka, Chofu-shi, Tokyo
(72) Inventor: Kumi Suzuki, 4250 West Lake Sammamish Parkway Northeast, Redmond, Washington 98052, United States
(72) Inventor: Jianfeng Gao, Microsoft Research Asia, 5F Beijing Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing 100080, People's Republic of China
F-terms (reference): 5B009 KB06 MC02 MH01; 5D015 BB01 HH23

Claims

[Claims]

1. A language processing program executed by a language processing apparatus that converts feature information into a character string having the features that the information represents, the feature information representing the features of at least one character string, in which:
- there are a plurality of sets of conversion candidates for character strings similar to the feature information;
- each set of conversion candidates is combined with a past conversion result; and
- the priority order of the conversion candidates is determined from the occurrence probabilities of the resulting combinations;

wherein a probability table listing combinations of a plurality of word strings together with their occurrence probabilities is stored in advance in the language processing apparatus, each word string being a sequence of independent words that semantically affect the connection with other words; and

wherein the language processing program comprises:
- a step of selecting words as the conversion result to be combined in order to obtain the occurrence probability; and
- a step of selecting independent words that semantically affect the connection with other words.

2. The language processing program according to claim 1, wherein the independent words that do not semantically affect the connection with other words are any of adverbs, prefixes, conjunctions, interjections, adverbial nouns, and symbol strings.

3. The language processing program according to claim 1, wherein the independent word is accompanied by auxiliary information indicating a relationship with another independent word.

4. The language processing program according to claim 1, wherein the feature information is a reading.

5. The language processing program according to claim 1, wherein the plurality of combinations are combinations of three character strings.

6. The language processing program according to claim 1, wherein the feature information is a phoneme string obtained in speech recognition.

7. The language processing program according to claim 1, wherein the feature information is obtained by handwritten character recognition.

8. The language processing program according to claim 1, further comprising: a step of, when no search result is obtained in the step of selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, newly creating a character string in which the order of the character strings within the created character string is rearranged; and a step of searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.

9. The language processing program according to claim 1, wherein the probability table describing combinations of a plurality of word strings and their occurrence probabilities, and a classification table describing, for each classification, a plurality of character strings belonging to that classification, are stored in advance in the language processing apparatus, the program further comprising: a step of, when no search result is obtained in the step of selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, acquiring from the classification table another character string belonging to the same classification as the conversion candidate character string, and creating a combination character string that uses the acquired character string in place of the conversion candidate; and a step of searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.

10. A language processing apparatus that converts feature information representing a feature of at least one character string into a character string having that feature, wherein, for the feature information, there are a plurality of sets of conversion candidates, each being a character string similar to the feature information; each of the plurality of sets of conversion candidates is combined with a past conversion result; and the priority order of the plurality of conversion candidates is determined based on the occurrence probabilities of the resulting plurality of combinations; wherein a probability table describing combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the language processing apparatus, the plurality of word strings being sequences of words and sequences of independent words that semantically affect the connection with other words; the language processing apparatus comprising: a unit that selects a word as a conversion result to be combined to obtain the occurrence probability; and a unit that selects an independent word that semantically affects the connection with other words.

11. The language processing apparatus according to claim 10, wherein the independent words that do not semantically affect the connection with other words are any of adverbs, prefixes, conjunctions, interjections, adverbial nouns, and symbol strings.

12. The language processing apparatus according to claim 10, wherein the independent word is accompanied by auxiliary information indicating a relationship with another independent word.

13. The language processing apparatus according to claim 10, wherein the feature information is a reading.

14. The language processing apparatus according to claim 10, wherein the plurality of combinations is a combination of three character strings.

15. The language processing apparatus according to claim 10, wherein the feature information is a phoneme string obtained in speech recognition.

16. The language processing apparatus according to claim 10, wherein the feature information is obtained by handwritten character recognition.

17. The language processing apparatus according to claim 10, further comprising: means for, when no search result is obtained in selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, newly creating a character string in which the order of the character strings within the created character string is rearranged; and means for searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.

18. The language processing apparatus according to claim 10, wherein the probability table describing combinations of a plurality of word strings and their occurrence probabilities, and a classification table describing, for each classification, a plurality of character strings belonging to that classification, are stored in advance in the language processing apparatus, the apparatus further comprising: means for, when no search result is obtained in selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, acquiring from the classification table another character string belonging to the same classification as the conversion candidate character string, and creating a combination character string that uses the acquired character string in place of the conversion candidate; and means for searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.

19. A language processing method executed by a language processing apparatus that converts feature information representing a feature of at least one character string into a character string having that feature, wherein, for the feature information, there are a plurality of sets of conversion candidates, each being a character string similar to the feature information; each of the plurality of sets of conversion candidates is combined with a past conversion result; and the priority order of the plurality of conversion candidates is determined based on the occurrence probabilities of the resulting plurality of combinations; wherein a probability table describing combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the language processing apparatus, the plurality of word strings being sequences of words and sequences of independent words that semantically affect the connection with other words; and wherein the language processing apparatus selects, as conversion results to be combined to obtain the occurrence probability, a word and an independent word that semantically affects the connection with other words.

20. The language processing method according to claim 19, wherein the independent words that do not semantically affect the connection with other words are any of adverbs, prefixes, conjunctions, interjections, adverbial nouns, and symbol strings.

21. The language processing method according to claim 19, wherein the independent word is accompanied by auxiliary information indicating a relationship with another independent word.

22. The language processing method according to claim 19, wherein the feature information is a reading.

23. The language processing method according to claim 19, wherein the plurality of combinations is a combination of three character strings.

24. The language processing method according to claim 19, wherein the feature information is a phoneme string obtained in speech recognition.

25. The language processing method according to claim 19, wherein the feature information is obtained by handwritten character recognition.

26. The language processing method according to claim 19, wherein, when no search result is obtained upon selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, a character string in which the order of the character strings within the created character string is rearranged is newly created, the probability table is searched with the newly created character string, and the occurrence probability corresponding to the conversion candidate is acquired.

27. The language processing method according to claim 19, wherein the probability table describing combinations of a plurality of word strings and their occurrence probabilities, and a classification table describing, for each classification, a plurality of character strings belonging to that classification, are stored in advance in the language processing apparatus; and wherein, when no search result is obtained upon selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, another character string belonging to the same classification as the conversion candidate character string is acquired from the classification table, a combination character string is created using the acquired character string in place of the conversion candidate, the probability table is searched with the newly created character string, and the occurrence probability corresponding to the conversion candidate is acquired.

28. A storage medium storing a program executed by a language processing apparatus that converts feature information representing a feature of at least one character string into a character string having that feature, wherein, for the feature information, there are a plurality of sets of conversion candidates, each being a character string similar to the feature information; each of the plurality of sets of conversion candidates is combined with a past conversion result; and the priority order of the plurality of conversion candidates is determined based on the occurrence probabilities of the resulting plurality of combinations; wherein a probability table describing combinations of a plurality of word strings and their occurrence probabilities is stored in advance in the language processing apparatus, the plurality of word strings being sequences of words and sequences of independent words that semantically affect the connection with other words; and wherein the program comprises a step of selecting, as conversion results to be combined to obtain the occurrence probability, a word and an independent word that semantically affects the connection with other words.

29. The storage medium according to claim 28, wherein the independent words that do not semantically affect the connection with other words are any of adverbs, prefixes, conjunctions, interjections, adverbial nouns, and symbol strings.

30. The storage medium according to claim 28, wherein the independent word is accompanied by auxiliary information indicating a relationship with another independent word.

31. The storage medium according to claim 28, wherein the feature information is a reading.

32. The storage medium according to claim 28, wherein the plurality of combinations is a combination of three character strings.

33. The storage medium according to claim 28, wherein the feature information is a phoneme string obtained in speech recognition.

34. The storage medium according to claim 28, wherein the feature information is obtained by handwritten character recognition.

35. The storage medium according to claim 28, wherein the program further comprises: a step of, when no search result is obtained in the step of selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, newly creating a character string in which the order of the character strings within the created character string is rearranged; and a step of searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.

36. The storage medium according to claim 31, wherein the probability table describing combinations of a plurality of word strings and their occurrence probabilities, and a classification table describing, for each classification, a plurality of character strings belonging to that classification, are stored in advance in the language processing apparatus, and wherein the program further comprises: a step of, when no search result is obtained in the step of selecting, as a conversion result to be combined to obtain the occurrence probability, an independent word that semantically affects the connection with other words, acquiring from the classification table another character string belonging to the same classification as the conversion candidate character string, and creating a combination character string that uses the acquired character string in place of the conversion candidate; and a step of searching the probability table with the newly created character string and acquiring the occurrence probability corresponding to the conversion candidate.
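The claims describe ranking conversion candidates by combining each candidate with past conversion results, looking the resulting three-string combination up in a pre-stored probability table (claims 1 and 5), skipping independent words such as adverbs that do not affect word connection (claim 2), and, when the lookup fails, retrying first with a rearranged combination (claim 8) and then with a same-class substitute from a classification table (claim 9). The Python below is a minimal sketch of that flow under stated assumptions, not the patent's implementation: the `(surface, POS)` word representation, the POS tag names, the dictionary-based tables, the order in which permutations and class members are tried, the `FLOOR` back-off value, and the simplification of the context to independent words only are all hypothetical. The probabilities in the usage data are invented toy values.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Assumed representations (hypothetical, not from the patent):
Word = Tuple[str, str]                    # (surface form, part-of-speech tag)
ProbTable = Dict[Tuple[str, ...], float]  # word-string combination -> occurrence probability
ClassTable = Dict[str, List[str]]         # classification -> member strings

# Claim 2 lists the independent words that do NOT semantically affect the
# connection with other words; these tag names are hypothetical.
NON_INFLUENTIAL_POS = {"adverb", "prefix", "conjunction",
                       "interjection", "adverbial_noun", "symbol"}

FLOOR = 1e-9  # hypothetical back-off value used when every lookup fails


def context_words(history: Sequence[Word], n: int = 2) -> List[str]:
    """Return the last n past conversion results, skipping non-influential words."""
    kept = [surface for surface, pos in history if pos not in NON_INFLUENTIAL_POS]
    return kept[-n:]


def lookup(table: ProbTable, key: Tuple[str, ...], classes: ClassTable) -> float:
    """Occurrence probability of a combination, with the fallbacks of claims 8 and 9."""
    if key in table:                       # direct hit
        return table[key]
    for perm in permutations(key):         # claim 8: rearranged order
        if perm != key and perm in table:
            return table[perm]
    candidate = key[-1]                    # claim 9: same-class substitute
    for members in classes.values():
        if candidate in members:
            for other in members:
                alt = key[:-1] + (other,)
                if other != candidate and alt in table:
                    return table[alt]
    return FLOOR


def rank_candidates(history: Sequence[Word], candidates: Sequence[str],
                    table: ProbTable, classes: ClassTable) -> List[Tuple[str, float]]:
    """Order conversion candidates by the probability of (context + candidate)."""
    ctx = tuple(context_words(history))
    scored = [(c, lookup(table, ctx + (c,), classes)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: kanji candidates for the reading "kisha"; probabilities are invented.
history = [("東京", "noun"), ("まず", "adverb"), ("駅", "noun")]
table = {("東京", "駅", "汽車"): 0.02,
         ("駅", "東京", "記者"): 0.005,
         ("東京", "駅", "出社"): 0.001}
classes = {"commute": ["帰社", "出社"]}

print(rank_candidates(history, ["記者", "汽車", "帰社"], table, classes))
# → [('汽車', 0.02), ('記者', 0.005), ('帰社', 0.001)]
```

In the toy run, the adverb まず is dropped from the context, 汽車 is found directly, 記者 only under a rearranged order, and 帰社 only through its hypothetical classmate 出社. Returning the first hit in each fallback, rather than the maximum over all permutations or class members, is an arbitrary choice made here for brevity.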