JPS62121570A

JPS62121570A - Continuous clause conversion processing method based on connection probability

Info

Publication number: JPS62121570A
Application number: JP60262783A
Authority: JP
Inventors: Naoko Isohara; 磯原　直子; Mitsuru Shirakawa; 満白川; Fumiko Kobayashi; 文子小林; Keiko Ishii; 啓子石井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-11-22
Filing date: 1985-11-22
Publication date: 1987-06-02

Abstract

PURPOSE: To improve operability by providing a means which extracts the connection probability of each converted word from a probability information storing means and performs a probability calculation, and a means which determines the priority order of conversions based on the calculated result. CONSTITUTION: Connection probability information representing the ease of connection between groups is stored in advance in a connection probability information storing part 16, in the form of a matrix table accessed by the group type information of the preceding word and that of the succeeding word. A probability calculation part 17 obtains the connection probability information of adjacent words from the group types of the words returned by the dictionary search, performs a probability calculation such as an overall probability in which the individual probabilities are multiplied together, or a predicted probability, and preferentially selects and displays the candidate with the highest probability. The ease of connection between words can thus be evaluated probabilistically and an appropriate expression selected, so that the correct expression is obtained in a short time and operability is improved.

Description

[Detailed Description of the Invention] [Summary] The continuous clause conversion processing method based on connection probability of the present invention is used in a Japanese document creation device that performs continuous clause conversion. Words serving as units of conversion are grouped in advance so that words with the same grammatical and semantic properties fall into the same group, and probability information indicating the ease of connection between groups is stored in matrix form for each combination of preceding and succeeding groups. For the dictionary search results, the connection probability information of adjacent words is obtained, so that candidates with high probability are selected preferentially as conversion targets and the correct expression is obtained quickly.

[Industrial application field]

The present invention relates to a continuous clause conversion processing method based on connection probability for a Japanese document creation device, that is, a so-called Japanese word processor that edits and creates documents by accepting Japanese readings containing multiple words, searching a dictionary, and converting the readings into text containing kanji. The method allows the correct expression to be obtained quickly in continuous clause conversion.

[Conventional technology]

In a Japanese document creation device, a reading is entered by kana or romaji input, and kana-kanji conversion is performed with reference to a dictionary to produce text containing kanji. Recently, devices have come into use that can convert not only single words and single clauses but also compound words and continuous clauses made up of several clauses. In this description, compound word conversion is treated as part of continuous clause conversion.

A conventional Japanese document creation device that performs continuous clause conversion uses the so-called longest-match method: it searches the dictionary with the division of the reading that yields the longest conversion unit, displays the resulting text containing kanji, and then, each time a conversion key or the like is pressed, displays in turn the conversion result with the next-longest conversion unit.

[Problem that the invention seeks to solve]

With the conventional method, converting a long reading in one go produces many possible expressions, depending on where the reading is divided and how homophones are combined, and the appropriate expression is not necessarily displayed first. Selecting the correct expression therefore required cumbersome operations.

The present invention aims to solve this problem and to improve operability by ensuring that, even when a long reading is converted in one go, appropriate expressions are selected with priority and the desired result is obtained quickly.

[Means for solving the problem]

FIG. 1 is a block diagram showing the basic configuration of the present invention.

In FIG. 1, 10 is a keyboard with keys for entering readings, issuing conversion instructions, and so on; 11 is an input control unit that controls input from the keyboard 10; 12 is a reading division unit that divides the reading of an input continuous clause; 13 is a dictionary search unit that searches the dictionary according to the divisions produced by the reading division unit 12; and 14 is a dictionary in which readings and their corresponding kanji code strings are registered.

15 is a group type assigning unit that gives each word to be converted the predetermined group type information of the group to which the word belongs; 16 is a connection probability information storage unit that stores probability information on how readily the group types of adjacent words connect in the conversion of a reading.

17 is a probability calculation unit that, according to the group type information of each division of the reading in the conversion results of the dictionary search unit 13, refers to the connection probability information storage unit 16 and calculates a probability expressing the appropriateness of the conversion; 18 is a conversion selection unit that preferentially selects conversions with high probability based on the results of the probability calculation unit 17; 19 is a display control unit that controls display of the conversion result selected by the conversion selection unit 18; and 20 is a display on which the reading and the conversion results are shown.

In the present invention, the words serving as units of conversion (words or clauses) are grouped in advance so that those with the same grammatical and semantic properties belong to the same group.

If necessary, frequency of occurrence may also be used as a criterion for grouping. The group type information of each word is registered in the dictionary 14 in advance, for each word corresponding to a reading, when the dictionary is created. For items that are not registered in the dictionary 14 as they stand but are processed as clauses, the group type assigning unit 15 assigns the group type dynamically, based on the attached particles and the like.

Connection probability information indicating the ease of connection between groups is stored in advance in the connection probability information storage unit 16, in the form of a matrix table accessed by the group type information of the preceding word and the group type information of the succeeding word.

The probability calculation unit 17 obtains the connection probability information of each pair of adjacent words from the group types of the words in the dictionary search results, and performs a probability calculation such as an overall probability, in which the individual probabilities are multiplied together, or a predicted probability. The probability calculation unit 17 preferentially selects and displays the candidates with high probability.
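The table lookup and multiplication described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the group IDs, the table values, and the function name `overall_probability` are all assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical connection probability table, keyed by
# (preceding group ID, succeeding group ID). The values are made up.
CONNECTION_TABLE = {
    ("G1", "G3"): Fraction(1, 2),
    ("G1", "G4"): Fraction(1, 4),
    ("G5", "G6"): Fraction(3, 4),
    ("G6", "G3"): Fraction(1, 2),
}


def overall_probability(group_ids, table=CONNECTION_TABLE):
    """Multiply the connection probabilities of every adjacent pair of groups."""
    p = Fraction(1)
    for prev, nxt in zip(group_ids, group_ids[1:]):
        p *= table[(prev, nxt)]
    return p


print(overall_probability(["G1", "G3"]))        # two words: a single lookup
print(overall_probability(["G5", "G6", "G3"]))  # three words: product of two lookups
```

Exact fractions are used only to keep the arithmetic easy to follow; a real table could equally hold floating-point values.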

[Effect]

According to the present invention, the ease of connection between words is evaluated probabilistically for each conversion result obtained from a division of the reading, so appropriate expressions are given priority and the correct expression is obtained early. When probabilities are equal, the order among the tied candidates may be determined by, for example, the longest-match method or the minimum clause-count method, which prefers the smallest total number of clauses.

[Embodiment] FIG. 2 is a diagram for explaining the assignment of group types, FIG. 3 is a diagram for explaining the contents of the connection probability information storage unit, FIG. 4 is a diagram explaining the probability calculation according to one embodiment of the present invention, and FIG. 5 is a diagram for explaining conversion according to another embodiment of the present invention.

The grouping of words and clauses according to the present invention takes the part of speech into account (noun, verb, adjective/adjectival verb, adverb, and so on). It also considers whether a word acts as a prefix or a suffix; for nouns, semantic elements such as personal name, place name, or numeral; and whether a word readily or rarely forms compounds.

Verbs, adjectives, and adjectival verbs are classified by conjugated form, such as continuative, attributive, and terminal forms. Clauses containing nouns are likewise subdivided by criteria such as nominative, objective, continuative, attributive, and terminal. Each resulting group is given a group ID indicating its group type.

This group ID information is stored in the dictionary 14 in advance for each word corresponding to each reading, as shown for example in FIG. 2(a). For clauses, as shown in FIG. 2(b), the group type assigning unit 15 assigns the group ID dynamically according to predetermined grammatical criteria.

As shown in FIG. 3, the connection probability information storage unit 16 holds probabilities P11, P12, ..., P55, ... indicating the ease of connection, predetermined grammatically or statistically, for each combination of the preceding group type (ID = G1, G2, G3, ...) and the succeeding group type (ID = G1, G2, G3, ...) of two adjacent words.

Next, the connection probability calculation that determines priority when converting the reading (R0, R1, ..., R5) shown in FIG. 4 is described. The group IDs of the relevant words are as shown in FIG. 2(a) and (b), and the connection probabilities between groups are assumed to be defined as shown in FIG. 3.

The dictionary 14 is searched in the same way as in conventional longest-match processing, cutting the reading (R0, R1, ..., R5) back from its end step by step. When all convertible expressions have been found, expressions (a) to (f) shown in FIG. 4 are obtained.

In the case of expression (a) in FIG. 4, the reading is divided into R0-R3 and R4+R5; the group ID of the clause W0 in the first half is G1 and the group ID of the word W1 in the second half is G3. Looking up the preceding group G1 and the succeeding group G3 in the connection probability information storage unit 16 shown in FIG. 3 gives an overall connection probability P of P13. In the case of expression (b) in FIG. 4, the group ID of the succeeding word W2 is G4, so the connection probability P is P14.

When the reading is divided into three or more parts, the product of all the connection probabilities is taken as the overall connection probability. For example, in the case of expression (d) in FIG. 4, the overall connection probability is P = P56 × P63, the product of the connection probability P56 between G5 and G6 and the connection probability P63 between G6 and G3.

If candidates are offered in descending order of the probability P calculated in this way, expressions are selected starting from the most plausible one.
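The ordering of candidates by P can be sketched as below. The labels follow expressions (a), (b), and (d) of FIG. 4, but the numeric probabilities and clause counts are assumptions made up for illustration, and breaking ties by the smaller number of clauses is only one of the tie-breaking rules the description mentions.

```python
from fractions import Fraction

# Hypothetical candidates: (label, overall connection probability P, number of clauses).
candidates = [
    ("a", Fraction(1, 2), 2),
    ("b", Fraction(1, 4), 2),
    ("d", Fraction(1, 2), 3),
]


def rank(cands):
    """Order by descending P; on a tie, prefer the candidate with fewer clauses."""
    return sorted(cands, key=lambda c: (-c[1], c[2]))


ranked = rank(candidates)
print([label for label, _, _ in ranked])
```

With these made-up values, (a) and (d) tie on P, so the candidate with fewer clauses is offered first.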

As the reading gets longer, however, the number of ways to divide it also grows, so finding every possible expression and only then calculating the overall probabilities may take too long. Therefore, as described next with reference to FIG. 5, the final probability is predicted at each node of the search for an expression: a predicted connection probability is computed dynamically, and the search proceeds toward the candidate with the highest predicted probability, so that an appropriate expression can be selected quickly.

In the embodiment shown in FIG. 5, the reading 「キョウハヨイテンキデス」 is converted.

The predicted probability is defined, for example, as follows.

(predicted probability) = Pa × Pb × Pc

Pa: connection probability up to the word currently under consideration.

Pb: average probability when that word is taken as the preceding word.

Pc: probability for the length of the remaining reading.

The last term in this formula, the probability Pc for the length of the remaining reading, is a value predetermined according to length such that the longer the remaining reading, the smaller the probability. Pa is 1 for the first word (word or clause). For the second word, it is the connection probability between the first and second words. For the n-th word, Pa is the product of all the connection probabilities up to the n-th word.

The average probability Pb is stored in the connection probability information storage unit 16 for each preceding group type, as P1, P2, ..., P5, ... in FIG. 3, and has a value that reflects the average of the connection probabilities in that row.
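The three factors can be combined as in the sketch below. The description only says that Pb "reflects" the row average, so computing it as a plain arithmetic mean is an assumption, as are the table values and the function names `row_average` and `predicted_probability`.

```python
from fractions import Fraction


def row_average(table, prev_group):
    """Plain mean of the connection probabilities in one row (one preceding group)."""
    row = [p for (prev, _), p in table.items() if prev == prev_group]
    return sum(row, Fraction(0)) / len(row)


def predicted_probability(pa, pb, pc):
    """(predicted probability) = Pa x Pb x Pc."""
    return pa * pb * pc


# Hypothetical 2x2 table; the values are illustrative only.
table = {
    ("G1", "G1"): Fraction(1, 4), ("G1", "G2"): Fraction(3, 4),
    ("G2", "G1"): Fraction(1, 2), ("G2", "G2"): Fraction(1, 4),
}

pb = row_average(table, "G1")
# First word of a reading: Pa is 1; Pc = 3/4 is an assumed length penalty.
print(pb, predicted_probability(Fraction(1), pb, Fraction(3, 4)))
```

In the description Pb is stored alongside the table rather than recomputed on each lookup; the helper above only shows one way such a stored value could be derived.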

In the first search for the reading 「キョウハヨイテンキデス」, the following are retrieved: a first group 「教派」; a second group 「今日は、京は、経は、凶は、姿は」; a third group 「今日、京、経、凶」; a fourth group 「強」; and so on.

The group type of 「教派」 is （名車）, a group of common nouns with a low tendency to form compounds. The group type ［名主］ of the second group indicates a noun clause in the nominative case. The group type （名車） of the third group denotes a group of common nouns used on their own. The group type of 「強」 is （＠小）, a group of prefixes that are generally used infrequently.

Suppose the average probability Pb of the （名車） group is 1/4 and the probability Pc for the length of the remaining reading 「ヨイテンキデス」 is 3/4. Since Pa is 1, the predicted connection probability of 「教派」 is

Pa × Pb × Pc = 1 × (1/4) × (3/4) = 3/16.

If the average probability of the second group ［名主］ is 3/4, the predicted connection probability of the second group is

Pa × Pb × Pc = 1 × (3/4) × (3/4) = 9/16.

By the same calculation, the predicted probability of the third group is 3/16 and that of the fourth group is 3/8.
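The arithmetic for the first two groups can be checked with exact fractions. The values of Pa, Pb, and Pc are the ones given in the text; only the variable names are introduced here.

```python
from fractions import Fraction

pa = Fraction(1)       # first word of the reading, so Pa is 1
pc = Fraction(3, 4)    # probability for the remaining reading length

first_group = pa * Fraction(1, 4) * pc    # Pb = 1/4
second_group = pa * Fraction(3, 4) * pc   # Pb = 3/4

print(first_group, second_group)  # 3/16 9/16
```

The second group has the larger predicted probability, which is why the search continues from it.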

If higher accuracy is wanted in the probability calculation, the probability Pc for the remaining reading length in the third and fourth groups should be set smaller than the value used for the first and second groups.

Of the predicted probabilities so far, that of the second group is the highest, so it is selected and the second search is then carried out. The group 「良い、善い、好い」 retrieved here has the group type ［形体］, denoting the attributive form of an adjective or adjectival verb, and the group 「宵、酔、金蔵」 has the type （名車）. Although not shown in the figure, other groups are also retrieved, such as 「世、夜、余、代」 （名車） and 「四」 (numeral).

To obtain the predicted probability of the 「良い...」 group in the second search, the connection probability information storage unit 16 is first consulted to obtain the connection probability Pa (= 2/4) between the group type ［名主］ of the preceding word and the group type ［形体］ of the word that follows it. The average probability Pb (= 3/4) of the ［形体］ group is also obtained. If the probability Pc for the remaining length is 1, the predicted probability of this group is

Pa × Pb × Pc = (2/4) × (3/4) × 1 = 3/8.

For the 「宵...」 group, the average probability Pb is 1/4 and the predicted probability works out to 1/8. The 「良い...」 group is therefore selected with priority here.

Similarly, in the third search, the connection probability of the group 「天気です、転記です」, the probability of the 「転機」 group, and so on are obtained. For 「天気です、転記です」 the reading has been consumed to its end, so the value is an actual connection probability rather than a predicted one: the product of all the connection probabilities accumulated in Pa, namely (2/4) × (3/4) = 3/8. Equivalently, the average probability Pb and the remaining-length probability Pc may both be regarded as 1. For 「転機」, the predicted probability works out to 1/8.

Accordingly, expressions formed by connecting the group 「今日は、京は、...」, the group 「良い、善い、好い」, and the group 「天気です、転記です」 are selected preferentially, and within these, candidates are chosen in order starting from the first, as ordered by learning, for example. In this way, appropriate expressions are selected with priority. In particular, because this embodiment works with predicted values of the connection probability, plausible expressions can be found quickly.
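The search as a whole can be sketched as a greedy walk that always follows the candidate with the highest predicted probability. This is a simplified illustration, not the patent's implementation: it never backtracks, each homophone group is represented by a single word, and the lattice layout and the names `greedy_convert`, `FIRST`, and `NEXT` are assumptions. The probability values are the ones worked out in the walk-through.

```python
from fractions import Fraction as F


def greedy_convert(first_candidates, next_candidates):
    """Follow the candidate with the highest predicted probability at each step.

    first_candidates: list of (word, predicted probability) at the start of the reading.
    next_candidates:  function returning the candidate list that follows a chosen word;
                      an empty list means the reading has been fully consumed.
    """
    path, candidates = [], first_candidates
    while candidates:
        word, _ = max(candidates, key=lambda c: c[1])
        path.append(word)
        candidates = next_candidates(word)
    return path


# One representative word per group, with the predicted probabilities from the text.
FIRST = [("教派", F(3, 16)), ("今日は", F(9, 16)), ("今日", F(3, 16)), ("強", F(3, 8))]
NEXT = {
    "今日は": [("良い", F(3, 8)), ("宵", F(1, 8))],
    "良い": [("天気です", F(3, 8)), ("転機", F(1, 8))],
}

print(greedy_convert(FIRST, lambda w: NEXT.get(w, [])))
```

A fuller version would keep the unexplored candidates so that the next-best expression can be offered when the user rejects the first one.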

[Effect of the invention]

As explained above, according to the present invention, in continuous clause conversion including compound words, a plausible expression is selected by evaluating the relationship between preceding and succeeding words or clauses, so the correct expression is obtained early and operability is improved.

[Brief explanation of drawings]

FIG. 1 is a basic configuration diagram of the present invention, FIG. 2 is a diagram for explaining the assignment of group types, FIG. 3 is a diagram for explaining the contents of the connection probability information storage unit, FIG. 4 is a diagram explaining the probability calculation according to one embodiment of the present invention, and FIG. 5 is a diagram for explaining conversion according to another embodiment of the present invention. In the figures, 10 is a keyboard, 11 is an input control unit, 12 is a reading division unit, 13 is a dictionary search unit, 14 is a dictionary, 15 is a group type assigning unit, 16 is a connection probability information storage unit, 17 is a probability calculation unit, 18 is a conversion selection unit, 19 is a display control unit, and 20 is a display.

Patent applicant: Fujitsu Ltd. Agent: patent attorney 森１）寛 (and one other)

Claims

[Claims] A continuous clause conversion processing method based on connection probability, for a Japanese document creation device that edits and creates documents by accepting Japanese readings containing multiple words, searching a dictionary, and converting the readings into text containing kanji, the method being characterized in that the words serving as units of conversion are grouped in advance so that words with the same grammatical and semantic properties belong to the same group, and in comprising: means (14, 15) for giving each word group type information identifying the group to which the word belongs; probability information storage means (16) for storing probability information indicating the ease of connection between words of the respective groups, in correspondence with each combination of preceding and succeeding groups; means (17) for obtaining the group type information of the group to which each word to be converted belongs, extracting the connection probability of each word to be converted from the probability information storage means (16) based on that group type information, and performing a probability calculation; and means (18) for determining the priority order of conversion based on the result calculated by the probability calculation means (17).