JPH0415503B2

Info

Publication number: JPH0415503B2
Application number: JP57199271A
Authority: JP
Inventors: Tooru Kanamori; Makoto Sueda; Tadayasu Sugita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-11-12
Filing date: 1982-11-12
Publication date: 1992-03-18
Also published as: JPS5990167A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a word identification device that, for purposes such as automatic translation and text-to-speech conversion, segments a sentence expressed in characters into the individual words that constitute it.

[Background of the invention]

In automatic translation and in text-to-speech conversion, sentence analysis is essential. Particularly in a language such as Japanese, where word boundaries are not marked and many words share the same characters but differ in meaning or in reading, determining word boundaries and identifying words is both important and difficult. For example, the phrase 「畜産物価格安定法」 (Livestock Products Price Stabilization Act) admits several word structures, such as (a) 畜産物・価格・安定法, (b) 畜産・物価・格安・定法, and (c) 畜産・物価格・安定法, and the device must be able to judge that (a) is correct.
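The ambiguity described above can be reproduced with a few lines of code. The following is a minimal sketch, assuming a toy dictionary (`TOY_DICTIONARY`, a hypothetical name) that contains only the words appearing in readings (a) and (b); it enumerates every way of splitting the phrase into dictionary words.

```python
def segmentations(text, dictionary):
    """Return every way to split `text` into words found in `dictionary`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], dictionary):
                results.append([head] + rest)
    return results

# Hypothetical toy dictionary: only the words of readings (a) and (b).
TOY_DICTIONARY = {"畜産物", "価格", "安定法", "畜産", "物価", "格安", "定法"}

candidates = segmentations("畜産物価格安定法", TOY_DICTIONARY)
for words in candidates:
    print("・".join(words))
```

Both readings come out as valid segmentations, so a dictionary lookup alone cannot decide between them; some form of scoring is needed.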

[Prior art to the invention]

To identify the words in a sentence, two methods have conventionally been considered: (1) searching and judging sequentially using longest match and grammatical connection relationships; and (2) extracting every possible combination of candidate character string units, evaluating each combination with an evaluation function, and selecting the best one. However, method (1) may fail to find the optimal solution, and its processing is complicated because backtracking is required.

Method (2), for its part, produces an enormous number of combinations and cannot be applied to long character strings.

[Purpose of the invention]

An object of the present invention is to analyze text, such as Japanese, in which word boundaries are not explicit, to determine the boundaries of the character string units (words and the like) that make up the text, and to identify each unit, performing an accurate analysis simply and with little processing.

[Structure of the invention]

To achieve the above object, the present invention provides a text analysis device that analyzes text consisting of a plurality of character string units and identifies the character string units constituting the text, characterized by comprising:

a character string unit dictionary 1 that holds, for each character string unit, a character code used for matching that unit;

a dictionary matching unit 2 that matches the input text against the character codes of the character string unit dictionary and extracts all candidate character string units that can be constituent units of the input text; and

a DP unit 3 that, for every extracted candidate, obtains at each boundary a first evaluation score, which is an evaluation independent of the surrounding context, and a second evaluation score, which is the total evaluation score up to the candidate plus an evaluation based on the grammatical connection relationship between the candidate and another candidate, and that uses the first and second evaluation scores to determine, boundary by boundary according to dynamic programming, the identification position between the candidate and the other candidates, thereby identifying the character string units.

The invention is described in detail below with reference to the drawings.

Figure 1 compares, for a concrete sentence, the conventional exhaustive-combination method described above with the dynamic programming method of the present invention (hereinafter, the DP method). The text analysis device stores in advance every conceivable character string unit (including, besides words proper, idiomatic word sequences and character strings). In the illustrated example, the character 「島」 has one character string for each of its readings, "shima" and "tou".

Likewise, five character strings are provided for the reading "kara", for example the nouns 「殻」 and 「唐」 and the various particles 「〜から」.

Further, six character strings (a single character is also called a character string here) are provided for the single character 「か」, for example 「〜か？」 expressing a question, 「〜か〜か」 expressing a choice, and 「か」 expressing a rhetorical question. The same applies to the other characters.

Part A of the figure shows the number of combinations that would have to be evaluated if, as in the conventional exhaustive-combination method, every combination were evaluated. There are more than 100,000 combinations, which is clearly impractical.

Part B of the figure shows the DP method of the present invention. Even including the character strings that mark the beginning and the end (comma) of the sentence, only 288 processing steps are needed.
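The gap between the two counts can be illustrated on a simplified lattice. The sketch below assumes, unlike Figure 1, that every candidate in one slot can be followed by every candidate in the next slot; the list `counts` is hypothetical and is not the data of Figure 1. Exhaustive evaluation grows as a product of the candidate counts, while the DP method only evaluates pairs of adjacent candidates.

```python
from math import prod

def exhaustive_paths(counts):
    """Number of complete combinations if each one is evaluated separately."""
    return prod(counts)

def dp_pair_evaluations(counts):
    """Number of (previous, next) pairs evaluated when going boundary by boundary."""
    return sum(a * b for a, b in zip(counts, counts[1:]))

# Hypothetical candidate counts per slot, with single start and end markers.
counts = [1, 2, 5, 6, 4, 10, 1]
print(exhaustive_paths(counts))     # 2400
print(dp_pair_evaluations(counts))  # 116
```

Adding one more slot multiplies the first figure but only adds one term to the second, which is the reason the DP method stays practical for long sentences.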

[Embodiments of the invention]

Figure 2 illustrates the concept of the DP method of the present invention. It focuses on one word boundary and shows the case where two character strings, a and b (イ and ロ in the figure), end at that boundary and three character strings, α, β, and γ, start from it.

When a character string X is selected, let g(X) be the total evaluation score up to X, let v(X) be the evaluation of X that does not depend on its surroundings, and let C(X, Y) be the evaluation based on the connection relationship between X and another character string Y.

When the evaluation proceeds from the left side to the right side of the boundary shown in Figure 2, the following processing is performed, where a and b denote the two character strings (イ and ロ) that end at the boundary:

$$
g(\alpha) = v(\alpha) + \max\{\, g(a) + C(a, \alpha),\; g(b) + C(b, \alpha) \,\}
$$

$$
g(\beta) = v(\beta) + \max\{\, g(a) + C(a, \beta),\; g(b) + C(b, \beta) \,\}
$$

$$
g(\gamma) = v(\gamma) + \max\{\, g(a) + C(a, \gamma),\; g(b) + C(b, \gamma) \,\}
$$

Here max{ } means taking the largest of the values inside the braces.
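The update at a single boundary can be written out directly. The following is a minimal sketch with made-up scores: `g_prev` holds the totals of the strings ending at the boundary, `v` the context-independent evaluations of the strings starting there, and `c` the connection evaluations. The name `v` for the context-independent evaluation, like all the numbers, is an assumption for illustration.

```python
def advance_boundary(g_prev, v, c):
    """Apply g(Y) = v(Y) + max over X of (g(X) + C(X, Y)) at one boundary.

    Returns, for each starting string Y, its total score and the
    preceding string X that produced the maximum.
    """
    result = {}
    for y, v_y in v.items():
        best_x = max(g_prev, key=lambda x: g_prev[x] + c[(x, y)])
        result[y] = (v_y + g_prev[best_x] + c[(best_x, y)], best_x)
    return result

# Hypothetical scores for the situation of Figure 2.
g_prev = {"a": 5, "b": 7}
v = {"alpha": 2, "beta": 1, "gamma": 3}
c = {
    ("a", "alpha"): 4, ("b", "alpha"): 1,
    ("a", "beta"): 0,  ("b", "beta"): 2,
    ("a", "gamma"): 3, ("b", "gamma"): 0,
}

scores = advance_boundary(g_prev, v, c)
print(scores)
```

With these numbers α and γ are best preceded by a, and β by b, so different successors may keep different predecessors at the same boundary.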

In this way, proceeding from the left (the beginning of the sentence), the evaluation of each character string at its position is obtained from its own evaluation and from the evaluation of its connection with the preceding character string, and this is repeated at every boundary.

At a position that is a boundary for only some of the character strings, as indicated by arrow C in Figure 1, the same processing is performed for those character strings.

Depending on how the evaluation scores are defined, MIN{ } may be used instead of MAX{ }.

In addition to the actual character strings, character strings marking the beginning and the end of the sentence are also taken into account (the end marker is unnecessary if a comma is present).

As the evaluations are obtained one after another in this way, evaluating the final character string (the comma) reveals which of the candidate character strings immediately preceding it (10 candidates in the example of Figure 1) yields the maximum value. Tracing back, in order, the candidates that gave each maximum then yields the optimal combination of character string units.
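The forward pass and the trace-back can be put together as follows. This is a minimal sketch, not the patented circuit: the scores in `v`, the constant connection score in `conn`, and the marker names `BOS` and `EOS` are all assumptions chosen so that the example is easy to follow.

```python
from collections import defaultdict

BOS, EOS = "<BOS>", "<EOS>"  # hypothetical sentence start / end markers

def best_segmentation(text, v, conn):
    """v: word -> context-independent score; conn(prev, word) -> connection score."""
    n = len(text)
    # best[end][word] = (total score, (boundary where word starts, previous word))
    best = defaultdict(dict)
    best[0][BOS] = (0, None)
    for start in range(n):
        enders = best[start]
        if not enders:
            continue  # nothing ends here, so nothing can start here
        for end in range(start + 1, n + 1):
            word = text[start:end]
            if word not in v:
                continue
            g_prev, prev = max(
                ((g + conn(p, word), p) for p, (g, _) in enders.items()),
                key=lambda pair: pair[0],
            )
            best[end][word] = (v[word] + g_prev, (start, prev))
    if not best[n]:
        return None  # no sequence of dictionary words covers the text
    # Evaluate the end marker, then follow the stored predecessors backwards.
    _, word = max(
        ((g + conn(w, EOS), w) for w, (g, _) in best[n].items()),
        key=lambda pair: pair[0],
    )
    words, end = [], n
    while word != BOS:
        words.append(word)
        end, word = best[end][word][1]
    return words[::-1]

# Hypothetical scores: word length as v, and a cost of 1 for every connection.
v = {"畜産物": 3, "価格": 2, "安定法": 3, "畜産": 2, "物価": 2, "格安": 2, "定法": 2}
conn = lambda prev, word: -1

print(best_segmentation("畜産物価格安定法", v, conn))
```

Only one predecessor is stored per candidate, so the trace-back is a simple chain walk and no backtracking over alternatives is needed.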

Next, a specific embodiment that realizes the DP method of the present invention is described with reference to Figures 3 and 4.

Figure 3 is a schematic block diagram of an embodiment of the present invention, in which 1 is the character string unit dictionary, 2 is the dictionary matching unit, and 3 is the DP unit.

For each character string unit, the character string unit dictionary 1 holds in advance, besides the notation of the unit (the character code used for matching), the connection relationship information used by the DP unit 3 (numbers identifying the right-side and left-side connection relationships), an evaluation score determined independently of the surrounding character strings, a character string unit number, and so on.
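One possible in-memory layout for such a dictionary is sketched below. The class name `DictionaryEntry`, its field names, and all numeric values are hypothetical; the point is only that a single notation (here 「島」) may map to several entries that differ in connection information or unit number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryEntry:
    surface: str      # notation used for matching
    left_class: int   # number identifying the left-side connection relationship
    right_class: int  # number identifying the right-side connection relationship
    score: int        # evaluation score independent of the surrounding strings
    number: int       # character string unit number

# Hypothetical entries: one per reading of the same character.
ENTRIES = [
    DictionaryEntry("島", 1, 1, 2, 101),  # reading "shima"
    DictionaryEntry("島", 1, 2, 2, 102),  # reading "tou"
]

def build_index(entries):
    """Group entries by notation so that matching returns every candidate."""
    index = {}
    for entry in entries:
        index.setdefault(entry.surface, []).append(entry)
    return index

index = build_index(ENTRIES)
print([entry.number for entry in index["島"]])
```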

The dictionary matching unit 2 matches the input text against the character string unit dictionary 1, extracts all candidate character string units that can be constituent units of the input text, and sets the result in the DP unit 3.
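The matching step can be sketched as a scan over all substrings, followed by grouping the candidates by the boundary at which they end and start. The function names and the toy dictionary below are assumptions; a real dictionary lookup would normally use a more efficient structure than testing every substring.

```python
def match_candidates(text, dictionary):
    """Return (start, end, word) for every dictionary word found in `text`."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                found.append((start, end, text[start:end]))
    return found

def index_by_boundary(candidates):
    """Group candidates by the boundary where they end and where they start."""
    ending, starting = {}, {}
    for start, end, word in candidates:
        starting.setdefault(start, []).append(word)
        ending.setdefault(end, []).append(word)
    return ending, starting

# Hypothetical toy dictionary.
dictionary = {"畜産物", "価格", "安定法", "畜産", "物価", "格安", "定法"}
candidates = match_candidates("畜産物価格安定法", dictionary)
ending, starting = index_by_boundary(candidates)
print(starting[0], ending[8])
```

The two tables play the role that the memories EWM and BWM play in the hardware embodiment: for each boundary they list what ends there and what starts there.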

The DP unit 3 then determines which combination of character string units is most preferable, by the evaluation calculation described with reference to Figure 2.

The function and configuration of the dictionary matching unit 2 may be the same as in the prior art, so only the DP unit 3 is described in detail below.

Figure 4 is a block diagram of one embodiment of the DP unit 3.

Each part is described below.

WM: A memory that stores the information for each candidate character string unit and consists of parts A to P below. Supplying an address within WM on WMA and a signal on R outputs the information of all parts for one character string unit at a time; supplying a signal on W reads information into G and P and stores it. A, B, V, and N are set by the dictionary matching unit. G is initialized to 0 by the dictionary matching unit.

A: Stores the type of forward connection relationship of the character string unit (hereinafter abbreviated as "word").

B: Stores the type of backward connection relationship of the word.

V: Stores the evaluation score v(xi) of the word, determined independently of the surrounding character strings.

N: Stores the word number of the word.

G: Stores the total evaluation score G(xi) up to that word.

P: Stores the address within WM of the preceding word that gives the best evaluation score up to that word.

EWM: A memory accessed with the contents of C3 and C1 as the upper and lower parts of the address. The addresses within WM that hold the information of the words ending at the boundary indicated by C3 are set in it by the dictionary matching unit.

BWM: Like EWM, a memory in which the dictionary matching unit sets the addresses within WM that hold the information of the words starting at the boundary indicated by C3.

C1: A counter that is incremented by 1 when a signal is given on C1U and cleared to 0 when a signal is given on C1C. It indicates the position within EWM of a word ending at a given boundary.

C2: A counter that is incremented by 1 when a signal is given on C2U and cleared to 0 when a signal is given on C2C. It indicates the position within BWM of a word starting at a given boundary.

C3: A counter that is incremented by 1 when a signal is given on C3U and cleared to 0 when a signal is given on C3C. It indicates the boundary number.

r₅: A register that holds the upper limit of the boundary numbers for one sentence. It is set by the dictionary matching unit.

COMP₄: A comparator that compares the values of C3 and r₅ and issues the C3E signal when C3 > r₅.

COMP₁: Logic that checks whether the output read from EWM is 0, that is, the code marking the end of the words for one boundary, and issues the C1E signal if it is 0.

COMP₂: Logic that, like COMP₁, checks the output from BWM and issues the C2E signal.

r₄: A register that temporarily holds an address within WM for reading out the determination result.

MPX: An address multiplexer that switches WMA between the output of EWM and the output of r₄ according to the signal given on S.

r₁: A register that holds the type of forward connection relationship (part A of WM) of a word starting at a given boundary, accessed via BWM. It is loaded by the r₁L signal.

T: A constant memory that defines the score of the connection relationship determined by the forward connection relationship of a word starting at a boundary and the backward connection relationship of a word ending at that boundary. It is addressed by r₁ and by the value of part B of WM accessed via EWM, and outputs one score.

r₂: A register that holds the value of part V of WM accessed via BWM. It is loaded by the r₂L signal.

ADD: An adder that sums the output of T, r₂, and the value of part G of WM accessed via EWM.

r₆: A register that holds the maximum value of the ADD output during the series of processing for one word starting at a given boundary. It is cleared by the r₆C signal.

r₃: A register that holds the address within WM of the word information that gives the maximum value of the ADD output during the series of processing for one word starting at a given boundary.

COMP₃: A comparator that compares the output of ADD with the output of r₆. When the ADD output is greater than the r₆ output, it issues the r₃₆L signal, which loads the ADD output into r₆ and the EWM output into r₃. The gate inserted in the r₃₆L line synchronizes it with the CL signal.

TMG: A timing control circuit that takes C1E, C2E, and C3E as inputs and outputs C1U, C1C, C2U, C2C, S, R, W, r₁L, r₂L, r₆C, r₄L, C3C, and C3U. It controls each signal according to the operating procedure described below.

Figure 5 shows an example of the contents of EWM for the example of Figure 1; X1, Y1 to Y2, Z1 to Z6, ZZ1 to ZZ9, and so on denote addresses within WM. For example, the boundary C3 = 0011 corresponds to the position of arrow d in Figure 1. BWM is organized in the same way, so its description is omitted.

The procedure for analyzing one sentence is shown below.

In this example, the output of the constant table T in Figure 4 is used as the score C(X, Y) for the connection relationship between words X and Y. Writing v(Y) for the context-independent evaluation of Y, the expression

$$
v(Y) + \max\{\, g(X_1) + C(X_1, Y),\; g(X_2) + C(X_2, Y) \,\}
$$

is computed in the form

$$
\max\{\, V(Y) + G(X_1) + T(X_1, Y),\; V(Y) + G(X_2) + T(X_2, Y) \,\}.
$$

It is assumed that r₅, EWM, BWM, A, B, V, N, and G have been initialized by the dictionary matching unit 2 as described in each item above, and that address 0 of WM holds values of B, V, and G that give the smallest conceivable ADD output.

(1) Issue the C3C signal to clear C3 (the boundary number) to 0.

(2) Issue the C2C signal to clear C2 (the position within BWM of the word starting at that boundary) to 0.

(3) Issue the S signal to switch MPX to the output of BWM.

(4) Issue the R signal so that WM outputs A and V of the word, indicated by C2, that starts at that boundary.

(5) Issue the r₁L and r₂L signals to load the outputs of A and V into r₁ and r₂.

(6) Issue the C1C signal to clear C1 (the position within EWM of the word ending at that boundary) to 0.

(7) Issue the S signal to switch MPX to the output of EWM.

(8) Issue the r₆C signal to clear r₆ (the maximum ADD output for one word starting at that boundary) to 0.

(9) Generate the C1U and CL signals at a fixed period until a signal appears on C1E, so that the maximum ADD output for the one word starting at that boundary, and the address within WM of the word information that gives this maximum, are stored in r₆ and r₃ respectively.

(10) Issue the S signal to switch MPX to the output of BWM.

(11) Issue the W signal to write the contents of r₆ and r₃ into G and P.

(12) Issue the R and r₄L signals to load the contents of P just written into r₄.

(13) Issue the C2U signal to increment C2 by 1.

(14) Repeat steps (4) to (13) until the C2E signal appears.

(15) Issue the C3U signal to increment C3 by 1.

(16) Repeat steps (2) to (15) until a signal appears on C3E.

(17) Issue the S signal to switch MPX to the output of r₄.

(18) Issue the R signal to output N.

(19) Issue the r₄L signal to load P into r₄.

(20) Repeat (18) and (19) to read out, one after another, the word information N that constitutes the determination result.

By the above procedure, the determination results are output in order starting from the last word of the sentence.

The above embodiment has been described with each memory, register, and so on provided as dedicated hardware, but the method can also be implemented in software on a general-purpose computer. Figure 6 shows the processing flow.
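A software version can follow the hardware procedure step by step. The sketch below is one possible reading of steps (1) to (20), not the flow of Figure 6: the memories WM, EWM, and BWM become Python lists, the registers become local variables, and the step numbers appear as comments. Several points are assumptions made for the example: r3 is cleared together with r6, the sentence start and end markers are stored as ordinary WM entries, all words share one connection class, V is the squared word length, and all scores are positive.

```python
def run_dp(wm, ewm, bwm, t, r5):
    """wm[addr]: dict with A, B, V, N, G, P (address 0 is a sentinel).
    ewm[c3] / bwm[c3]: WM addresses of words ending / starting at boundary c3,
    each list terminated by 0. t[(b, a)]: connection score. r5: last boundary."""
    r4 = 0
    for c3 in range(r5 + 1):                        # steps (1), (15), (16)
        c2 = 0                                      # step (2)
        while bwm[c3][c2] != 0:                     # step (14): stop on C2E
            word = bwm[c3][c2]
            r1, r2 = wm[word]["A"], wm[word]["V"]   # steps (4), (5)
            c1 = 0                                  # step (6)
            r6, r3 = 0, 0                           # step (8); clearing r3 is an assumption
            while ewm[c3][c1] != 0:                 # step (9): stop on C1E
                prev = ewm[c3][c1]
                add = t[(wm[prev]["B"], r1)] + r2 + wm[prev]["G"]  # ADD
                if add > r6:                        # COMP3
                    r6, r3 = add, prev
                c1 += 1
            wm[word]["G"], wm[word]["P"] = r6, r3   # step (11)
            r4 = r3                                 # step (12)
            c2 += 1                                 # step (13)
    numbers = []
    while r4 != 0:                                  # steps (17) to (20)
        numbers.append(wm[r4]["N"])
        r4 = wm[r4]["P"]
    return numbers

# Hypothetical candidates: (notation, start boundary, end boundary).
words = [("<BOS>", 0, 1), ("畜産", 1, 3), ("畜産物", 1, 4), ("物価", 3, 5),
         ("価格", 4, 6), ("格安", 5, 7), ("安定法", 6, 9), ("定法", 7, 9),
         ("<EOS>", 9, 10)]
r5 = 9
wm = [dict(A=0, B=0, V=0, N=0, G=0, P=0)]           # address 0: sentinel
ewm = [[] for _ in range(r5 + 1)]
bwm = [[] for _ in range(r5 + 1)]
for addr, (surface, start, end) in enumerate(words, start=1):
    score = 1 if surface in ("<BOS>", "<EOS>") else len(surface) ** 2
    wm.append(dict(A=0, B=0, V=score, N=addr, G=0, P=0))
    bwm[start].append(addr)
    if end <= r5:
        ewm[end].append(addr)
for table in (ewm, bwm):
    for row in table:
        row.append(0)                               # terminator checked by COMP1 / COMP2
t = {(0, 0): 0}                                     # single connection class

result = run_dp(wm, ewm, bwm, t, r5)
print([words[n - 1][0] for n in result])
```

As in the hardware description, the words come out from the end of the sentence toward the beginning, finishing with the start marker.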

As the intrinsic evaluation that does not depend on the surrounding character strings, an evaluation score corresponding to the number of characters of the candidate character string when written in kana (or the number of morae or syllables when spoken), or to the number of characters including prefixes and suffixes, can be used.

Alternatively, statistical frequency-of-occurrence (usage frequency) information for the candidate character string, either general or restricted to a field of use, may be used. Distinctions such as independent word versus affix, or distinctions by part of speech, may also be used, as may any combination of these.
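One way to combine two of these cues is a weighted sum. The following is a minimal sketch under assumptions of my own: the function name `first_score`, the weights, and the use of a logarithm to compress the frequency are illustrative choices and are not specified in the text.

```python
import math

def first_score(kana_reading, frequency=1, length_weight=1.0, frequency_weight=1.0):
    """Context-independent score from kana length and usage frequency.

    `frequency` must be a positive count; the logarithm keeps very common
    words from dominating the length term.
    """
    return length_weight * len(kana_reading) + frequency_weight * math.log(frequency)

print(first_score("ちくさんぶつ"))               # six kana, frequency 1
print(first_score("かかく", frequency=100))
```

Longer readings and more frequent words both raise the score, and the two weights let either cue be switched off by setting it to 0.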

As the evaluation based on connection relationships, frequency information on combinations of the preceding and following parts of speech, the connection frequency between stems and endings, the connection frequency with affixes, the frequency of appearing at the beginning or end of a sentence, the connection frequency with numerals and counters, and the like can be used. Information such as where in the whole sentence a character string is most likely to be placed can also be used.
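A connection table based on part-of-speech pair frequencies could be gathered from tagged text as sketched below. The tag names, the tiny corpus, and the `BOS` / `EOS` markers are hypothetical; the counts are raw frequencies, and how they are scaled into scores is left open.

```python
from collections import Counter

def connection_table(tagged_sentences):
    """Count how often each pair of adjacent part-of-speech tags occurs.

    Sentence start and end are included as the pseudo-tags BOS and EOS,
    so the table also covers sentence-initial and sentence-final frequency.
    """
    counts = Counter()
    for tags in tagged_sentences:
        padded = ["BOS"] + list(tags) + ["EOS"]
        counts.update(zip(padded, padded[1:]))
    return counts

# Hypothetical tagged corpus: each sentence is a list of part-of-speech tags.
corpus = [
    ["noun", "particle", "verb"],
    ["noun", "particle", "noun", "particle", "verb"],
]
table = connection_table(corpus)
print(table[("noun", "particle")], table[("verb", "noun")])
```

Pairs that never occur get a count of 0, which a scoring scheme can treat as a strongly disfavored connection.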

In the above example the evaluation calculation proceeds from the beginning of the sentence toward the end, but it can also proceed from the end of the sentence toward the beginning.

Furthermore, the sentence may be processed in several parts that are then integrated, or processing in both directions may be combined.

[Effect of the invention]

As described above, according to the present invention, expressing the validity of candidate character strings numerically makes the DP method easy to apply, so that the processing is simple, the amount of processing is extremely small, and the optimal solution can be obtained.

[Brief explanation of drawings]

Figure 1 is an explanatory diagram comparing the present invention with a conventional example, Figure 2 is a conceptual diagram of the present invention, Figure 3 is a schematic block diagram of the present invention, Figure 4 is a block diagram of an embodiment of the present invention, Figure 5 is a diagram showing a specific example of the contents of EWM, and Figure 6 is a processing flowchart of an embodiment of the present invention. In Figure 3, 1 is the character string unit dictionary, 2 is the dictionary matching unit, and 3 is the DP unit.

Claims

[Claims]

1. A text analysis device that analyzes text consisting of a plurality of character string units and identifies the character string units constituting the text, characterized by comprising: a character string unit dictionary 1 that holds, for each character string unit, a character code used for matching that unit; a dictionary matching unit 2 that matches the input text against the character codes of the character string unit dictionary and extracts all candidate character string units that can be constituent units of the input text; and a DP unit 3 that, for every extracted candidate, obtains at each boundary a first evaluation score, which is an evaluation independent of the surrounding context, and a second evaluation score, which is the total evaluation score up to the candidate plus an evaluation based on the grammatical connection relationship between the candidate and another candidate, and that uses the first and second evaluation scores to determine, boundary by boundary according to dynamic programming, the identification position between the candidate and the other candidates, thereby identifying the character string units.

2. The text analysis device according to claim 1, wherein part or all of the first evaluation score is information corresponding to the number of characters when the candidate character string unit is written in kana.

3. The text analysis device according to claim 1, wherein part or all of the first evaluation score is usage frequency information for each candidate character string.

4. The text analysis device according to claim 1, wherein part or all of the first evaluation score is part-of-speech information for each candidate character string.

5. The text analysis device according to any one of claims 1 to 4, wherein part or all of the second evaluation score is positional information of the candidate character string within the text.