JPH04191959A

JPH04191959A - Paragraph segmenting device

Info

Publication number: JPH04191959A
Application number: JP2324937A
Authority: JP
Inventors: Shigeki Kuga; 空閑　茂起
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1990-11-26
Filing date: 1990-11-26
Publication date: 1992-07-10

Abstract

PURPOSE: To simplify the device and speed up processing by successively storing character type decision results and storing a sentence after inserting paragraph segmentation position information into the read sentence, based on the change points of the stored character types and on punctuation. CONSTITUTION: When a unit of segmentation processing, for example one sentence, is extracted from the text stored in a sentence storage means 1, the type of each character constituting the sentence is decided by a character type decision means 4, and the decision results are successively stored in a decision result storage means 5. A paragraph segmentation position insertion means 6 then inserts information indicating the paragraph segmentation position into the decided character type string at a change point, for example from kana (Japanese syllabary) to kanji (Chinese characters), stores the sentence with the inserted segmentation position information in a storage means 7, and outputs the result to an output means 8. Because no dictionary is used, the constitution of the device is simplified, and paragraph segmentation processing and keyword retrieval processing are sped up.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Industrial Field of Application

The present invention relates to a phrase (bunsetsu) segmentation device, and more particularly to a phrase segmentation device suitable for equipment that performs language processing, such as word processors, translation devices, proofreading devices, and devices that use databases.

(b) Prior Art

Conventionally, to segment phrases from converted Japanese text in which no phrase-boundary information has been inserted, for example for translation or proofreading, the segmentation was carried out while consulting dictionaries such as an independent-word dictionary, an attached-word dictionary, and an affix dictionary, together with tables such as a table expressing the connection relations among those elements and a grammar table.

(c) Problems to Be Solved by the Invention

Conventional phrase segmentation devices that rely on such dictionaries and tables have the following problems: (1) a large amount of storage is needed to hold the dictionaries and tables; (2) phrase segmentation takes a long time because dictionary or table lookups must be performed; and (3) the control program for phrase segmentation becomes complicated.

The present invention has been made in view of these circumstances and provides a phrase segmentation device that can solve the above problems.

(d) Means for Solving the Problems

FIG. 1 is a block diagram showing the basic configuration of the present invention. As shown in the figure, the invention is a phrase segmentation device comprising: a text storage means 1 for storing text; an instruction means 2 for issuing an instruction to read a desired sentence from the text storage means 1; a reading means 3 for reading the instructed sentence from the text storage means 1; a character type discrimination means 4 for discriminating, character by character, the character type (kanji, hiragana, katakana, and so on) of the read sentence and for discriminating periods; a discrimination result storage means 5 for sequentially storing the discrimination results of the character type discrimination means 4; a segmentation position insertion means 6 for inserting phrase segmentation position information into the read sentence based on the character type transition points and the periods stored in the discrimination result storage means 5; a storage means 7 for storing the sentence into which the segmentation position information has been inserted; and an output means 8 for outputting the sentence into which the segmentation position information has been inserted.

The phrase segmentation device of the present invention can be applied to word processors, translation devices, proofreading devices, devices that use databases, and the like. It can also be applied to devices that output text as speech, since those devices likewise require phrase segmentation.

(e) Operation

According to the present invention, when a unit of segmentation processing, for example one sentence, is extracted from the text stored in the text storage means 1, the character type of each character making up that sentence is discriminated by the character type discrimination means 4, and the discrimination results are sequentially stored in the discrimination result storage means 5. The segmentation position insertion means 6 then inserts information indicating a phrase segmentation position into the discriminated character type sequence, for example at each change from kana to kanji, stores the sentence with the inserted segmentation position information in the storage means 7, and outputs the result to the output means 8.

(f) Embodiments

The present invention is described in detail below on the basis of the embodiments shown in the drawings. The invention is not limited to these embodiments.

FIG. 2 is a block diagram showing a first embodiment in which the present invention is applied to a word processor. In the figure, 9 is the word processor main body. Reference numeral 10 denotes a text storage device serving as the text storage means. It can be an external storage device such as a floppy disk or hard disk, an internal storage device such as RAM, or another storage facility such as a database, and it stores Japanese documents written in mixed kana and kanji.

Reference numeral 11 denotes a keyboard serving as the instruction means. It has character input keys and various instruction keys for text editing, proofreading, and the like, and is used to enter text and to enter instructions for reading a desired sentence from the text storage device 10.

Reference numeral 12 denotes a reading device that works with the CPU 13 and reads the sentence designated via the keyboard 11 from the text storage device 10. Reference numeral 14 denotes a character type discrimination device that works with the CPU 13; for the sentence read from the text storage device 10, it discriminates the character type (kanji, hiragana, katakana, and so on) of each character and also discriminates periods. Reference numeral 15 denotes a result storage device that serves as both the discrimination result storage means and the storage means. It consists of RAM, sequentially stores the discrimination results of the character type discrimination device 14, and also stores the sentence into which segmentation position information has been inserted by the segmentation position insertion device described next. The segmentation position insertion device 16 works with the CPU 13 and inserts phrase segmentation position information into the read sentence based on the character type transition points and the periods stored in the result storage device 15. Reference numeral 17 denotes a display device serving as the output means, connected to the CPU 13 via an output control unit 18. It is a dot-matrix display such as a CRT or LCD and displays the sentence into which the segmentation position information has been inserted.

With this configuration, the phrase segmentation processing of the first embodiment is explained using the example sentence 「特許庁に出す特許をワープロで作成し電子出願する。」 ("A patent to be submitted to the Patent Office is prepared on a word processor and filed electronically.") and following the flowchart shown in FIG. 8.

FIG. 3 shows the state in which one sentence, for example, has been extracted according to the processing unit from the text storage device 10, which holds text files, other databases, and the like, and has been stored in the result storage device 15. When a desired portion of text is read from the text storage device 10 in this way, in a processing unit such as one sentence, one paragraph, or one chapter (step 30), the character type code of each character making up that text is discriminated (step 31).

More specifically, each character of the read sentence has a unique character code assigned to it, such as a JIS code, so the character type is discriminated by checking that character code against each condition of the code discrimination table shown in FIG. 4. In the conditions, CC is the character whose type is being discriminated; a1 and b1 denote the first and last kanji codes, a2 and b2 the first and last hiragana codes, a3 and b3 the first and last katakana codes, and a4 the period.
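The range checks of the code discrimination table can be sketched as follows. This is a minimal illustration rather than the table of FIG. 4 itself: Unicode code point ranges stand in for the JIS-based bounds a1 to b1, a2 to b2, a3 to b3 and a4, and the names `char_type` and `classify` and the catch-all label 「他」 are hypothetical.

```python
# Assumed bounds: Unicode ranges stand in for the patent's JIS code bounds.
A1, B1 = 0x4E00, 0x9FFF   # kanji (CJK Unified Ideographs)
A2, B2 = 0x3040, 0x309F   # hiragana block
A3, B3 = 0x30A0, 0x30FF   # katakana block (includes the long-vowel mark ー)
A4 = 0x3002               # ideographic full stop 。


def char_type(ch: str) -> str:
    """Return the type symbol for one character by comparing its code CC."""
    cc = ord(ch)
    if A1 <= cc <= B1:
        return "漢"
    if A2 <= cc <= B2:
        return "ひ"
    if A3 <= cc <= B3:
        return "カ"
    if cc == A4:
        return "句"
    return "他"  # hypothetical label for anything the table does not cover


def classify(sentence: str) -> list[str]:
    """Discriminate every character of the sentence in order (step 31)."""
    return [char_type(ch) for ch in sentence]


print(classify("特許庁に出す。"))
# ['漢', '漢', '漢', 'ひ', '漢', 'ひ', '句']
```

With Unicode ranges the long-vowel mark in ワープロ falls inside the katakana block; a table built on other code ranges may need an explicit entry for it.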

FIG. 5 shows the result of discriminating the character types of the example sentence. Here 「漢」, 「ひ」, 「カ」, and 「句」 are symbols indicating a kanji code, a hiragana code, a katakana code, and a period code, respectively. The discrimination result is stored in the result storage device 15 (step 32).

Next, the phrase segmentation positions are determined by checking the stored character type codes against the decision points of the segmentation decision table shown in FIG. 6 (step 33). That is: (1) a keyword break (phrase break) is placed at each transition from hiragana to kanji; (2) a keyword break is placed at each transition from hiragana to katakana; and (3) a keyword break is placed after each period.

Then, following the segmentation decision table, a segmentation symbol, for example "/", is inserted at each phrase break, and the result is stored in the result storage device 15 (step 34).
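The three decision rules and the insertion of the segmentation symbol can be sketched as below. This is an illustration under stated assumptions, not the contents of FIG. 6: character types are derived from Unicode ranges instead of JIS codes, the name `insert_cuts` is hypothetical, and a symbol is only inserted between two characters, so a sentence-final period produces no trailing symbol.

```python
def char_type(ch: str) -> str:
    """Classify one character; Unicode ranges are an assumed stand-in for JIS."""
    cc = ord(ch)
    if 0x4E00 <= cc <= 0x9FFF:
        return "漢"
    if 0x3040 <= cc <= 0x309F:
        return "ひ"
    if 0x30A0 <= cc <= 0x30FF:
        return "カ"
    if cc == 0x3002:
        return "句"
    return "他"  # hypothetical label for uncovered characters


def insert_cuts(sentence: str, mark: str = "/") -> str:
    """Insert `mark` at hiragana->kanji, hiragana->katakana, and after a period."""
    types = [char_type(ch) for ch in sentence]
    out = []
    for i, ch in enumerate(sentence):
        if i > 0:
            prev, cur = types[i - 1], types[i]
            rule_1_or_2 = prev == "ひ" and cur in ("漢", "カ")
            rule_3 = prev == "句"
            if rule_1_or_2 or rule_3:
                out.append(mark)
        out.append(ch)
    return "".join(out)


print(insert_cuts("特許庁に出す特許をワープロで作成し電子出願する。"))
# 特許庁に/出す/特許を/ワープロで/作成し/電子出願する。
```

Because only the previous and current character types are compared, the pass is linear in the sentence length and needs no dictionary lookup.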

The sentence with the segmentation symbols inserted is then displayed on the screen of the display device 17. FIG. 7 shows the result obtained by the phrase segmentation processing described above.

If the end condition is then "no", that is, if there is another sentence to be segmented, the next sentence is read from the text storage device 10, with control performed so that storage positions do not overlap (step 35).

If the answer at step 35 is "yes", that is, if nothing remains to be segmented, the required information is stored in the result storage device 15 and processing ends (step 36).
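The control flow of steps 30 to 36 amounts to a loop over processing units. The sketch below assumes one sentence per unit and uses hypothetical names (`read_sentences`, `segment_stub`, `process_document`); the stub covers only the hiragana-to-kanji and hiragana-to-katakana rules, written as a regular expression over assumed Unicode ranges, and a Python list stands in for the result storage device 15.

```python
import re


def read_sentences(document: str):
    """Step 30: yield one processing unit (here, one sentence) at a time."""
    for match in re.finditer(r"[^。]+。?", document):
        yield match.group()


def segment_stub(sentence: str) -> str:
    """Stand-in for steps 31-34: mark hiragana -> kanji/katakana transitions."""
    pattern = r"(?<=[\u3041-\u3096])(?=[\u4E00-\u9FFF\u30A0-\u30FF])"
    return re.sub(pattern, "/", sentence)


def process_document(document: str, segment=segment_stub) -> list[str]:
    """Segment sentences until none remain, then return the stored results."""
    results = []                                 # role of result storage device 15
    for sentence in read_sentences(document):    # repeats while step 35 is "no"
        results.append(segment(sentence))        # append keeps positions distinct
    return results                               # step 36: nothing left to process


print(process_document("特許庁に出す。電子出願する。"))
# ['特許庁に/出す。', '電子出願する。']
```

Appending each result to the end of the list is one simple way to keep storage positions from overlapping; the description does not say how the device does it.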

Next, as a second embodiment, a configuration that segments phrases using the result of converting character types into digital values is described with reference to FIG. 9. Components that are the same as in FIG. 1 are given the same reference numerals and their description is omitted. In FIG. 9, 40 is a character type digitization device that works with the CPU 13. It assigns a first code, specifically "H", to kanji and katakana in the character type discrimination result produced by the character type discrimination device 14, and a second code, specifically "L", to hiragana and periods, thereby converting the character type discrimination result into one of the two codes "H" and "L".

The result storage device 41 stores the symbol string of "H" and "L" codes that constitutes the character type discrimination result. The segmentation position insertion device 42 inserts phrase segmentation position information into the read sentence based on the transition points between the codes "H" and "L".

With this configuration, the phrase segmentation processing of the second embodiment is explained using the same example sentence as in the first embodiment and following the flowchart of FIG. 13.

When a desired portion of text is read from the text storage device 10 in a processing unit such as one sentence, one paragraph, or one chapter (step 50), the character type codes are discriminated (step 51). Because each character of the read sentence has a unique character code such as a JIS code assigned to it, the character type is discriminated by checking that character code against each condition of the code discrimination table shown in FIG. 4. The discriminated character type codes are stored in the result storage device 41 (step 52) and are then digitized (step 53).

FIG. 10 shows the code digitization table that is consulted to digitize the discriminated output codes. When the character type is discriminated as kanji it is converted to "H", and when it is discriminated as hiragana it is converted to "L"; likewise, katakana is converted to "H" and a period to "L". The result of digitizing the sentence by checking it against the code digitization table is stored in the result storage device 41 as the symbol string shown in FIG. 11. For the sake of explanation, FIG. 11 also shows the character type discrimination results.
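The digitization step can be sketched as a table lookup applied to each character type. The mapping below follows the four conversions stated in the text; everything else is an assumption made for illustration: Unicode ranges replace the JIS code bounds, the names `DIGITIZE` and `digitize` are hypothetical, and characters outside the four types are mapped to "L" because the description does not say how they are handled.

```python
# Conversions stated in the description: kanji and katakana -> "H",
# hiragana and period -> "L".
DIGITIZE = {"漢": "H", "カ": "H", "ひ": "L", "句": "L"}


def char_type(ch: str) -> str:
    """Classify one character; Unicode ranges are an assumed stand-in for JIS."""
    cc = ord(ch)
    if 0x4E00 <= cc <= 0x9FFF:
        return "漢"
    if 0x3040 <= cc <= 0x309F:
        return "ひ"
    if 0x30A0 <= cc <= 0x30FF:
        return "カ"
    if cc == 0x3002:
        return "句"
    return "他"  # hypothetical label for uncovered characters


def digitize(sentence: str) -> str:
    """Convert a sentence to its H/L symbol string (step 53)."""
    # Assumption: uncovered character types fall back to "L".
    return "".join(DIGITIZE.get(char_type(ch), "L") for ch in sentence)


print(digitize("特許庁に出す特許をワープロで作成し電子出願する。"))
# HHHLHLHHLHHHHLHHLHHHHLLL
```

Each character is reduced to a single two-valued symbol, which is what allows the later comparison to be done with simple logic.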

Next, the digitized result is checked against the segmentation decision table shown in FIG. 12 to determine the phrase segmentation positions (step 54). The positions are determined by (1) placing a keyword (phrase) break at each transition point between "L" and "H" in the digitized output and (2) placing a keyword break after each period.

Then, based on the segmentation positions determined from the segmentation decision table, a segmentation symbol, for example "/", is inserted, and the sentence with the segmentation symbols inserted is stored in the result storage device 41 (step 55).
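Segmentation from the H/L symbol string can be sketched as follows. Two points are assumptions rather than statements from the description: the transition between "L" and "H" is read as the L-to-H direction only, since the description says the output matches that of the first embodiment, and the period is located by looking at the sentence itself because "L" alone does not distinguish it from hiragana. The name `insert_cuts_binary` is hypothetical.

```python
def insert_cuts_binary(sentence: str, codes: str, mark: str = "/") -> str:
    """Insert `mark` at each L->H transition and after each period.

    `codes` is the H/L string for `sentence`, one symbol per character.
    """
    if len(sentence) != len(codes):
        raise ValueError("sentence and codes must have the same length")
    out = []
    for i, ch in enumerate(sentence):
        if i > 0:
            l_to_h = codes[i - 1] == "L" and codes[i] == "H"   # rule (1)
            after_period = sentence[i - 1] == "。"              # rule (2)
            if l_to_h or after_period:
                out.append(mark)
        out.append(ch)
    return "".join(out)


sentence = "特許庁に出す特許をワープロで作成し電子出願する。"
codes = "HHHLHLHHLHHHHLHHLHHHHLLL"  # H/L string for the example sentence
print(insert_cuts_binary(sentence, codes))
# 特許庁に/出す/特許を/ワープロで/作成し/電子出願する。
```

A period followed by hiragana gives "L" then "L", so rule (2) is still needed as a separate check even though a period followed by kanji or katakana is already caught by rule (1).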

The result obtained by the above processing is the same as the display content shown in FIG. 7.

If the end condition is then "no", that is, if there is another sentence to be segmented, the next sentence is read from the text storage device 10, with control performed so that storage positions do not overlap (step 56).

If the answer at step 56 is "yes", that is, if nothing remains to be segmented, the required information is stored in the result storage device 41 and processing ends (step 57).

(g) Effects of the Invention

According to the present invention: (1) Because no dictionary is used when segmenting phrases, the configuration of the device can be simplified.

As a result, the invention can be applied not only to word processors and office computers but also to other small devices, specifically electronic organizers and programmable calculators. (2) Phrase segmentation processing and keyword search processing can be performed at high speed. (3) The control program for phrase segmentation can be simplified.

(4) Japanese is written without spaces between words, so when phrases are to be segmented their positions are not known in advance, and a great deal of processing and time is needed to decide where each phrase begins and ends.

According to the present invention, language processing can be performed after the phrase positions have been determined, so the processing time can be shortened considerably. (5) When the character type discrimination result is replaced with binary values by a digital circuit, processing becomes faster, the circuitry is simplified, and the phrase segmentation device can be realized at low cost.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the basic configuration of the present invention.
FIG. 2 is a block diagram showing the configuration of a word processor that is the first embodiment of the invention.
FIG. 3 is an explanatory diagram showing an example of a sentence stored in the text storage device.
FIG. 4 is an explanatory diagram showing the contents of the character type discrimination table.
FIG. 5 is an explanatory diagram showing the character type discrimination result.
FIG. 6 is an explanatory diagram showing the contents of the segmentation decision table.
FIG. 7 is an explanatory diagram showing the segmentation result.
FIG. 8 is a flowchart showing the phrase segmentation processing of the first embodiment.
FIG. 9 is a block diagram showing the configuration of a word processor that is the second embodiment.
FIG. 10 is an explanatory diagram showing the contents of the code digitization table of the second embodiment.
FIG. 11 is an explanatory diagram showing the result of character type discrimination.
FIG. 12 is an explanatory diagram showing the contents of the segmentation decision table.
FIG. 13 is a flowchart showing the phrase segmentation processing of the second embodiment.

Reference numerals: 1, text storage means; 2, instruction means; 3, reading means; 4, character type discrimination means; 5, discrimination result storage means; 6, segmentation position insertion means; 7, storage means; 8, output means.

Claims

[Scope of Claims]

1. A phrase segmentation device comprising: a text storage means for storing text; an instruction means for issuing an instruction to read a desired sentence from the text storage means; a reading means for reading the instructed sentence from the text storage means; a character type discrimination means for discriminating, character by character, the character type (kanji, hiragana, katakana, and so on) of the read sentence and for discriminating periods; a discrimination result storage means for sequentially storing the discrimination results of the character type discrimination means; a segmentation position insertion means for inserting phrase segmentation position information into the read sentence based on the character type transition points and the periods stored in the discrimination result storage means; a storage means for storing the sentence into which the segmentation position information has been inserted; and an output means for outputting the sentence into which the segmentation position information has been inserted.

2. The phrase segmentation device according to claim 1, wherein the character type transition points are the positions at which the character type changes from hiragana to kanji and from hiragana to katakana.

3. The phrase segmentation device according to claim 1, further comprising means for converting kanji and katakana in the character type discrimination result into a first code and hiragana and periods into a second code, thereby binarizing the character type discrimination result, wherein the discrimination result storage means comprises means for storing the binarized codes, and the segmentation position insertion means comprises means for inserting phrase segmentation position information into the read sentence based on the transition points of the binarized codes and the positions of the periods.