JPH05128102A

JPH05128102A - Sentence compression device

Info

Publication number: JPH05128102A
Application number: JP3287864A
Authority: JP
Inventors: Akira Hamada; 明濱田; Hitoshi Suzuki; 等鈴木; Hirokatsu Akiyama; 広勝秋山
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-11-01
Filing date: 1991-11-01
Publication date: 1993-05-25
Anticipated expiration: 2013-12-14
Also published as: JP2837006B2

Abstract

PURPOSE: To provide a sentence compression method that raises the compression ratio by coding long character strings. CONSTITUTION: The device comprises dictionaries 14 and 15, which register one of the similar character strings taken from text containing a plurality of identical or similar character strings each formed of a plurality of words; a control unit 11, connected to dictionaries 14 and 15, which replaces each similar character string in the text with identification data of the registered character string and difference data; a database 16; buffers 17 and 18; and a memory 19.

Description

Detailed Description of the Invention

[0001]

[Industrial field of application] The present invention relates to a sentence compression device for word processors and digitized books.

[0002]

[Prior art] A conventional sentence compression device works in units of words or of character strings with a fixed number of characters. When the same character string occurs more than once, the string is registered in a dictionary, and each occurrence in the text is replaced by a code that identifies the registered string, thereby compressing the text.

[0003] FIG. 9 shows the structure of the dictionary that registers word-unit character strings in the conventional sentence compression device described above.

[0004] A conventional sentence compression device using the dictionary of FIG. 9 compresses a sentence such as "また、風の力を源とするエレメンタル魔法を使います。" ("Also uses elemental magic whose source is the power of the wind.") into "また、風の力を源とする[word 01411][word 12959]を使います。".

[0005] [word 01411], [word 12959] and the like in this compressed sentence denote coded words. The original character string can be restored by looking up the word number contained in each code in the dictionary of FIG. 9.
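The conventional word-unit scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: FIG. 9 is not reproduced in this text, so the mapping of 01411 to "エレメンタル" and 12959 to "魔法" is a guess, the bracketed `[word NNNNN]` text stands in for whatever byte-level code the device actually uses, and the function names `compress_words` and `restore_words` are hypothetical.

```python
import re

# Assumed contents of the word dictionary of FIG. 9 (word number -> word).
WORD_DICT = {1411: "エレメンタル", 12959: "魔法"}

# Matches the bracketed stand-in notation for a word code.
WORD_CODE = re.compile(r"\[word (\d{5})\]")


def compress_words(text, word_dict):
    """Replace every registered word in text with its word code."""
    if not word_dict:
        return text
    number_of = {word: number for number, word in word_dict.items()}
    # Try longer words first so a short entry cannot pre-empt a longer one.
    words = sorted(number_of, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return pattern.sub(lambda m: f"[word {number_of[m.group(0)]:05d}]", text)


def restore_words(text, word_dict):
    """Look up each word code in the dictionary and put the word back."""
    return WORD_CODE.sub(lambda m: word_dict[int(m.group(1))], text)


original = "また、風の力を源とするエレメンタル魔法を使います。"
compressed = compress_words(original, WORD_DICT)
restored = restore_words(compressed, WORD_DICT)
```

The sketch assumes the plain text never contains the bracket notation itself; a real encoding would keep codes and ordinary characters distinguishable at the byte level.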

[0006]

[Problem to be solved by the invention] In the conventional sentence compression device described above, however, a 2-byte identification code is required once more than 256 distinct character strings are registered in the dictionary. Many Sino-Japanese words consist of only two characters, and the words targeted for coding are no more than a few characters long, so the compression ratio of the text as a whole cannot be made large.

[0007] In view of this problem of the conventional sentence compression device, the present invention provides a sentence compression device that can raise the compression ratio of text by coding character strings.

[0008]

[Means for solving the problem] The object of the present invention is achieved by a sentence compression device comprising: registration means for registering, from text containing a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings; and conversion means, connected to the registration means, for replacing each similar character string in the text with identification data of the specific character string registered in the registration means and difference data.

[0009]

[Operation] In the sentence compression device of the present invention, the registration means registers, from text containing a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings. The conversion means, which is connected to the registration means, replaces each similar character string in the text with identification data of the specific character string registered in the registration means and difference data.

[0010]

[Embodiment] An embodiment of the sentence compression device of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the configuration of one embodiment of the sentence compression device of the present invention.

[0012] The sentence compression device of FIG. 1 consists of the following parts, all of which are connected to control unit 11. Control unit 11, part of the conversion means, controls the restoration of compressed text data and the display of character strings. Input unit 12 specifies the text to be restored. Display unit 13, consisting of a cathode ray tube (CRT), a liquid crystal display (LCD) or the like, displays the restored text. Dictionary 14, part of the registration means, stores the character strings to be restored from data in which the original text (see FIG. 2) has been compressed by coding identical strings among its long character strings (such a code is hereinafter called a sentence code). Dictionary 15, part of the registration means, stores the character strings to be restored from word codes. Database 16, part of the conversion means, is an organized collection of text that has been coded with sentence codes and compressed with difference data. Buffer 17, part of the conversion means, holds the final restored result. Buffer 18, part of the conversion means, temporarily holds a character string taken from dictionary 14. Memory 19, part of the conversion means, stores character positions within buffers 17 and 18 and within the compressed data.

[0013] Each of these components is described next.

[0014] Control unit 11 controls the restoration of compressed text data and the display of character strings, and also controls the operation of the other parts.

[0015] Input unit 12 is a keyboard used to specify the text to be restored.

[0016] Display unit 13 is a cathode ray tube (CRT), a liquid crystal display (LCD) or the like that displays the restored text.

[0017] As shown in FIG. 3, dictionary 14 consists of an index 141, whose entries correspond one-to-one to sentence numbers, and a body 142 that stores variable-length character strings.

[0018] Each entry of index 141 points to the start of the corresponding character string in body 142. The length of that string is obtained from the difference between this position and the character position pointed to by the next entry of index 141.
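The index-and-body layout of dictionary 14 can be sketched as below. This is a minimal Python illustration, not the device's actual storage format: the class name `SentenceDictionary`, numbering sentences from 0, and the extra end-of-body entry appended to the index (so that the last string also has a "next entry" to measure against) are all assumptions.

```python
class SentenceDictionary:
    """Variable-length strings stored back to back, located via an index."""

    def __init__(self, sentences):
        # Body 142: all registered strings concatenated.
        self.body = "".join(sentences)
        # Index 141: start offset of each string, plus one assumed
        # sentinel entry marking the end of the body.
        self.index = [0]
        for sentence in sentences:
            self.index.append(self.index[-1] + len(sentence))

    def lookup(self, number):
        """Return the string registered under the given sentence number."""
        if not 0 <= number < len(self.index) - 1:
            raise KeyError(number)
        start = self.index[number]
        # The length follows from the next index entry, as in the text.
        length = self.index[number + 1] - start
        return self.body[start:start + length]


dictionary = SentenceDictionary(["また、風の力", "ただし", "石"])
```

Storing only start offsets avoids a separate length field per entry, at the cost of one extra index entry at the end.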

[0019] Dictionary 15 has the structure shown in FIG. 9, in which entries correspond one-to-one to word numbers. Database 16 is an organized collection of text that has been coded with sentence codes and compressed with difference data, arranged so that the required part can be retrieved on instruction from input unit 12.

[0020] Buffer 17 holds the final restored result, and buffer 18 temporarily holds a character string taken from dictionary 14.

[0021] Memory 19 consists of pointer 191, which indicates the processing position in text database 16; pointer 192, which indicates the end position of buffer 17; pointer 193, which indicates the end position of buffer 18; and pointer 194, which indicates the insertion position in buffer 18.

[0022] FIG. 2 shows part of an original text to be compressed by the sentence compression device of this embodiment. FIG. 3 shows an example structure of the dictionary in which the device registers long character strings, and FIG. 4 shows an example in which long character strings have been coded by the device.

[0023] The sentence compression device of this embodiment registers long character strings such as sentences, clauses or phrases in dictionary 14 and codes identical character strings (sentence codes). When a string does not match a string registered in dictionary 14 exactly, difference data is appended to the code. In other words, the device uses a sentence compression method that compresses an original text such as that of FIG. 2 into the form of FIG. 4 using the dictionary of FIG. 3.

[0024] Each part of the text of FIG. 4, as compressed by the sentence compression device of FIG. 1, is described in detail next.

[0025] [sentence] in FIG. 4 is a separator that sets the sentence code and difference data apart from ordinary character codes. [00023] in FIG. 4 is a sentence code; the original character string is restored by looking up the sentence number contained in the code in the dictionary of FIG. 3.

[0026] [01-04] in FIG. 4 is deletion data applied to the character string restored from the sentence code. The number after the hyphen gives the position of the first deleted character, and the number before the hyphen gives the number of characters deleted. This deletion data, together with the inserted character string that follows it, constitutes the difference data.
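The difference data described above (one deletion range followed by an inserted string) can be illustrated with the Python sketch below. The source does not say how the compressor finds the difference, so `make_delta` uses a simple common-prefix and common-suffix trim purely as an assumption; `apply_delta` follows the stated meaning of the deletion data, with 1-based positions. Both function names and the sample strings beyond "また、風の…" are hypothetical.

```python
def make_delta(registered, target):
    """Return (delete_count, delete_pos, inserted) turning registered into target.

    delete_pos is 1-based, matching the [count-position] notation of FIG. 4.
    """
    limit = min(len(registered), len(target))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and registered[prefix] == target[prefix]:
        prefix += 1
    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and registered[len(registered) - 1 - suffix]
           == target[len(target) - 1 - suffix]):
        suffix += 1
    delete_count = len(registered) - prefix - suffix
    inserted = target[prefix:len(target) - suffix]
    return delete_count, prefix + 1, inserted


def apply_delta(registered, delete_count, delete_pos, inserted):
    """Delete delete_count characters at delete_pos, then insert there."""
    start = delete_pos - 1
    return registered[:start] + inserted + registered[start + delete_count:]


registered = "また、風の力を源とする"
target = "また、彼らは大地の力を源とする"
delta = make_delta(registered, target)  # (1, 4, "彼らは大地"), i.e. [01-04] plus inserted text
```

A single deletion range cannot express two widely separated changes compactly; in that case this sketch simply deletes and re-inserts everything between the first and last differing characters.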

[0027] FIG. 5 is a schematic diagram showing an example in which both long character strings and word-unit character strings have been coded by the sentence compression device of this embodiment. FIG. 6 is a diagram explaining the structure of the code data in the device.

[0028] The content shown in FIG. 5 is the latter half of FIG. 4 compressed further using the word dictionary of FIG. 9. In the actual codes, ordinary characters are normally represented in 2 bytes using the JIS kanji code (MSB "0"), and the other compression codes have the structure shown in FIG. 6. To fit this structure, when a difference would fall at the 256th character from the start or later, the sentence to be registered is split so that the position stays within 255 characters, and when a difference would be 8 characters or more, it is adjusted to stay within 7 characters.

[0029] Next, the operation of the sentence compression device of FIG. 1, in particular the procedure for restoring a compressed document specified through input unit 12 (see FIG. 1), is described with reference to the flowcharts of FIGS. 7 and 8.

[0030] Take the first line of the text of FIG. 5. First, pointer 191 is set to point to the start of the specified compressed data, and pointers 192 and 193 are initialized to point to the start of buffer 17 and buffer 18 respectively (step S1). The type of data pointed to by pointer 191 is then determined (step S2). In the case of FIG. 5 the separator [sentence] comes first, so processing proceeds to step S3 and enters the sub-routine of FIG. 8, which restores sentence-code data.

[0031] Because a sentence code immediately follows the sentence separator, the sentence number [00023] is extracted (step S31). Body 142 of dictionary 14 is searched via index 141 with this sentence number, and the character string "また、風の…" is found (step S32). The string is copied to the start of buffer 18 and pointer 193 is set to its end position (step S33). Pointer 191 is advanced (step S34) and the type of data it points to is determined (step S35). Here the next data is the deletion data [01-04], so processing proceeds to step S36.

[0032] The position and length of the characters to be deleted from buffer 18 are extracted, and the position is saved in pointer 194 (step S36). In this case only one character, the fourth from the start ("風"), is deleted from buffer 18, and pointer 193 is adjusted to point to the end of the string (step S37). Processing returns to step S34, and the determination in step S35 finds that the next data is the ordinary character "彼", so processing proceeds to step S39.

[0033] This character is inserted into buffer 18 at the position indicated by pointer 194, and pointers 193 and 194 are each advanced by the number of inserted characters so that they point to the end position and the next insertion position respectively (step S39). Steps S34, S35 and S39 are repeated to insert "ら" and "は" in the same way. The determination in step S35 then finds that the next data is the word code [word 08394], so processing proceeds to step S38.

[0034] Dictionary 15 is searched with the word number of the word code and the corresponding character string, here "大地", is retrieved (step S38). In step S39 the retrieved "大地" is inserted into buffer 18, and processing returns to step S34.

[0035] In step S35 the next data is found to be the sentence separator, so processing returns to the main routine of FIG. 7.

[0036] The character string prepared in buffer 18 is appended to buffer 17 at the position indicated by pointer 192.

[0037] Pointer 193 is reset to point to the start of buffer 18 (step S5), pointer 191 is advanced (step S6), and the end of the data is checked (step S7). In this case the data ends here, so processing proceeds to step S8, where the character string held in buffer 17 is displayed on display unit 13, and processing ends.

[0038] Next, the third line of the text of FIG. 5 is described.

[0039] First, the initialization of step S1 is performed as for the first line. The determination in step S2 finds that the first character is the ordinary character "た", so processing proceeds to step S5 and the character is entered into buffer 17 (step S5). Pointer 191 is advanced (step S6) and the end of the data is checked (step S7). Data still remains, so processing returns to step S2. Steps S2, S5, S6 and S7 are repeated to append "だ", "し" and "、" to buffer 17 in the same way. The determination in step S2 then finds that the next data is the word code [word 00808], so processing proceeds to step S4.

[0040] Dictionary 15 is searched with the word number of the word code and the corresponding character string, here "石つぶて", is retrieved (step S4). In step S5 this string is appended to buffer 17. Ordinary characters are then appended by the same loop, and finally the sentence code [00025] is restored and appended to buffer 17 in the same way as for the first line.

[0041] The end of the data is detected in step S7, the restored result is displayed in step S8, and processing ends.

[0042] As described above, the original character strings are restored from text data that was compressed by coding character strings identical or similar to long character strings, such as sentences, clauses and phrases, registered in the dictionary.
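The restoration procedure walked through above can be summarized in a short Python sketch. It is an illustration under assumptions, not the flowcharts of FIGS. 7 and 8 themselves: the compressed data is modeled as a list of tuples (a hypothetical representation, since the byte layout of FIG. 6 is not reproduced here), a sentence-code region is assumed to end at the next separator or at the end of the data, and the full text registered under sentence number 23 is invented beyond its known beginning "また、風の…". Step numbers in the comments indicate the rough correspondence.

```python
def restore_sentence(tokens, i, sentence_dict, word_dict):
    """Sub-routine: tokens[i] is the sentence code after an opening separator.

    Returns (index of the closing separator or len(tokens), restored string).
    """
    if i >= len(tokens) or tokens[i][0] != "sentence":
        raise ValueError("a sentence code must follow the separator")
    buf = list(sentence_dict[tokens[i][1]])   # S31-S33: copy into buffer 18
    insert_at = len(buf)                      # pointer 194 (assumed default: end)
    i += 1                                    # S34
    while i < len(tokens) and tokens[i][0] != "sep":   # S35
        tok = tokens[i]
        if tok[0] == "delete":                # S36-S37
            _, count, pos = tok
            insert_at = pos - 1               # positions are 1-based
            del buf[insert_at:insert_at + count]
        elif tok[0] in ("char", "word"):      # S38 (word lookup), S39 (insert)
            text = word_dict[tok[1]] if tok[0] == "word" else tok[1]
            buf[insert_at:insert_at] = list(text)
            insert_at += len(text)
        else:
            raise ValueError(f"unexpected token {tok!r}")
        i += 1                                # S34
    return i, "".join(buf)


def restore(tokens, sentence_dict, word_dict):
    """Main routine: returns the text that step S8 would display."""
    out = []                                  # buffer 17
    i = 0                                     # pointer 191 (S1)
    while i < len(tokens):                    # S7
        kind = tokens[i][0]                   # S2
        if kind == "sep":                     # S3
            i, piece = restore_sentence(tokens, i + 1, sentence_dict, word_dict)
        elif kind == "word":                  # S4
            piece = word_dict[tokens[i][1]]
        elif kind == "char":
            piece = tokens[i][1]
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
        out.append(piece)                     # S5
        i += 1                                # S6
    return "".join(out)


# Assumed dictionary contents for the example.
SENTENCES = {23: "また、風の力を源とするエレメンタル魔法を使います。"}
WORDS = {8394: "大地", 808: "石つぶて"}

line1 = [("sep",), ("sentence", 23), ("delete", 1, 4),
         ("char", "彼"), ("char", "ら"), ("char", "は"),
         ("word", 8394), ("sep",)]
line1_text = restore(line1, SENTENCES, WORDS)
```

With these assumed inputs, `line1_text` is "また、彼らは大地の力を源とするエレメンタル魔法を使います。": the registered sentence with "風" deleted at position 4 and "彼らは大地" inserted in its place.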

[0043]

[Effect of the invention] The sentence compression device of the present invention comprises registration means for registering, from text containing a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings, and conversion means, connected to the registration means, for replacing each similar character string in the text with identification data of the specific character string registered in the registration means and difference data. It can therefore improve the compression ratio of text in which similar phrasings occur frequently.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the sentence compression device of the present invention.

FIG. 2 is an explanatory diagram showing part of an original text, used to explain an example of compression by the sentence compression device of FIG. 1.

FIG. 3 is an explanatory diagram showing an example structure of the dictionary in which the sentence compression device of FIG. 1 registers long character strings.

FIG. 4 is an explanatory diagram showing an example text in which long character strings have been coded by the sentence compression device of FIG. 1.

FIG. 5 is a schematic diagram showing an example in which both long character strings and word-unit character strings have been coded by the sentence compression device of FIG. 1.

FIG. 6 is an explanatory diagram showing the structure of the code data in the sentence compression device of FIG. 1.

FIG. 7 is a flowchart of the main routine, used to explain the operation of the sentence compression device of FIG. 1.

FIG. 8 is a flowchart of the sub-routine, used to explain the operation of the sentence compression device of FIG. 1.

FIG. 9 is an explanatory diagram showing the structure of the dictionary that registers word-unit character strings in a conventional sentence compression device.

[Explanation of symbols]

11 Control unit
12 Input unit
13 Display unit
14 Character string dictionary
15 Word dictionary
16 Compressed text database
17 Restored text buffer
18 Character string dictionary buffer
19 Memory set
141 Character string dictionary index
142 Character string dictionary body
191 to 194 Memory pointers

Claims

[Claims]

1. A sentence compression device comprising: registration means for registering, from text containing a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings; and conversion means, connected to the registration means, for replacing each similar character string in the text with identification data of the specific character string registered in the registration means and difference data.