JP2792147B2

JP2792147B2 - Character processing method and device

Info

Publication number: JP2792147B2
Application number: JP1270649A
Authority: JP
Inventors: 幸恵衣川; 淳市久保田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1989-10-18
Filing date: 1989-10-18
Publication date: 1998-08-27
Anticipated expiration: 2013-08-27
Also published as: JPH03131960A

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a character processing method, and an apparatus therefor, intended for document processing such as the creation and management of documents.

Prior Art

In Japanese text, a word is sometimes represented by an abbreviation made up of some of the characters of that word, and several such notations for a single word are often mixed within one document (for example, パーソナルコンピュータ "personal computer" and パソコン, or ソビエト連邦 "Soviet Union" and ソ連). This is said to impair the consistency of the text and hinder readability. To address this, a character processing apparatus has recently been devised that holds in a dictionary the notations permitted for each word and compares character strings in the text with the word notations in that dictionary in order to unify the notation (Japanese Unexamined Patent Publication No. Sho 63-15359).

FIG. 2 is a block diagram of the conventional character processing apparatus described above.

In the figure, 21 is an input unit, which inputs a character string of mixed kanji and kana. 22 is a word table, which stores words that have more than one permitted notation together with those notations. 23 is a word detection unit, which compares the character string entered through the input unit 21 with the notations stored in the word table 22 and detects words that have more than one permitted notation. 24 is a temporary storage unit, which temporarily stores the words detected in the input character string by the word detection unit 23, together with their notations. 25 is a comparison unit: when a word entered through the input unit 21 is detected by the word detection unit 23, it compares the notation of that word held in the temporary storage unit 24 with the notation in the input character string. 26 is a document buffer, which stores the finalized document. 27 is a display unit, which reports when the comparison unit 25 determines that the notations differ. Other components exist, but they are omitted because they are not needed for comparison with the present invention.

In the conventional character processing apparatus configured as above, it is first determined whether the notation of the character string entered from the input unit 21 exists in the word table 22; if it does, that notation is temporarily stored in the temporary storage unit 24. The same word is also searched for among the words already held in temporary storage. If the same word is found and the notation about to be entered does not match the notation previously used for that word, the display unit 27 reports this.
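
The dictionary-based behaviour described above can be sketched roughly as follows. This is only an illustration under assumptions: the table contents, the word identifiers, and the names `WORD_TABLE` and `check_notation` are hypothetical and do not come from the cited publication.

```python
# Hypothetical word table (unit 22): every permitted notation maps to a word id.
WORD_TABLE = {
    "パーソナルコンピュータ": "personal_computer",
    "パソコン": "personal_computer",
    "ソビエト連邦": "soviet_union",
    "ソ連": "soviet_union",
}


def check_notation(tokens):
    """Return (token, earlier_notation) pairs where a word reappears in a different notation."""
    seen = {}       # plays the role of the temporary storage unit 24
    warnings = []   # what the display unit 27 would report
    for token in tokens:
        word_id = WORD_TABLE.get(token)
        if word_id is None:
            continue  # not in the table, so this approach cannot detect it
        first_used = seen.setdefault(word_id, token)
        if first_used != token:
            warnings.append((token, first_used))
    return warnings
```

A token that is missing from the table is silently skipped, which is the limitation the invention sets out to remove.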

Problems to Be Solved by the Invention

To detect notational variation in abbreviations with the conventional character processing apparatus, the permitted notations had to be associated with one another and stored in advance as a notational-variation dictionary. Consequently, variations not present in that dictionary could not be detected. Moreover, detecting a practically sufficient range of notational variation required storing an enormous amount of information.

In view of the above problems of the conventional character processing apparatus, an object of the present invention is to provide a character processing apparatus and method that, without using a notational-variation dictionary, can search a text for character strings that may be abbreviations formed from some of the characters of a given character string, or, given an abbreviation formed from some of the characters of a word, can search the text for character strings that may be its formal notation.

Means for Solving the Problems

(1) A character processing apparatus comprising: a text storage unit that stores text; a character string extraction unit that extracts character strings from the text stored in the text storage unit; an input unit for entering the notation of a character string; an input character string temporary storage unit that temporarily stores the character string entered from the input unit; a character string length determination unit that compares the length of the character string held in the input character string temporary storage unit with the length of the character string extracted by the character string extraction unit, and takes the shorter as a first character string and the longer as a second character string; and an abbreviation determination unit that compares the notations of the first and second character strings and, when every character of the first character string is present in the second character string, at any position, and in the same order of appearance, determines that the first character string is an abbreviation of the second character string, that is, that the second character string is the formal notation of the first character string.

(2) A character processing method comprising: a character string extraction step of extracting character strings from text; an input character string temporary storage step of temporarily storing an entered character string; a character string length determination step of comparing the length of the character string stored in the input character string temporary storage step with the length of the character string extracted by the character string extraction means, and taking the shorter as a first character string and the longer as a second character string; and an abbreviation determination step of comparing the notations of the first and second character strings and, when every character of the first character string is present in the second character string, at any position, and in the same order of appearance, determining that the first character string is an abbreviation of the second character string, that is, that the second character string is the formal notation of the first character string.
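
The condition stated in (1) and (2), every character of the shorter string appearing in the longer string in the same order, amounts to a subsequence test. A minimal sketch follows; the function names are illustrative and not taken from the patent.

```python
def order_by_length(a, b):
    """Return (first, second): the shorter string first; on a tie, a comes first."""
    return (a, b) if len(a) <= len(b) else (b, a)


def is_possible_abbreviation(first, second):
    """True if every character of first occurs in second, in the same order."""
    remaining = iter(second)
    # "ch in remaining" consumes the iterator up to the match, so each later
    # character can only be found after the position of the previous one.
    return all(ch in remaining for ch in first)
```

For example, `is_possible_abbreviation(*order_by_length("ソビエト連邦", "ソ連"))` evaluates to `True`. A positive result only marks a candidate: the test is purely positional, so unrelated strings that happen to share characters in order would also pass.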

Operation

With the configuration described above, the character string extraction unit extracts character strings from the text stored in the text storage unit. The input character string temporary storage unit temporarily stores the notation of the character string entered from the input unit. The abbreviation determination unit compares the notation of the character string extracted by the character string extraction unit with the notation held in the temporary storage unit, and determines whether either one is an abbreviation of the other.

Embodiments

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of a character processing apparatus according to one embodiment of the present invention.

In FIG. 1, 11 is a text storage unit, which stores the entered text. 12 is a character string extraction unit, which divides the text stored in the text storage unit 11 at each point where the character type changes from hiragana to another type, and extracts the character strings between successive division points one at a time, in order. 13 is an input unit, through which an abbreviation of a word, or its formal notation, is entered. 14 is an input character string temporary storage unit, which temporarily stores the notation of the character string entered through the input unit 13. 15 is a character string length determination unit, which compares the length of the character string extracted by the character string extraction unit 12 with that of the character string held in the input character string temporary storage unit 14, determines which is shorter (the equal-length case included), and passes the two character strings, marked as shorter and longer, to the abbreviation determination unit described next. 16 is the abbreviation determination unit: when every character of the shorter notation determined by the character string length determination unit 15 is contained in the other notation in its order of appearance, it determines that the shorter character string may be an abbreviation of the other character string. 17 is a display unit: when the abbreviation determination unit 16 determines that one character string may be an abbreviation of the other, it lists which character string is the abbreviation of which, together with their notations.
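
One possible reading of the extraction unit 12 is sketched below. The description does not say what happens to the hiragana that follows each candidate, so this sketch assumes that hiragana runs, whitespace, and a small set of punctuation marks act as separators and are discarded; that assumption, the punctuation set, and the name `extract_strings` are illustrative only.

```python
import re

# Maximal runs of characters that are neither hiragana (U+3041 to U+309F),
# whitespace, nor one of a few common punctuation marks (assumed set).
_CANDIDATE = re.compile(r"[^\u3041-\u309F\s、。，．,.「」（）()]+")


def extract_strings(text):
    """Return candidate strings in order of appearance, with hiragana dropped."""
    return _CANDIDATE.findall(text)
```

With this reading, `extract_strings("ソ連は大国であった。")` yields `["ソ連", "大国"]`, which matches the worked example in which ソ連 is extracted without its trailing particle.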

FIG. 3 is a flowchart illustrating a character processing method according to one embodiment of the present invention.

31 is an input character string temporary storage step, in which the notation of the character string entered through the input unit 13 is temporarily stored. 32 is a character string extraction step, in which the text stored in the text storage unit 11 is divided at each point where the character type changes from hiragana to another type, and the character strings between successive division points are extracted one at a time, in order. 33 is a character string length determination step, in which the length of the character string extracted in step 32 is compared with that of the character string held in the input character string temporary storage unit 14 to determine which is shorter (the equal-length case included), and the two character strings, marked as shorter and longer, are passed to the abbreviation determination step. 34 is the abbreviation determination step: when every character of the character string judged shorter in the length determination step is contained in the other character string in its order of appearance, the shorter character string is determined to be a possible abbreviation of the other. 35 is an end-of-text determination step, in which it is determined whether all character strings have been extracted from the text; if any remain, processing returns to step 32. Once all character strings have been extracted, the determination results are listed on the display unit 17.
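
Steps 31 to 35 can be put together in a short sketch. The extraction rule used here (hiragana, whitespace, and a few punctuation marks treated as separators) is an assumption about step 32, and the function names are hypothetical.

```python
import re

# Assumed extraction rule for step 32: keep runs of non-hiragana characters.
_CANDIDATE = re.compile(r"[^\u3041-\u309F\s、。，．,.「」（）()]+")


def is_subsequence(short, long):
    """True if the characters of short appear in long in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def find_abbreviation_pairs(text, query):
    """Return (abbreviation, formal notation) candidates involving query."""
    stored = query                                  # step 31: store the input
    results = []
    for extracted in _CANDIDATE.findall(text):      # steps 32 and 35: loop to the end
        if len(extracted) <= len(stored):           # step 33: order by length
            short, long = extracted, stored
        else:
            short, long = stored, extracted
        if is_subsequence(short, long):             # step 34: abbreviation test
            results.append((short, long))
    return results                                  # listed on the display unit 17
```

A single query can produce pairs in both directions: an extracted string shorter than the query is reported as its possible abbreviation, and a longer one as its possible formal notation. An extracted string identical to the query trivially passes the test; whether to list such matches is left open here.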

The operation of the character processing apparatus and method of this embodiment, configured as above, is described concretely below.

First, to check whether the character string "ソビエト連邦" ("Soviet Union") is written in various other notations in the text stored in the text storage unit 11, "ソビエト連邦" is entered through the input unit 13. "ソビエト連邦" is then temporarily stored in the input character string temporary storage unit 14.

Next, the character string extraction unit 12 divides the text stored in the text storage unit 11 at each point where the character type changes from hiragana to another type, and extracts the character strings between division points one at a time. If the extracted character string is "ソ連", the character string length determination unit 15 compares the lengths of "ソビエト連邦" and "ソ連". It judges "ソ連" to be the shorter and passes both strings to the abbreviation determination unit 16. Because each character of the shorter string, "ソ" and "連", is contained in the other string "ソビエト連邦" in the same order of appearance, the abbreviation determination unit 16 determines that "ソ連" may be an abbreviation of "ソビエト連邦".
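
The character-by-character matching in this example can be traced with a small helper that records where each character of the shorter string is found in the longer one. The helper name `match_positions` is illustrative and not part of the patent.

```python
def match_positions(short, long):
    """Return the index in long matched by each character of short, or None."""
    positions = []
    start = 0
    for ch in short:
        idx = long.find(ch, start)  # search only after the previous match
        if idx == -1:
            return None             # a character is missing or out of order
        positions.append(idx)
        start = idx + 1
    return positions
```

For "ソ連" against "ソビエト連邦" this returns `[0, 4]`: "ソ" matches the first character and "連" the fifth. Reversing the short string to "連ソ" returns `None`, because "ソ" does not occur after "連".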

Further, if "ソビエト社会主義共和国連邦" ("Union of Soviet Socialist Republics") is extracted by the character string extraction unit 12, the character string length determination unit 15 compares "ソビエト連邦" with "ソビエト社会主義共和国連邦" and determines that "ソビエト連邦" is the shorter. The abbreviation determination unit 16 then determines that "ソビエト連邦" may be an abbreviation of "ソビエト社会主義共和国連邦".

In this way, every character string in the text is compared with the character string held in the input character string temporary storage unit 14 to determine whether one is an abbreviation of the other. When abbreviation determination has finished for all character strings, those determined to be abbreviations are shown on the display unit 17.

As described above, according to this embodiment, a character string length determination unit is provided so that, when the abbreviation determination unit compares two character strings, the shorter one is identified beforehand and only that string is treated as the possible abbreviation. This reduces the number of comparisons and increases processing speed.
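
One way to read this saving: a longer string can never be contained, character by character and in order, in a shorter one, so once the lengths are known only one direction needs testing. The sketch below compares a two-direction check with the length-ordered check and counts the subsequence tests run; it is an illustration of that reading, not a measurement from the patent, and the function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if the characters of short appear in long in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def judge_without_length_check(a, b):
    """Try both directions. Returns ((abbr, formal) or None, tests run)."""
    if is_subsequence(a, b):
        return (a, b), 1
    if is_subsequence(b, a):
        return (b, a), 2
    return None, 2


def judge_with_length_check(a, b):
    """Order by length first, then test once. Returns (result, tests run)."""
    short, long = (a, b) if len(a) <= len(b) else (b, a)
    return ((short, long) if is_subsequence(short, long) else None), 1
```

Both functions reach the same verdict, but the length-ordered version always runs exactly one subsequence test per extracted string.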

In this embodiment, the character string extraction unit divides the text where the character type changes from hiragana to another type and extracts the strings between division points one at a time; alternatively, it may extract continuous runs of characters of a single type, such as katakana strings. It may also hold a small dictionary of function-word notations (particles and auxiliary words) and extract the strings delimited by function words.
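
The first alternative, extracting runs of a single character type, might look like this for katakana. The code-point range (katakana letters plus the long-vowel mark) and the name `extract_katakana_runs` are assumptions made for this sketch.

```python
import re

# Katakana letters U+30A1 to U+30FA plus the prolonged sound mark U+30FC.
_KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def extract_katakana_runs(text):
    """Return maximal runs of katakana characters, in order of appearance."""
    return _KATAKANA_RUN.findall(text)
```

Applied to "パーソナルコンピュータはパソコンと略される。", this returns the two katakana strings パーソナルコンピュータ and パソコン, which can then be passed to the length and abbreviation determination described above.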

Furthermore, although the display unit lists the portions determined to be abbreviations, the corresponding portions of the text may instead be shown in reverse video or underlined so that they appear different from the rest.

Effects of the Invention

According to the present invention, an abbreviation of a given word can be searched for in a text, or the formal notation of a given abbreviation can be searched for in a document. Furthermore, since no notational-variation dictionary is needed, the invention can be implemented with little memory, so its practical benefit is considerable.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character processing apparatus according to one embodiment of the present invention, FIG. 2 is a block diagram of a conventional character processing apparatus, and FIG. 3 is a flowchart of a character processing method according to one embodiment of the present invention.

11: text storage unit; 12: character string extraction unit; 13: input unit; 14: input character string temporary storage unit; 15: character string length determination unit; 16: abbreviation determination unit; 17: display unit.

Claims

(57) [Claims]

1. A character processing apparatus comprising: a text storage unit for storing text; a character string extraction unit for extracting a character string from the text stored in the text storage unit; an input unit for inputting the notation of a character string; an input character string temporary storage unit for temporarily storing the character string input from the input unit; a character string length determination unit for comparing the length of the character string temporarily stored in the input character string temporary storage unit with the length of the character string extracted by the character string extraction unit, and taking the shorter as a first character string and the longer as a second character string; and an abbreviation determination unit for comparing the notations of the first character string and the second character string and, when every character constituting the first character string is present in the second character string regardless of position and in the same order of appearance, determining that the first character string is an abbreviation of the second character string, that is, that the second character string is the formal notation of the first character string.

2. A character processing method comprising: a character string extracting step of extracting a character string from text; an input character string temporary storing step of temporarily storing an input character string; a character string length determining step of comparing the length of the character string temporarily stored in the input character string temporary storing step with the length of the character string extracted by the character string extracting means (hereinafter referred to as the extracted character string), and taking the shorter as a first character string and the longer as a second character string; and an abbreviation determining step of comparing the notations of the first character string and the second character string and, when every character constituting the first character string is present in the second character string regardless of position and in the same order of appearance, determining that the first character string is an abbreviation of the second character string, that is, that the second character string is the formal notation of the first character string.