JPS61282965A - Character string dividing method

Info

Publication number: JPS61282965A
Application number: JP60124684A
Authority: JP
Inventors: Shunichi Fukushima; 俊一福島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1985-06-07
Filing date: 1985-06-07
Publication date: 1986-12-13

Abstract

PURPOSE: To prevent a numeric string from being divided at a punctuation mark, comma, or period inside it, by not dividing the character string when numerics exist immediately before and after the punctuation mark. CONSTITUTION: A punctuation mark extracting means 3 searches a KANA (Japanese syllabary) string stored in a character string memory means 2 and extracts punctuation marks such as commas and periods. An immediately before non-numeric deciding means 4 receives the position of the punctuation mark from the punctuation mark extracting means 3 and outputs a signal to an OR gate 6 when the character immediately before the punctuation mark is not a numeric. An immediately after non-numeric deciding means 5 outputs a signal to the OR gate 6 when the character immediately after the punctuation mark is not a numeric. When a signal is transmitted from the OR gate 6, a conversion unit reading means 7 receives the position of the punctuation mark from the punctuation mark extracting means 3, reads the KANA string up to that position from the character string memory means 2, and outputs it to a KANA/KANJI (Chinese character) converting means 8.

Description

Detailed Description of the Invention

(Industrial Field of Application) The present invention relates to a method for automatically dividing a character string into multiple processing units, which is required as preprocessing for tasks such as kana-kanji conversion of unsegmented (solid-written) input in word processors and similar equipment.

(Prior Art and Its Problems) Conventionally, in word processors and the like, a kana character string entered without segmentation is sometimes divided into kana-kanji conversion units at delimiters such as Japanese punctuation marks (full stops and commas), commas, and periods. An example follows.

キョウハ、ヨイテンキデス。 (Kyō wa, yoi tenki desu. "The weather is nice today.")

→ ／キョウハ、／ヨイテンキデス。／

In this case, 「キョウハ」 and 「ヨイテンキデス」 are each converted from kana to kanji separately, in two steps. Here "／" is a marker indicating a division point.
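The conventional approach described above can be sketched in a few lines of Python. This is only an illustration: the function name `naive_split` and the exact delimiter set are assumptions, not taken from the patent.

```python
# Delimiters named in the text: Japanese full stop and comma, plus
# full-width and ASCII comma and period (the exact set is an assumption).
DELIMITERS = "。、，．,."

def naive_split(text):
    """Cut after every delimiter without looking at neighbouring characters."""
    units, start = [], 0
    for i, ch in enumerate(text):
        if ch in DELIMITERS:
            units.append(text[start:i + 1])  # unit includes its delimiter
            start = i + 1
    if start < len(text):
        units.append(text[start:])  # trailing text with no final delimiter
    return units

print(naive_split("キョウハ、ヨイテンキデス。"))
```

Because this function never inspects the characters around a delimiter, it cuts at every comma and period it meets, including any that happen to sit inside a number.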

However, this method of simply dividing at punctuation has the drawback that a numeric string is split at a Japanese comma, comma, or period occurring inside it, as shown below.

コノトケイヲ、２９，８００エンデカイマシタ。 (Kono tokei o, 29,800 en de kaimashita. "I bought this watch for 29,800 yen.")

→ コノトケイヲ、／２９，／８００エンデカイマシタ。

コノモンダイノセイカイリツハ、６５．５５％ダッタ。 (Kono mondai no seikairitsu wa, 65.55% datta. "The correct-answer rate for this problem was 65.55%.")

→ コノモンダイノセイカイリツハ、／６５．／５５％ダッタ。

The kana-kanji conversion result itself may not be wrong, but when calculations are performed on the numerical values, or when the text is read back aloud for checking, this division error is often fatal. For example, if the numeric parts of the examples above are read aloud during such a read-back, the division error produces the following incorrect readings.

／ニジューク／ハッピャク…／ロクジューゴ／ゴジューゴ… (nijūku / happyaku ... / rokujūgo / gojūgo ..., that is, "twenty-nine / eight hundred ... / sixty-five / fifty-five ...")

(Object of the Invention) An object of the present invention is to provide a method that divides character strings correctly, without splitting a numeric string at a Japanese comma, comma, or period inside it.

(Structure of the Invention) The present invention is a character string dividing method characterized in that, when a character string is divided into multiple units, delimiters such as Japanese commas, commas, and periods are extracted from the character string; it is determined whether the characters before and after each delimiter are numerals; when a numeral is absent immediately before or immediately after the delimiter, the character string is divided at that delimiter; and when numerals are present both immediately before and immediately after the delimiter, the character string is not divided at that delimiter.
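The rule stated above can be sketched as follows. This is a minimal illustration under assumptions: the names `split_units` and `is_digit_at`, the delimiter set, and the choice to treat a missing neighbour (start or end of the string) as non-numeric are not specified in the patent text.

```python
DELIMITERS = "。、，．,."
DIGITS = "0123456789０１２３４５６７８９"  # ASCII and full-width digits

def is_digit_at(text, i):
    """True if index i exists and holds a digit (out of range counts as non-digit)."""
    return 0 <= i < len(text) and text[i] in DIGITS

def split_units(text):
    """Cut after a delimiter unless digits sit on both sides of it."""
    units, start = [], 0
    for i, ch in enumerate(text):
        if ch not in DELIMITERS:
            continue
        if is_digit_at(text, i - 1) and is_digit_at(text, i + 1):
            continue  # delimiter is inside a number: do not divide here
        units.append(text[start:i + 1])
        start = i + 1
    if start < len(text):
        units.append(text[start:])  # trailing text with no final delimiter
    return units

print(split_units("コノトケイヲ、２９，８００エンデカイマシタ。"))
```

With this rule, the comma inside the price no longer ends a unit, while the Japanese comma after 「コノトケイヲ」 still does.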

(Embodiment) The configuration of the present invention will be described in detail with reference to the drawing.

FIG. 1 is a block diagram showing an embodiment of a Japanese text input device that implements the character string dividing method of the present invention. In FIG. 1, 1 is a character string input means; a kana keyboard or the like is used to enter a kana character string.

2 is a character string storage means, which stores the kana character string entered through the character string input means 1. It can be realized with a magnetic disk device, a magnetic tape device, IC memory, or the like.

3 is a delimiter extraction means, which searches the kana character string stored in the character string storage means 2, extracts delimiters such as the Japanese full stop (。), the Japanese comma (、), the comma (，), and the period (．), and outputs the position of each extracted delimiter to the immediately-preceding non-numeric determination means 4, the immediately-following non-numeric determination means 5, and the conversion unit reading means 7.

4 is the immediately-preceding non-numeric determination means, which receives the position of the delimiter from the delimiter extraction means 3 and determines whether the character immediately before the delimiter is a numeral. When that character is not a numeral, it outputs a signal to the OR gate 6.

5 is the immediately-following non-numeric determination means, which receives the position of the delimiter from the delimiter extraction means 3 and determines whether the character immediately after the delimiter is a numeral. When that character is not a numeral, it outputs a signal to the OR gate 6.

6 is an OR gate, which outputs a signal to the conversion unit reading means 7 when a signal arrives from at least one of the immediately-preceding non-numeric determination means 4 and the immediately-following non-numeric determination means 5.
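The two determination means and the OR gate can be modelled as three small functions, which makes the decision for each combination of neighbours easy to tabulate. The function names are hypothetical, and treating a delimiter at the very start or end of the string as having a non-numeric neighbour is an assumption; the text does not state how that case is handled.

```python
DIGITS = "0123456789０１２３４５６７８９"  # ASCII and full-width digits

def before_signal(text, pos):
    """Means 4: signal when the character before index pos is not a digit."""
    return pos == 0 or text[pos - 1] not in DIGITS

def after_signal(text, pos):
    """Means 5: signal when the character after index pos is not a digit."""
    return pos + 1 >= len(text) or text[pos + 1] not in DIGITS

def or_gate(a, b):
    """OR gate 6: fires when at least one input signal is present."""
    return a or b

# Each sample has its delimiter at index 1, between two neighbours.
for sample in ["ハ，オ", "タ．２", "５．％", "２，５"]:
    a, b = before_signal(sample, 1), after_signal(sample, 1)
    print(sample, a, b, "divide" if or_gate(a, b) else "do not divide")
```

Only the last sample, with digits on both sides, leaves both inputs of the OR gate without a signal, so it is the only one that is not a division point.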

7 is a conversion unit reading means. When a signal arrives from the OR gate 6, it receives the position of the delimiter from the delimiter extraction means 3, reads the kana character string up to that position from the character string storage means 2, and outputs it to the kana-kanji conversion means 8.

8 is a kana-kanji conversion means, which converts the kana character string sent from the conversion unit reading means 7 into a mixed kanji-kana character string and outputs it. The kana-kanji conversion means 8 can be realized using known techniques.

Next, the operation of the Japanese text input device of this embodiment will be described using an example.

Assume that the following kana character string has been entered through the character string input means 1 and stored in the character string storage means 2.

イチロウクンハ，オツカイニイキマシタ．２，５００エンノカサヲカッテ，１マンエンサツヲダシマシタ．オツリハイクラデショウ．コノモンダイノセイカイリツハ，９５．５％ダッタ．

(Ichirō-kun wa, otsukai ni ikimashita. 2,500 en no kasa o katte, 1 man en satsu o dashimashita. Otsuri wa ikura deshō. Kono mondai no seikairitsu wa, 95.5% datta.)

The delimiter extraction means 3 first extracts the comma (，) immediately after 「イチロウクンハ」 and outputs its position (the 8th character) to the immediately-preceding non-numeric determination means 4, the immediately-following non-numeric determination means 5, and the conversion unit reading means 7. The immediately-preceding non-numeric determination means 4 determines whether the character just before the 8th character, that is, the 7th character, is a numeral. Since 「ハ」 is not a numeral, it outputs a signal to the OR gate 6. At the same time, the immediately-following non-numeric determination means 5 determines whether the character just after the 8th character, that is, the 9th character, is a numeral. Since 「オ」 is not a numeral, it also outputs a signal to the OR gate 6.

As a result, the OR gate 6 outputs a signal to the conversion unit reading means 7. On receiving this signal, the conversion unit reading means 7 outputs the character string 「イチロウクンハ，」, which runs up to the delimiter position (the 8th character) received from the delimiter extraction means 3, to the kana-kanji conversion means 8. The kana-kanji conversion means 8 converts it into the mixed kanji-kana character string 「一郎君は，」 and outputs it.

Next, the delimiter extraction means 3 extracts the period (．) at the 19th character, immediately after 「オツカイニイキマシタ」. The immediately-preceding non-numeric determination means 4 outputs a signal to the OR gate 6 because the 18th character, 「タ」, is not a numeral. The immediately-following non-numeric determination means 5 outputs no signal to the OR gate 6 because the 20th character, 「２」, is a numeral. Since a signal has arrived from the immediately-preceding non-numeric determination means 4, the OR gate 6 outputs a signal to the conversion unit reading means 7. On receiving this signal, the conversion unit reading means 7 outputs the character string 「オツカイニイキマシタ．」, running from the 9th character to the 19th character, to the kana-kanji conversion means 8, which yields the mixed kanji-kana character string 「お使いに行きました．」 ("He went on an errand.").

Next, the delimiter extraction means 3 extracts the comma (，) at the 21st character, immediately after 「２」. The immediately-preceding non-numeric determination means 4 outputs no signal to the OR gate 6 because the 20th character, 「２」, is a numeral. The immediately-following non-numeric determination means 5 outputs no signal to the OR gate 6 because the 22nd character, 「５」, is a numeral. As a result, the OR gate 6 receives no signal from either determination means and outputs no signal to the conversion unit reading means 7. Consequently, the conversion unit reading means 7 and the kana-kanji conversion means 8 do not operate.

Similarly, the OR gate 6 outputs a signal for the comma (，) at the 34th character, the period (．) at the 48th character, the period (．) at the 60th character, the comma (，) at the 75th character, and the final period (．), but outputs no signal for the period (．) at the 78th character. As a result, the kana character string stored in the character string storage means 2 is divided into the following units and converted from kana to kanji.

／イチロウクンハ，／オツカイニイキマシタ．／２，５００エンノカサヲカッテ，／１マンエンサツヲダシマシタ．／オツリハイクラデショウ．／コノモンダイノセイカイリツハ，／９５．５％ダッタ．／

→ 一郎君は，お使いに行きました．２，５００円の傘を買って，１万円札を出しました．おつりはいくらでしょう．この問題の正解率は，９５．５％だった．

("Ichiro went on an errand. He bought a 2,500 yen umbrella and paid with a 10,000 yen note. How much change will he get? The correct-answer rate for this problem was 95.5%.")
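The walkthrough above can be checked with a short trace that lists every delimiter position (counted from 1, as in the text) and whether the OR gate would fire there. The sample string is typed here as reconstructed from the character positions cited in the walkthrough, with full-width commas and periods; that reconstruction, the function name `trace`, and the end-of-string handling are assumptions.

```python
TEXT = ("イチロウクンハ，オツカイニイキマシタ．２，５００エンノカサヲカッテ，"
        "１マンエンサツヲダシマシタ．オツリハイクラデショウ．"
        "コノモンダイノセイカイリツハ，９５．５％ダッタ．")
DELIMITERS = "，．"
DIGITS = "０１２３４５６７８９"

def trace(text):
    """Return (1-based position, OR gate output) for every delimiter."""
    rows = []
    for i, ch in enumerate(text):
        if ch in DELIMITERS:
            before = i == 0 or text[i - 1] not in DIGITS            # means 4
            after = i + 1 >= len(text) or text[i + 1] not in DIGITS  # means 5
            rows.append((i + 1, before or after))                    # OR gate 6
    return rows

for position, signal in trace(TEXT):
    print(position, "divide" if signal else "do not divide")
```

Under these assumptions the trace reports division at positions 8, 19, 34, 48, 60, 75, and 84 (the final period), and no division at positions 21 and 78, matching the positions named in the description.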

(Effects of the Invention) As described above, the character string dividing method of the present invention entirely avoids the fatal error of splitting a numeric string at a Japanese comma, comma, or period inside it, and makes it possible to divide a character string into appropriate processing units.

[Brief Description of the Drawing]

FIG. 1 is a block diagram showing the configuration of an embodiment of a Japanese text input device that implements the character string dividing method of the present invention. In the figure:

1: character string input means
2: character string storage means
3: delimiter extraction means
4: immediately-preceding non-numeric determination means
5: immediately-following non-numeric determination means
6: OR gate
7: conversion unit reading means
8: kana-kanji conversion means

Claims

A character string dividing method characterized in that, when a character string is divided into multiple units, delimiters such as Japanese commas, commas, and periods are extracted from the character string; it is determined whether the characters before and after each delimiter are numerals; when a numeral is absent immediately before or immediately after the delimiter, the character string is divided at the delimiter; and when numerals are present both immediately before and immediately after the delimiter, the character string is not divided at the delimiter.