JPS63284676A

JPS63284676A - Character string processor

Info

Publication number: JPS63284676A
Application number: JP62118105A
Authority: JP
Inventors: Masako Bosu; 雅子望主
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-05-16
Filing date: 1987-05-16
Publication date: 1988-11-21

Abstract

PURPOSE: To process numerals effectively by recognizing an entire character string as a numeral when the string consists of numeric characters and auxiliary numeric characters such as prefixes or suffixes. CONSTITUTION: A numeric character table 20 stores Arabic numerals, Chinese numeric characters, numeric characters denoting digit positions, characters equivalent to Chinese numeric characters, and flags indicating the class of each of these numeric characters. An auxiliary numeric prefix table 22 stores auxiliary numeric prefixes such as 'OYOSO', 'ZU', and 'DAI', and an auxiliary numeric suffix table 24 stores auxiliary numeric suffixes such as 'KAI' and 'KO'. When characters matching these prefixes and suffixes are input, they are grouped with the adjacent numeric characters and analyzed as a numeral by a numeral processing part 14. Expressions such as 'ICHININ MAE' and 'GORI MUCHU', by contrast, are stored in an inhibition table 26 and are excluded from numeral analysis by the part 14.

Description

TECHNICAL FIELD: The present invention relates to a character string processing device, and more particularly to a device for processing character strings that include numerals.

BACKGROUND ART: When Japanese character strings are analyzed for purposes such as translation, numerals must be analyzed as a preprocessing step before morphological and syntactic analysis.

Conventional devices could analyze individual numbers such as 1, 2, and 三 (three), but not character strings that contain numbers. For 「表１」 ("Table 1"), for example, only 「１」 was recognized as a numeral, and 「表１」 as a whole was not. Similarly, the 「億」 (hundred million) in 「二億円」 ("200 million yen") and the 「数」 (several) in 「数十人」 ("several tens of people") could not be recognized as parts of a numeral. The strings were therefore split into 「二」 and 「億円」, and into 「数」 and 「十人」, and neither 「二億円」 nor 「数十人」 was recognized as a numeral as a whole.

Furthermore, because words such as 「円」 (yen) were registered in the dictionary together with the words denoting the digit positions of numerals, a large dictionary capacity was required.

In addition, an expression containing both Arabic numerals and kanji numerals could not be recognized as a single numeral, so it could not be analyzed properly.

OBJECT: The object of the present invention is to eliminate these drawbacks of the prior art and to provide a character string processing device that can efficiently detect numerals in Japanese character strings.

CONFIGURATION: To achieve the above object, the present invention comprises: input means for inputting a character string; number storage means for storing numbers and kanji-numeral-equivalent characters; auxiliary numeric character storage means for storing auxiliary numeric characters, which form a numeral when used together with numbers; and numeral processing means for searching the number storage means and the auxiliary numeric character storage means for the character string input from the input means and determining whether the character string is a numeral. The numeral processing means is characterized in that it recognizes the character string as a numeral as a whole even when the string consists of numbers and auxiliary numeric characters.

The invention is described in detail below on the basis of one embodiment.

FIG. 1 shows an embodiment of the character string processing device according to the present invention.

The device has an input unit 10, to which Japanese character strings of mixed kanji and kana are input. The input unit 10 may include, for example, a keyboard with character keys and function keys, an optical character reader (OCR) that reads Japanese character strings recorded on paper, and a file storage device that reads Japanese text stored on a storage medium such as a magnetic disk.

An input character string file 12 temporarily stores the Japanese character strings input from the input unit 10. A numeral processing unit 14 processes the numerals contained in the mixed kanji-kana character string read from the input character string file 12 by referring to a number table 20, an auxiliary numeric prefix table 22, an auxiliary numeric suffix table 24, and a prohibition table 26. It then attaches numeral information to the result and outputs it to an output unit 16.

FIG. 2 shows an example of the data stored in the number table 20. As shown there, the number table 20 stores Arabic numerals, kanji numerals, kanji numerals denoting digit positions, and kanji-numeral-equivalent characters, each marked with a flag from 0 to 3. Kanji-numeral-equivalent characters are characters used in place of a number, such as 何, 数, and 幾, as shown in FIG. 2.
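The flagged number table can be pictured as a simple character-to-class mapping. The following is only a sketch: FIG. 2 is not reproduced in the text, so the table contents, the assignment of flag values 0 to 3 to the four classes (taken here in the order the text lists them), and the names `NumClass`, `NUMBER_TABLE`, and `number_class` are all assumptions made for illustration.

```python
from enum import IntEnum


class NumClass(IntEnum):
    """Assumed flag values; the text only says the flags range from 0 to 3."""
    ARABIC = 0        # Arabic numerals, e.g. 1, ２
    KANJI = 1         # kanji numerals, e.g. 一, 二
    KANJI_PLACE = 2   # kanji numerals denoting digit positions, e.g. 十, 百
    KANJI_EQUIV = 3   # kanji-numeral-equivalent characters: 何, 数, 幾


def build_number_table():
    """Build a hypothetical stand-in for the number table 20."""
    table = {}
    for ch in "0123456789０１２３４５６７８９":
        table[ch] = NumClass.ARABIC
    for ch in "〇一二三四五六七八九":
        table[ch] = NumClass.KANJI
    for ch in "十百千万億兆":
        table[ch] = NumClass.KANJI_PLACE
    for ch in "何数幾":
        table[ch] = NumClass.KANJI_EQUIV
    return table


NUMBER_TABLE = build_number_table()


def number_class(ch):
    """Return the flag for ch, or None if ch is not in the table."""
    return NUMBER_TABLE.get(ch)
```

With a mapping like this, a single lookup answers both "is this character a number?" and "is it a kanji-numeral-equivalent character?", which are the two questions the flowcharts ask of the table.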

FIG. 3 shows an example of the auxiliary numeric prefix table 22. As shown there, the auxiliary numeric prefix table 22 stores characters used at the beginning of a numeral that contains a number.

FIG. 4 shows an example of the auxiliary numeric suffix table 24. As shown there, the auxiliary numeric suffix table 24 stores characters used at the end of a numeral.

FIG. 5 shows an example of the prohibition table 26. As shown there, it registers character strings that contain numbers but form compounds or idiomatic expressions with a meaning of their own, and are therefore better excluded from numeral processing. The prohibition table 26 is accessed during the prohibition processing described later.

The numeral processing unit 14 processes numerals by classifying the numeral portion into the following four patterns.

(a) auxiliary numeric prefix + number + auxiliary numeric suffix. Examples: 第５回, 約１２０年
(b) number + auxiliary numeric suffix. Examples: １５人, 数百台
(c) auxiliary numeric prefix + number. Examples: 表１, 図２
(d) number only. Examples: 六, １９００

Accordingly, the numeral processing unit 14 enters numeral processing when it detects a number or an auxiliary numeric prefix. It reads on for as long as numbers continue and treats the span up to the point where the numbers end as the number part. If an auxiliary numeric suffix follows immediately, everything up to and including it is grouped as the numeral portion. Compounds that contain numerals but are better handled as a unit are excluded from numeral analysis by referring to the prohibition table 26.

The output unit 16 includes, for example, a printer, a display, and a file storage device such as a magnetic disk.

Next, the numeral processing performed by the numeral processing unit 14 is described with reference to the flowcharts of FIGS. 6(a) and 6(b).

A pointer P indicating the head of the character string to be analyzed and the head position Q of the number part are initialized (102). Whether any character string remains is then checked (104); if none remains, processing ends.

If a character string remains, prohibition processing is performed (106): the prohibition table 26 is searched to check whether a compound containing a numeral is at the head of the string. The prohibition processing is described in detail later.

The prohibition processing returns K = 0 when the string cannot be analyzed, that is, when it is a compound that must be kept as a unit and not processed as a numeral, and returns K = 1 when it can be analyzed, that is, processed as a numeral. Whether K is 0 is then checked (108). If K = 0, numeral processing is skipped, and processing returns to step 104 and moves on to the next character. If K is not 0, numeral processing is performed.

First, the number table 20 is consulted to determine whether the character at the head of the string is a number (110).

If it is, the numeral portion is taken to begin here; the value of the pointer P indicating this position is saved in H (124), and processing proceeds to step 128.

If the head of the string is not a number, it is determined whether an auxiliary numeric prefix is at the head of the string (112). If there is none, the pointer is advanced to the next character (114) and processing returns to step 104.

If there is an auxiliary numeric prefix, it is taken to be the head of the numeral portion, and the value of the pointer P indicating this position is saved in H (116). The pointer is advanced by the length of the prefix (118), and the position of the character now at the head is saved in Q (120). Whether this character is a number is determined (122). If it is not, the span is judged not to be a numeral portion; the head position Q is reinitialized, the pointer P is advanced by one to the next character (126), and processing returns to step 104.

If the character at the head is a number, the pointer P is advanced by one to the next character (128). Likewise, after the value of P has been saved in H in step 124, P is advanced by one (128) to the next character.

Whether the next character is a number is determined by consulting the number table 20 (130). If it is, processing returns to step 128, the pointer P is advanced by one to the next character, and that character is checked in the same way (130).

When a character that is not a number is reached, the loop is exited, and whether that character is an auxiliary numeric suffix is determined by searching the auxiliary numeric suffix table 24 (132). If it is a suffix, whether Q > H is checked (134). If Q > H, the span from H, where the head position was saved, to Q-1 is recognized as the prefix, the span from the head position Q of the number part to P-1 as the number part, and the span from P, the head of the suffix, to the end of the suffix as the suffix (136). That is, a numeral of pattern (a) above is recognized.

If Q > H does not hold, there is no prefix, so the span from H, where the head position of the numeral portion was saved, to P-1 is recognized as the number part, and the span from P onward as the suffix (138). In this case a numeral of pattern (b) above is recognized.

If there is no auxiliary numeric suffix in step 132, whether Q > H is checked (140). If Q > H, the span from H, where the head position was saved, to Q-1 is recognized as the prefix, and the span from the head position Q of the number part to P-1 as the number part (142). That is, a numeral of pattern (c) above is recognized.

If Q > H does not hold, there is no prefix, so the span from H, where the head position of the numeral portion was saved, to P-1 is recognized as the number part (144). In this case a numeral of pattern (d) above is recognized.

After the numeral portion has been detected in this way, following steps 136 and 138, where a suffix was present, the pointer is advanced from P, the head of the suffix, by the length of the suffix, and processing returns to step 104. Following steps 142 and 144, processing returns directly to step 104.
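The scanning loop of FIGS. 6(a) and 6(b) can be sketched as follows. This is a minimal illustration under several assumptions: the table contents are hypothetical (FIGS. 2 to 4 are not reproduced in the text), positions are 0-based rather than 1-based, the prohibition processing of steps 106 and 108 is left out, and the names `NUMBERS`, `PREFIXES`, `SUFFIXES`, `match_at`, and `find_numerals` are invented for the example. The step numbers in the comments follow the description above.

```python
# Hypothetical table contents standing in for tables 20, 22 and 24.
NUMBERS = set("0123456789０１２３４５６７８９〇一二三四五六七八九十百千万億兆何数幾")
PREFIXES = ("第", "約", "表", "図")
SUFFIXES = ("回", "年", "人", "台", "円")


def match_at(text, pos, entries):
    """Return the longest table entry that starts at text[pos], or ''."""
    hits = [e for e in entries if text.startswith(e, pos)]
    return max(hits, key=len, default="")


def find_numerals(text):
    """Scan text and return a list of (prefix, number, suffix) triples."""
    found = []
    p = 0                                                # step 102 (0-based)
    while p < len(text):                                 # step 104
        h = q = p       # h: head of the numeral, q: head of the number part
        if text[p] not in NUMBERS:                       # step 110
            prefix = match_at(text, p, PREFIXES)         # step 112
            if not prefix:
                p += 1                                   # step 114
                continue
            q = p + len(prefix)                          # steps 116 to 120
            if q >= len(text) or text[q] not in NUMBERS:  # step 122
                p = q + 1                                # step 126
                continue
        p = q + 1                                        # step 128
        while p < len(text) and text[p] in NUMBERS:      # step 130
            p += 1
        suffix = match_at(text, p, SUFFIXES)             # step 132
        # Steps 134 to 144: an empty prefix corresponds to "Q > H does not hold".
        found.append((text[h:q], text[q:p], suffix))
        p += len(suffix)                                 # skip past the suffix
    return found
```

Calling `find_numerals("私は第二十五回の")` walks past 「私」 and 「は」, picks up the prefix 「第」, reads the three numbers, and closes the numeral at the suffix 「回」. Representing an absent prefix or suffix as an empty string replaces the explicit Q > H comparison of the flowchart.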

Next, the prohibition processing of step 106 is described with reference to the flowchart of FIG. 7.

Before the numeral portion is detected, the prohibition processing detects compounds that are formed with kanji numerals or kanji-numeral-equivalent characters and are better handled as a unit, and excludes them from numeral processing. Examples of expressions better left unprocessed are 「四面楚歌」 (shimen-soka), 「五里霧中」 (gori-muchu), and 「白髪三千丈」 (hakuhatsu-sanzenjo).

Whether numeral processing must not be performed on the input character string, that is, whether analysis is prohibited, is decided according to the following four patterns.

(1) A kanji-numeral-equivalent character immediately followed by a number or an auxiliary numeric suffix: analysis permitted. Examples: 幾人, 数十
(2) A kanji-numeral-equivalent character not immediately followed by a number, an auxiliary numeric suffix, or a kanji-numeral-equivalent character: analysis prohibited. Examples: 数, 何処, 幾何
(3) A number other than a kanji-numeral-equivalent character, found in the prohibition table: analysis prohibited. Example: 四面楚歌
(4) A number other than a kanji-numeral-equivalent character, not found in the prohibition table: analysis permitted. Examples: 四面, 五人

In FIG. 7, the number table 20 is first consulted to determine whether the character at the head of the string is a kanji-numeral-equivalent character such as 「何」 or 「数」 (202). If it is, the pointer is advanced by one (204).

The auxiliary numeric suffix table 24 is consulted to determine whether the next character is an auxiliary numeric suffix (206). If it is, the string is judged analyzable, the pointer is moved back by one (210), and K = 1, indicating that analysis is possible, is returned (212).

If the character is not an auxiliary numeric suffix in step 206, the flags of the number table are used to check whether it is a kanji-numeral-equivalent character (208). If it is, the string is judged analyzable, the pointer is moved back by one (210), and K = 1, indicating that analysis is possible, is returned (212).

If it is not a kanji-numeral-equivalent character, analysis is judged to be prohibited, the pointer is advanced by one (214), and K = 0, indicating that analysis is prohibited, is returned (216).

If the character at the head of the string is not a kanji-numeral-equivalent character in step 202, whether the string is in the prohibition table 26 is checked (218). If it is not, the pointer is left unchanged and K = 1, indicating that analysis is possible, is returned (220).

If it is in the prohibition table 26, analysis is prohibited, so the pointer is advanced by the length of the matching string in the prohibition table 26 (222), and K = 0, indicating that analysis is prohibited, is returned (224).

The prohibition processing is performed in this way.
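The prohibition processing of FIG. 7 can be sketched as a function that returns K together with the updated pointer. The table contents and the names `KANJI_EQUIV`, `OTHER_NUMBERS`, `SUFFIXES`, `PROHIBITED`, and `prohibition_check` are hypothetical, and positions are 0-based. One point is an explicit assumption: the description of step 208 is ambiguous, so this sketch tests whether the following character is an Arabic or kanji numeral (not a kanji-numeral-equivalent character), which is the reading under which all the examples of patterns (1) to (4) come out as listed.

```python
# Hypothetical table contents standing in for tables 20, 24 and 26.
KANJI_EQUIV = set("何数幾")
OTHER_NUMBERS = set("0123456789０１２３４５６７８９〇一二三四五六七八九十百千万億兆")
SUFFIXES = ("回", "年", "人", "台", "円")
PROHIBITED = ("四面楚歌", "五里霧中", "白髪三千丈")


def prohibition_check(text, p):
    """Return (K, new_p): K = 1 means analyzable, K = 0 means prohibited."""
    if text[p] in KANJI_EQUIV:                                # step 202
        nxt = p + 1                                           # step 204
        if any(text.startswith(s, nxt) for s in SUFFIXES):    # step 206
            return 1, p                                       # steps 210, 212
        # Step 208, as assumed here: is the next character a numeral?
        if nxt < len(text) and text[nxt] in OTHER_NUMBERS:
            return 1, p                                       # steps 210, 212
        return 0, nxt + 1                                     # steps 214, 216
    for word in PROHIBITED:                                   # step 218
        if text.startswith(word, p):
            return 0, p + len(word)                           # steps 222, 224
    return 1, p                                               # step 220
```

Under these assumptions 「幾人」 and 「数十」 are analyzable, 「幾何」 is prohibited with the pointer moved past both characters, and 「四面楚歌」 is prohibited with the pointer moved past the whole idiom, while 「四面」 alone remains analyzable.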

Next, the operation of the device is described using concrete examples.

Suppose the sentence 「私は第二十五回の…」 shown in FIG. 8 is input. First, the pointer P is set to 1 and Q is initialized to 0 (102).

Since a character string is present (104), processing moves to the prohibition processing of FIG. 7. 「私」 is not a kanji-numeral-equivalent character (202) and is not in the prohibition table (218), so analysis is possible (220). Returning to the flow of FIG. 6(a), K is not 0 (108), the head is not a number (110), and it is not an auxiliary numeric prefix (112), so the pointer is advanced by one to the next character, 「は」 (114).

「は」 is processed in the same way as 「私」, and processing moves to the next character, 「第」 (114). In the prohibition processing 106, moving to FIG. 7, 「第」 is not a kanji-numeral-equivalent character (202) and is not in the prohibition table (218), so analysis is possible (220). Returning to the flow of FIG. 6(a), K is not 0 (108), the head is not a number (110), and it is an auxiliary numeric prefix (112), so the value of the pointer P indicating this position is saved in H (116); that is, H = 3. The pointer is then advanced by the length of the prefix (118). The prefix 「第」 is one character long, so the pointer P is advanced by one.

The position, 4, of the character 「二」 now at the head is saved in Q (120). 「二」 is a number (122), so the pointer is advanced by one to the next character (128). The next character, 「十」, is a number (130), so processing moves to the next character (128); 「五」 is also a number (130), so processing again moves to the next character (128). When the pointer reaches 7, at 「回」, the character is not a number (130), so the loop is exited. 「回」 is an auxiliary numeric suffix (132), so processing proceeds to step 134, where Q and H are compared. Here Q = 4 and H = 3, so Q > H, and the numeral has pattern (a) above (136).

The prefix therefore runs from H to Q-1, that is, from 3 to 4-1, so only 「第」 at position 3 is the prefix. The number part runs from Q to P-1, that is, from 4 to 7-1, so 「二十五」 at positions 4 to 6 is the number part. The suffix starts at P, that is, at 7, so the suffix begins with 「回」 at position 7.

The numeral portion has now been detected, so the pointer is advanced by the length of the suffix (146), and the character 「の」 at position 8 is processed in the same way.
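The position arithmetic of this example can be checked with a short sketch. It uses the 1-based positions H, Q, and P from the walkthrough and converts them to Python's 0-based slices; the function name `split_numeral` and the `suffix_len` parameter are assumptions made for illustration.

```python
def split_numeral(text, h, q, p, suffix_len):
    """Split a numeral using the 1-based positions H, Q and P of the text.

    prefix: positions H to Q-1, number part: Q to P-1, suffix: from P.
    """
    prefix = text[h - 1:q - 1]
    number = text[q - 1:p - 1]
    suffix = text[p - 1:p - 1 + suffix_len]
    return prefix, number, suffix


sentence = "私は第二十五回の"
# H = 3, Q = 4, P = 7, and the suffix 「回」 is one character long.
parts = split_numeral(sentence, h=3, q=4, p=7, suffix_len=1)
```

Here `parts` holds the prefix 「第」, the number part 「二十五」, and the suffix 「回」, as in the walkthrough.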

Next, the case where the sentence 「彼は幾何を勉強する。」 ("He studies geometry.") shown in FIG. 9 is input is described.

As with 「私は」 in the previous example, in the prohibition processing of FIG. 7 the characters of 「彼は」 are not kanji-numeral-equivalent characters (202) and are not in the prohibition table (218), so analysis is possible (220). In FIG. 6(a), K is not 0 (108), the head is not a number (110), and it is not an auxiliary numeric prefix (112), so the pointer is advanced by one (114) and processing moves on to the next character, 「幾」.

In the prohibition processing of FIG. 7, 「幾」 is a kanji-numeral-equivalent character (202), so the pointer is advanced by one (204). The next character, 「何」, is not an auxiliary numeric suffix (206) and is not judged to be a kanji-numeral-equivalent character (208), so analysis is judged to be prohibited, the pointer is advanced to the next character (214), and K = 0 is returned (216). Returning to FIG. 6(a), K = 0 (108), so numeral analysis is not performed and the next character is processed (104).

According to this embodiment, character strings containing numerals can be analyzed with numeral-specific patterns taken into account. By treating as part of a numeral the prefixes and suffixes that were not conventionally treated as numbers, for example kanji-numeral-equivalent characters and kanji numerals denoting digit positions, the entire character string made up of these and numbers can be recognized as a numeral and processed appropriately. There is therefore no need to register kanji-numeral-equivalent characters or kanji numerals denoting digit positions individually in the dictionary, and the dictionary capacity can be kept small.

Character strings that mix Arabic numerals and kanji numerals can also be analyzed correctly.

In addition, compounds and idiomatic expressions that contain numerals or number parts and are better handled as a unit are detected by the prohibition processing and excluded from numeral analysis, so such idiomatic expressions are not mistakenly processed as numerals.

EFFECTS: According to the present invention, even when a character string consists of numbers and auxiliary numeric characters such as prefixes or suffixes, the entire character string can be recognized as a numeral. Numerals can therefore be processed efficiently.

Furthermore, because kanji-numeral-equivalent characters and kanji numerals denoting digit positions need not be registered individually in the dictionary, the dictionary capacity can be reduced.

[Brief explanation of drawings]

FIG. 1 is a functional block diagram showing an embodiment of the character string processing device according to the present invention.
FIG. 2 is a diagram showing an example of the data stored in the number table of FIG. 1.
FIG. 3 is a diagram showing an example of the data stored in the auxiliary numeric prefix table of FIG. 1.
FIG. 4 is a diagram showing an example of the data stored in the auxiliary numeric suffix table of FIG. 1.
FIG. 5 is a diagram showing an example of the data stored in the prohibition table of FIG. 1.
FIGS. 6(a) and 6(b) are flowcharts showing the operation of the device of FIG. 1.
FIG. 7 is a flowchart showing the prohibition processing of FIG. 6(a).
FIG. 8 is a diagram showing an example of an input sentence input to the device of FIG. 1.
FIG. 9 is a diagram showing another example of an input sentence input to the device of FIG. 1.

Explanation of reference numerals for main parts:
10: input unit
12: input character string file
14: numeral processing unit
16: output unit
20: number table
22: auxiliary numeric prefix table
24: auxiliary numeric suffix table
26: prohibition table

Claims

[Scope of Claims] 1. A character string processing device comprising: input means for inputting a character string; number storage means for storing numbers and kanji-numeral-equivalent characters; auxiliary numeric character storage means for storing auxiliary numeric characters that form a numeral when used together with the numbers; and numeral processing means for searching the number storage means and the auxiliary numeric character storage means for the character string input from the input means and determining whether the character string is a numeral, wherein the numeral processing means recognizes the character string as a numeral as a whole even when the character string consists of the numbers and the auxiliary numeric characters.