JPH07141347A

JPH07141347A - Method for segmenting Japanese character string

Info

Publication number: JPH07141347A
Application number: JP4108691A
Authority: JP
Inventors: Takeki Yamada; 武樹山田
Original assignee: Individual
Current assignee: Individual
Priority date: 1992-03-31
Filing date: 1992-03-31
Publication date: 1995-06-02

Abstract

PURPOSE: To increase processing speed and reduce program size by segmenting a Japanese character string into character strings that start with a kanji (Chinese character) or a katakana (angular form of the Japanese syllabary) and end with a hiragana (cursive form of the Japanese syllabary) or a punctuation mark. CONSTITUTION: When the program starts, variables are first declared (S1), and a single line (a single paragraph) is read from a file into an array L (S2). A segmenting position is then determined (S4), and it is determined whether the variable KPOS = 0 (S5), that is, whether the array L starts with a kanji or a katakana. When KPOS = 0, the character string from position 0 through HPOS is assigned to an array X (S6), a line feed code is appended to the array X, and the result is output to a writing file (S7). It is then determined whether L[i] is a punctuation mark (S12). If it is not, the characters from the (i+1)th character onward are reassigned to the array L (S13). If L[i] is determined to be a punctuation mark in S12, the characters from the (i+2)th character onward are reassigned to the array L (S14).

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a Japanese character string segmentation method suitable for applications such as Japanese spell checking, machine translation, and automatic keyword extraction.

【０００２】[0002]

[Prior Art] In English text, words are separated by white-space characters such as spaces and tabs, so character strings are routinely divided into words for purposes such as spell checking.

[0003] In Japanese, however, character strings (Japanese sentences) are not separated by white-space characters, which makes them difficult to segment. Conventionally, therefore, a character string has been segmented at the points where the character type changes, for example between kanji and hiragana, or segmented into units such as subject and predicate based on the result of analyzing the sentence structure.

【０００４】[0004]

[Problems to Be Solved by the Invention] However, segmenting a character string at the points where the character type changes makes the segments too fine and produces many segments consisting only of hiragana. As a result, it cannot be determined whether a hiragana-only string is okurigana (the kana ending attached to a kanji stem), an adjective, or an adverb. Nor can it be judged whether a hiragana-only string is, on its own, plausible as okurigana. Furthermore, subjects, predicates, and the like can hardly be identified.

[0005] Segmenting a character string based on the result of sentence structure analysis, on the other hand, gives comparatively good segmentation, but the program that analyzes the sentence structure becomes large. Processing is therefore slow, and the program cannot be incorporated into small devices.

[0006] The present invention has been made in view of the above problems. Its object is to segment a character string into units whose meaning can be determined independently, while keeping processing fast and the program size compact enough to be incorporated into small devices.

【０００７】[0007]

[Means for Solving the Problems] To achieve this object, the Japanese character string segmentation method of the present invention is configured to segment a Japanese character string into character strings that start with a kanji or katakana and end with a hiragana or punctuation mark.
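The character classes that this rule relies on can be sketched as follows. This is a minimal illustration, not the patent's code: the patent works with 2-byte character codes and gives no code ranges, so the Unicode ranges and the helper names `is_head` and `is_tail` are assumptions.

```python
# Illustrative character classes. The Unicode ranges are an assumption;
# the patent itself uses 2-byte character codes and lists no ranges.
PUNCTUATION = "、。"  # the two marks used in the embodiment


def is_kanji(ch: str) -> bool:
    return "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs


def is_katakana(ch: str) -> bool:
    return "\u30a0" <= ch <= "\u30ff"  # Katakana block, includes the long-vowel mark


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u309f"


def is_head(ch: str) -> bool:
    """True if a segment may start with this character (kanji or katakana)."""
    return is_kanji(ch) or is_katakana(ch)


def is_tail(ch: str) -> bool:
    """True if a segment may end with this character (hiragana or punctuation)."""
    return is_hiragana(ch) or ch in PUNCTUATION
```

Each helper expects a single character. Keeping the two classes in separate predicates makes it easy to widen either one later, for example by adding more punctuation marks.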

【０００８】[0008]

[Operation] In the Japanese character string segmentation method configured as above, the Japanese character string is segmented simply by character strings that start with a kanji or katakana and end with a hiragana or punctuation mark, so processing is fast and the program size can be kept compact. In addition, the string is not divided more finely than necessary, and it can be segmented into units whose meaning can be determined independently.

【０００９】[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. The embodiment applies the present invention to a program that segments character strings read from a file and outputs them to another file. A real program would be written on the assumption that 1-byte and 2-byte characters are mixed, but to keep the explanation simple, the file to be read is assumed here to contain only 2-byte characters, and each 2-byte character is treated as one character.

[0010] FIG. 1 is a flowchart showing an embodiment of the Japanese character string segmentation method according to the present invention.

[0011] In FIG. 1, when the program starts, variables are first declared (step S1). These are the seven variables used in the program, i, KPOS, HPOS, KFNEW, KFOLD, HFNEW, and HFOLD, and each is initialized to 0 when it is declared.

[0012] Next, one line (one paragraph) is read from the file into array L (step S2). Array L holds the character codes of the line in order, starting at L[0]. If reading the line in step S2 reaches the end of the file, the program terminates; otherwise the process proceeds to step S4 and performs the delimiter determination. The following description assumes that the line 「この処理は非常に高速に行われる。従ってユーザーの待ち時間を最小限に抑えることができる。」 ("This processing is performed very quickly. The user's waiting time can therefore be kept to a minimum.") has been read into array L in step S2.

[0013] In step S4, the delimiter position is determined as shown in FIG. 2.

[0014] In FIG. 2, it is first determined whether L[i] is a kanji or katakana (step S41). If it is, the process proceeds to step S43, where 1 is assigned to KFNEW. If KFOLD is then found to be 0 in step S44, the value of i is assigned to KPOS (step S45) and the process proceeds to step S46. If KFOLD is 1, the process proceeds directly to step S46 (step S44).

[0015] When KFOLD is 0 and KFNEW is 1 in step S44, a new run of kanji or katakana has just begun, so its position i is assigned to KPOS in step S45 and stored.
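The old-flag/new-flag comparison described here is a 0-to-1 transition test. The sketch below illustrates it in Python under stated assumptions: the Unicode ranges and the function name `run_starts` are hypothetical, and unlike the flowchart, which keeps a single KPOS, the sketch collects every run start so the behavior is easy to see.

```python
def is_head(ch: str) -> bool:
    # Kanji or katakana; the Unicode ranges are an assumption.
    return "\u4e00" <= ch <= "\u9fff" or "\u30a0" <= ch <= "\u30ff"


def run_starts(line: str) -> list[int]:
    """Return every zero-based index at which a kanji/katakana run begins."""
    starts = []
    kf_old = 0  # flag for the previous character (plays the role of KFOLD)
    for i, ch in enumerate(line):
        kf_new = 1 if is_head(ch) else 0  # flag for this character (KFNEW)
        if kf_old == 0 and kf_new == 1:   # 0 -> 1 transition: a run starts here
            starts.append(i)
        kf_old = kf_new                   # carry the flag to the next character
    return starts


print(run_starts("この処理は非常に"))  # [2, 5]
```

In the sample string, runs begin at 処 (index 2) and 非 (index 5); 理 and 常 do not count because the previous character was already a kanji.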

[0016] If it is determined in step S41 that L[i] is not a kanji or katakana, 0 is assigned to KFNEW (step S42) and the process proceeds to step S46.

[0017] Next, it is determined whether L[i] is a hiragana or a punctuation mark (step S46). If it is neither, the process proceeds to step S48, where 0 is assigned to HFNEW. If it is then determined in step S49 that KFOLD is 1 and in step S50 that L[i] is a punctuation mark, the value i-1 is assigned to HPOS (step S51) and the process proceeds to step S53. If it is determined in step S49 that KFOLD is 1 and in step S50 that L[i] is not a punctuation mark, the value i is assigned to HPOS (step S52) and the process proceeds to step S53. If it is determined in step S49 that KFOLD is 0, the process proceeds directly to step S53. If it is determined in step S46 that L[i] is a hiragana or a punctuation mark, 1 is assigned to HFNEW and the process proceeds to step S53 (step S47).

[0018] When HFOLD is 1 in step S49 and HFNEW is 0, a character string has just ended with a hiragana or punctuation mark, so its position is assigned to HPOS and stored in step S51 or step S52. The value i-1 is assigned to HPOS in step S51 because the position to be stored is the one immediately before the punctuation mark.

[0019] In step S53, it is determined whether the three conditions KPOS ≥ 0, HPOS ≥ 1, and KPOS < HPOS all hold (logical AND). This amounts to determining whether the line read from the file in step S2 (array L) contains a character string that starts with a kanji or katakana and ends with a hiragana or punctuation mark. In the example above, the condition first holds when the string has been scanned, incrementing i, as far as 「この処理は」 ("this processing", followed by the topic particle は). That is, KPOS = 2, HPOS = 4, and 2 < 4 all hold. Note that array L begins at L[0], so indices count from 0; the values are therefore not KPOS = 3, HPOS = 5, and 3 < 5.
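The worked values can be checked with a short sketch. It does not follow the flowchart step by step; it simply looks for the first kanji/katakana run and the hiragana run that follows it, then applies the step S53 condition. The Unicode ranges and the name `find_first_segment` are assumptions, and punctuation handling is left out for brevity.

```python
def is_head(ch: str) -> bool:
    # Kanji or katakana; the Unicode ranges are an assumption.
    return "\u4e00" <= ch <= "\u9fff" or "\u30a0" <= ch <= "\u30ff"


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u309f"


def find_first_segment(line: str):
    """Return (kpos, hpos), zero-based and inclusive, or None if no run exists."""
    n = len(line)
    kpos = next((i for i, ch in enumerate(line) if is_head(ch)), None)
    if kpos is None:
        return None
    j = kpos
    while j < n and is_head(line[j]):      # consume the kanji/katakana run
        j += 1
    while j < n and is_hiragana(line[j]):  # consume the trailing hiragana run
        j += 1
    return kpos, j - 1                     # j - 1 is the last character consumed


kpos, hpos = find_first_segment("この処理は非常に")
print(kpos, hpos)                               # 2 4
print(kpos >= 0 and hpos >= 1 and kpos < hpos)  # True
```

Because Python strings are indexed from 0, the result matches the zero-based values KPOS = 2 and HPOS = 4 given in the text.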

[0020] If the condition in step S53 does not hold, the value of KFNEW is assigned to KFOLD and the value of HFNEW is assigned to HFOLD in step S54, i is incremented, and the process returns to step S41. If the condition in step S53 holds, the process proceeds to step S5.

[0021] In step S5, it is determined whether KPOS = 0, that is, whether array L starts with a kanji or katakana. If KPOS = 0, the character string from position 0 to HPOS is assigned to array X (step S6), a line feed code is appended to array X, and the result is output to the writing file (step S7).

[0022] If KPOS is not 0, the character string from position 0 to KPOS-1 is assigned to array X (step S8), a line feed code is appended to array X, and the result is output to the writing file (step S9). Then the character string from KPOS to HPOS is assigned to array Y (step S10), a line feed code is appended to array Y, and the result is output to the writing file (step S11).

[0023] In the example above, the condition of step S53 holds once the string has been scanned as far as 「この処理は」, and the process proceeds to step S5. The string 「この」 ("this") at positions 0 to 1 (that is, up to KPOS-1) is therefore assigned to array X (step S8), a line feed code is appended, and the result is output to the writing file (step S9). Then the string 「処理は」 ("processing", followed by the topic particle は) at positions 2 (KPOS) to 4 (HPOS) is assigned to array Y (step S10), a line feed code is appended, and the result is output to the writing file (step S11).
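The two assignments to X and Y map directly onto string slices. The sketch below is illustrative only; `split_at` is a hypothetical helper, and when KPOS is 0 it returns an empty X and puts the segment in Y, whereas the flowchart assigns the segment to X in that case.

```python
def split_at(line: str, kpos: int, hpos: int):
    """Return (x, y): the text before KPOS, and the segment KPOS..HPOS inclusive."""
    x = line[:kpos]          # positions 0 .. KPOS-1 (empty when KPOS == 0)
    y = line[kpos:hpos + 1]  # positions KPOS .. HPOS; a slice excludes its end index
    return x, y


x, y = split_at("この処理は非常に高速に行われる。", 2, 4)
print(x)  # この
print(y)  # 処理は
```

The `hpos + 1` is needed because HPOS in the text is an inclusive position while a Python slice end is exclusive.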

[0024] As the output of 「この」 in step S9 shows, even though the string is delimited by "a character string that starts with a kanji or katakana and ends with a hiragana or punctuation mark", the strings before and after it that do not fit this pattern are also separated out. This is one of the features of the present invention. It means that segmentation remains effective when such a string is preceded by a pronoun such as 「その」 ("that") or 「この」 ("this"), or by an adverb such as 「すぐに」 ("immediately").

[0025] When output to the writing file in step S7, step S9, or step S11 is complete, step S12 and the subsequent steps prepare for the next delimiter determination.

[0026] In step S12, it is determined whether L[i] is a punctuation mark. If it is not, the characters from the (i+1)th onward are reassigned to array L (step S13). If it is, the characters from the (i+2)th onward are reassigned to array L (step S14). The offsets in steps S13 and S14 differ so that punctuation marks are removed and do not appear in the segmented strings. The punctuation removal may also be omitted.
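One way to picture this step is as taking the unscanned remainder of the line and optionally skipping a leading punctuation mark. The sketch below is an interpretation, not the flowchart's exact bookkeeping: it indexes from the end of the emitted segment (HPOS) rather than from i, and the helper name `remainder` and the `drop_punctuation` switch are hypothetical.

```python
PUNCTUATION = "、。"  # the two marks used in the embodiment


def remainder(line: str, hpos: int, drop_punctuation: bool = True) -> str:
    """Return the part of `line` still to be scanned after a segment ending at hpos."""
    rest = line[hpos + 1:]
    # Skip one punctuation mark so it does not appear in later segments.
    if drop_punctuation and rest and rest[0] in PUNCTUATION:
        rest = rest[1:]
    return rest


print(remainder("行われる。従って", 3))                          # 従って
print(remainder("行われる。従って", 3, drop_punctuation=False))  # 。従って
```

Passing `drop_punctuation=False` corresponds to the variant mentioned in the text in which punctuation removal is omitted.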

[0027] Next, in step S15, 0 is assigned to the seven variables i, KPOS, HPOS, KFNEW, KFOLD, HFNEW, and HFOLD, and it is then determined whether scanning of array L has reached the end of the line (step S16). If it has not, the process returns to step S4 and repeats the delimiter determination (step S4) and the subsequent steps on the reassigned array L. If the end of the line has been reached, the process returns to step S2, reads a new line from the file into array L, and repeats the delimiter determination (step S4) and the subsequent steps. This repetition continues until the end of the file is detected in step S3.

[0028] As a result of the output in steps S7, S9, and S11, the example input 「この処理は非常に高速に行われる。従ってユーザーの待ち時間を最小限に抑えることができる。」 is written to the writing file as follows.

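[0029] この処理は非常に高速に行われる従ってユーザーの待ち時間を最小限に抑えることができる (the example sentence with its punctuation marks removed; in English, "This processing is performed very quickly, so the user's waiting time can be kept to a minimum")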
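For readers who want to experiment with the rule, the following Python sketch applies it literally to the example sentence. It is not a transcription of FIG. 1 and FIG. 2: it scans the line once instead of repeatedly reassigning array L, the Unicode ranges and the function name `segment` are assumptions, and the output of the patented flowchart may differ in detail.

```python
PUNCTUATION = "、。"  # the two marks used in the embodiment


def is_head(ch: str) -> bool:
    # Kanji or katakana; the Unicode ranges are an assumption.
    return "\u4e00" <= ch <= "\u9fff" or "\u30a0" <= ch <= "\u30ff"


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u309f"


def segment(line: str) -> list[str]:
    """Split a line into segments; punctuation marks are dropped."""
    out = []
    i, n = 0, len(line)
    while i < n:
        if line[i] in PUNCTUATION:  # punctuation ends a segment and is discarded
            i += 1
            continue
        j = i
        if is_head(line[j]):
            while j < n and is_head(line[j]):      # kanji/katakana run
                j += 1
            while j < n and is_hiragana(line[j]):  # trailing hiragana run
                j += 1
        else:
            # Text that does not start with kanji/katakana (e.g. a leading
            # pronoun) is emitted as a segment of its own.
            while j < n and not is_head(line[j]) and line[j] not in PUNCTUATION:
                j += 1
        out.append(line[i:j])
        i = j
    return out


text = "この処理は非常に高速に行われる。従ってユーザーの待ち時間を最小限に抑えることができる。"
for s in segment(text):
    print(s)
```

Applied literally, the rule separates 待ち from 時間を, because the kanji 時 begins a new run after the hiragana ち.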

[0030] The present invention has been described above by way of an embodiment, but various modifications are possible within its technical idea. For example, the embodiment treats 「、」 and 「。」 as the punctuation marks, but 「，」 and 「．」 may also be treated as punctuation marks. Brackets such as 「」（）［］【】 may also be added as delimiters. Further, although the embodiment does not output punctuation marks, they are useful as information about sentence structure, so they may be output as well.

[0031] In the embodiment described above, the segments are output to the writing file in steps S7, S9, and S11. Instead of writing to a file, the program may ask the user whether to edit each segment, or pass the segments as data to a translation program.

[0032] The present invention may also be extended so that, in addition to segmenting by "a character string that starts with a kanji or katakana and ends with a hiragana or punctuation mark", segmentation positions are determined by "a character string that starts with a kanji or katakana, continues with hiragana or punctuation marks, and ends with a kanji or katakana". Nor does the present invention preclude use together with, or in combination with, other analysis methods.
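The extended pattern can be expressed compactly as a regular expression. The sketch below is one possible reading, using Python's `re` module; the Unicode ranges and the names `HEAD`, `TAIL`, and `DERIVED` are assumptions, and the patent does not say how the matched span would be used to place the delimiter.

```python
import re

# Character classes as regex ranges; the Unicode ranges are an assumption.
HEAD = "\u4e00-\u9fff\u30a0-\u30ff"  # kanji and katakana
TAIL = "\u3041-\u309f、。"           # hiragana and the two punctuation marks

# A kanji/katakana run, then a hiragana/punctuation run, then a kanji/katakana run.
DERIVED = re.compile(f"[{HEAD}]+[{TAIL}]+[{HEAD}]+")

m = DERIVED.search("待ち時間を最小限に")
print(m.group(0))  # 待ち時間
```

Here the pattern keeps 待ち時間 ("waiting time") together, whereas the basic rule would place a boundary after 待ち.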

[0033] Further, although the embodiment outputs the segmented strings as they are, they may instead be output after analysis, such as extracting pronouns and adverbs by prefix matching or extracting particles and auxiliary verbs by suffix matching.
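A minimal sketch of such post-processing, assuming small hand-written word lists: the lists, the function name `analyze`, and the shape of the returned tuple are hypothetical and only illustrate prefix and suffix matching on an already segmented string.

```python
# Hypothetical word lists, for illustration only.
PREFIX_WORDS = ("この", "その", "すぐに")  # pronouns and adverbs
SUFFIX_WORDS = ("は", "を", "に", "の")    # particles


def analyze(segment: str):
    """Return (leading_word, body, trailing_particle) for one segment."""
    # Prefix match: does the segment begin with a known pronoun or adverb?
    lead = next((w for w in PREFIX_WORDS if segment.startswith(w)), "")
    rest = segment[len(lead):]
    # Suffix match: does what remains end with a known particle?
    trail = next((w for w in SUFFIX_WORDS if rest.endswith(w)), "")
    body = rest[:len(rest) - len(trail)]
    return lead, body, trail


print(analyze("処理は"))  # ('', '処理', 'は')
print(analyze("この"))    # ('この', '', '')
```

Checking the prefix first keeps a segment such as この from being misread as こ plus the particle の.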

【００３４】[0034]

[Effects of the Invention] As described above, the Japanese character string segmentation method of the present invention segments a Japanese character string by character strings that start with a kanji or katakana and end with a hiragana or punctuation mark, so processing is fast and the program size can be kept compact. In addition, the string is not divided more finely than necessary, and it can be segmented into units whose meaning can be determined independently.

[Brief description of drawings]

[FIG. 1] A flowchart showing an embodiment of the Japanese character string segmentation method according to the present invention.

[FIG. 2] A flowchart showing an embodiment of the Japanese character string segmentation method according to the present invention.

[Explanation of symbols]

S1 to S16: steps
S41 to S54: steps

Claims

[Claims]

1. A method for segmenting a Japanese character string, wherein the Japanese character string is segmented by character strings each starting with a kanji or katakana and ending with a hiragana or punctuation mark.