JPS60136864A

JPS60136864A - Extracting method of word

Info

Publication number: JPS60136864A
Application number: JP58249460A
Authority: JP
Inventors: Yasuyuki Numata; 泰之沼田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-12-26
Filing date: 1983-12-26
Publication date: 1985-07-20

Abstract

PURPOSE: To extract a specific word without accessing a dictionary, by checking the first character of the extracted word and setting a prescribed register when the honorific prefix "O" or "GO" is at the head of the input characters.

CONSTITUTION: The input characters are stored in an input character string temporary storage part 2. A kanji (Chinese character) phonetic processing part 3 determines, on the basis of a kanji phonetic table storage part 5, whether the segmentation that enables the word extracting operation is to be executed. A search string forming part 4 sets several characters, starting from the analysis start position, in a previously prepared buffer and segments them at the kanji phonetic level. When the word extracting operation is executed, matching against the specific words is used to recognize the honorific word, and the fact that an honorific word is at the head of the segmented characters is set in a register in a specific word processing part 6. The succeeding word is then extracted, and its characters are looked up in a word dictionary 8 by a dictionary retrieval part 7.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a word extraction method for Japanese word processors and similar devices. In particular, it introduces the concept of kanji phonetic readings (on-readings) into the word extraction process and restricts the search string used for dictionary lookup to the necessary minimum, so that unnecessary candidates are not extracted. This reduces erroneous analysis and speeds up dictionary search.

Prior Art

In conventional kana-kanji conversion methods, the algorithm for extracting words from an input kana character string has generally been as follows.

(1) Setting the analysis start position in the character string

Except in special cases, the first character of the character string is generally set as the initial analysis start position. Once a word beginning at that position has been successfully extracted, the first character of the string that remains after the extracted word is set as the new analysis start position.

(Example) Input character string:

かいせきによりたんとの〜
↑: initial analysis start position

If 「かいせき」 (解析, "analysis") is extracted successfully, the situation becomes as follows.

かいせきによりたんとの〜
　　　　↑: next analysis start position

(2) Creating the search strings for dictionary lookup

Assuming that the longest reading in the dictionary is six characters, the following search strings are set for the example sentence above.

a) Extracting the first word

1. かいせきによ
2. かいせきに
3. かいせき
4. かいせ
5. かい
6. か

b) Extracting the next word after 「解析」 has been extracted

1. によりたんと
2. によりたん
3. によりた
4. により
5. によ
6. に

(3) Extracting candidates by matching the search strings against the headword strings in the dictionary

For the example above, the matching proceeds as follows.

1. 「かいせきによ」: no candidate can be extracted
2. 「かいせきに」: no candidate can be extracted
3. 「かいせき」: 「会席」, 「解析」, and 「懐石」 are extracted
4. 「かいせ」: no candidate can be extracted
5. 「かい」: 「会」, 「回」, 「快」, 「戒」, etc. are extracted
6. 「か」: 「可」, 「香」, 「蚊」, etc. are extracted

(4) Various evaluations are performed on the candidate group extracted in (3), and the candidate judged most appropriate is selected.
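Steps (2) and (3) of the conventional procedure can be sketched as follows. This is a minimal illustration, not the implementation of any actual device: the dictionary `TOY_DICTIONARY` and the function names `make_search_strings` and `extract_candidates` are hypothetical, and only the six-character limit and the example readings come from the text above.

```python
# Hypothetical toy dictionary (reading -> candidate words), for illustration only.
TOY_DICTIONARY = {
    "かいせき": ["会席", "解析", "懐石"],
    "かい": ["会", "回", "快", "戒"],
    "か": ["可", "香", "蚊"],
}

MAX_READING_LENGTH = 6  # longest reading in the dictionary, as assumed in the text


def make_search_strings(text, start, max_len=MAX_READING_LENGTH):
    """Step (2): return the prefixes of text[start:], longest first."""
    window = text[start:start + max_len]
    return [window[:n] for n in range(len(window), 0, -1)]


def extract_candidates(text, start, dictionary=TOY_DICTIONARY):
    """Step (3): look up every search string and collect (reading, word) pairs."""
    candidates = []
    for key in make_search_strings(text, start):
        for word in dictionary.get(key, []):
            candidates.append((key, word))
    return candidates
```

Calling `extract_candidates("かいせきによりたんとの", 0)` performs six lookups and collects candidates for three different readings, which illustrates how quickly the candidate group grows when every prefix is tried.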

However, with the word extraction method described above, an extremely large number of candidates may be extracted depending on the input character string, which causes erroneous analysis and slows down dictionary search. It was also difficult to prevent word extraction errors caused by misrecognition of the frequently used polite prefixes 「お」 and 「ご」.

Purpose

The present invention has been made in view of the above circumstances. Its purpose is to solve the above problems of conventional word extraction methods and to provide a word extraction method that reduces erroneous analysis and speeds up dictionary search.

Configuration

The configuration of the present invention is described in detail below on the basis of an embodiment.

Fig. 1 is a block diagram showing an overview of a kana-kanji conversion processing device that is one embodiment of the present invention, and Fig. 2 shows part of the contents of the kanji phonetic (on-reading) table, which is its main part.

In Fig. 1, 1 is a keyboard input unit, 2 is an input character string temporary storage unit, 3 is a kanji phonetic processing unit, 4 is a search string creation unit, 5 is a kanji phonetic table storage unit, 6 is a special word processing unit, 7 is a dictionary search unit, and 8 is a word dictionary. Fig. 2 shows merely one example of a kanji phonetic table, and the present invention is not limited to it.

The kanji phonetic processing unit 3 has a classification module that classifies processing results according to conditions described later, and a register that holds the classification result. Hereinafter this register is called "TYPE". The special word processing unit 6 performs matching against predetermined special words, which are the contents of a special word table described later. In addition, the search string creation unit 4 is provided with a register that is set to "1" when the extracted word is the so-called polite prefix 「お」 or 「ご」. Hereinafter this register is called "TEINEI".

The operation of this embodiment is described below. To explain the general operation of the search string creation unit 4, the following example sentence is used:

かいせきのけっかによれば〜 (解析の結果によれば〜, "according to the results of the analysis ...")

The search string creation unit 4 takes the six characters beginning at the analysis start position, which is set in the same way as in the conventional method, and sets them in a buffer prepared in advance (see Fig. 3). Any buffer that can hold six characters one-dimensionally will do; hereinafter this buffer is called "WINDOW".

Next, the character string in WINDOW is matched against each element of the kanji phonetic table shown in Fig. 2, and the string in WINDOW is divided at the level of kanji phonetic elements. The result is set in a buffer or the like prepared in advance, in a form that can represent it concretely. Here a buffer called WINDOW2, represented as a one-dimensional array of size 6, is prepared.

Fig. 4 shows an example of the divisions applied to the character string in WINDOW and of the contents of WINDOW2. The arrows attached to WINDOW indicate the divisions at the kanji phonetic level, and each number in WINDOW2 indicates that a kanji phonetic reading of that many characters exists in the kanji phonetic table.

For the example sentence above, the result is:

WINDOW2(1) = 2 (corresponding to 「かい」)
WINDOW2(2) = 2 (corresponding to 「せき」)
WINDOW2(3) = 1 (corresponding to 「の」)
WINDOW2(4) = 1 (corresponding to 「け」)
WINDOW2(5) = 0
WINDOW2(6) = 0
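The division of WINDOW into WINDOW2 can be sketched as follows. The text does not specify the matching strategy, so this sketch makes two assumptions: the longest table entry is tried first at each position, and a character that matches no table entry is recorded as a one-character segment. The table `KANJI_PHONETIC_TABLE` is a hypothetical excerpt (the actual table of Fig. 2 is not reproduced here), and `segment_window` is an illustrative name.

```python
# Hypothetical excerpt of a kanji phonetic table; not the table of Fig. 2.
KANJI_PHONETIC_TABLE = {"か", "かい", "せ", "せい", "せき", "え", "えい", "け", "こ", "ご"}
WINDOW_SIZE = 6  # size of WINDOW and WINDOW2 in the embodiment


def segment_window(window, table=KANJI_PHONETIC_TABLE, size=WINDOW_SIZE):
    """Divide `window` into segments and return their lengths, zero-padded to `size`."""
    max_len = max(len(entry) for entry in table)
    lengths = []
    pos = 0
    while pos < len(window):
        # Assumed strategy: try the longest possible table entry first.
        for n in range(min(max_len, len(window) - pos), 0, -1):
            if window[pos:pos + n] in table:
                break
        else:
            n = 1  # assumption: an unmatched character forms a one-character segment
        lengths.append(n)
        pos += n
    return lengths + [0] * (size - len(lengths))
```

With this toy table, `segment_window("かいせきのけ")` yields `[2, 2, 1, 1, 0, 0]`, which agrees with the WINDOW2 contents listed above. Index 0 of the Python list corresponds to WINDOW2(1).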

The result of the above processing is classified according to the following conditions.

(1) If WINDOW2(1) ≧ 2 and WINDOW2(2) ≧ 2, TYPE is set to "1".

(2) If WINDOW2(1) ≧ 2 and WINDOW2(2) = 1, TYPE is set to "2".

(3) If WINDOW2(1) = 1 and WINDOW2(2) ≧ 2, TYPE is set to "3".

(4) In all cases other than (1) to (3) above, TYPE is set to "4".
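The four classification conditions translate directly into a small function. The sketch below assumes WINDOW2 is held as a Python list whose index 0 corresponds to WINDOW2(1); the function name `classify_type` is illustrative and does not appear in the source.

```python
def classify_type(window2):
    """Return the TYPE value (1 to 4) for a WINDOW2 list of segment lengths."""
    first, second = window2[0], window2[1]  # WINDOW2(1) and WINDOW2(2)
    if first >= 2 and second >= 2:
        return 1  # condition (1)
    if first >= 2 and second == 1:
        return 2  # condition (2)
    if first == 1 and second >= 2:
        return 3  # condition (3)
    return 4      # condition (4): everything else
```

For the WINDOW2 contents 2, 2, 1, 1, 0, 0 of the example sentence, this returns TYPE = 1. A window that begins with a one-character segment followed by a segment of two or more characters returns TYPE = 3.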

As a result of the above processing, the word extraction operation of the present invention is executed only when the value set in TYPE is "3" or "4".

Fig. 5 shows an outline of the processing flow of this embodiment.

The operation of this embodiment is described below with reference to Fig. 5.

If, as a result of the classification based on kanji phonetic processing, the value of TYPE is "3" or "4", the register TEINEI is referenced and the value set in it is checked. If this value is "1", the first character of WINDOW is 「お」 or 「ご」. It is recognized as a polite prefix, and processing moves on to the extraction of the next word.

Fig. 6 shows the case where the following string is analyzed:

ごせいえいのことと〜 (ご清栄のことと〜)

In this example, TYPE = 3 and the first word is 「ご」, so it is recognized as a polite prefix, and processing moves on to the remaining string:

せいえいのことと〜

As a result of the processing described above, the so-called polite prefixes 「お」 and 「ご」 in the example sentence can be extracted appropriately without accessing the dictionary.
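The polite-prefix step can be sketched as follows. This is a simplified reading of the flow in Fig. 5, under the assumption that the special word matching and the setting of TEINEI can be merged into one function; the names `HONORIFIC_PREFIXES` and `detect_honorific` are hypothetical, and the TYPE value is taken as an input rather than computed here.

```python
# Special words treated as polite prefixes, per the text.
HONORIFIC_PREFIXES = ("お", "ご")


def detect_honorific(text, start, type_value):
    """Return (teinei, next_start) for the analysis position `start`.

    teinei is 1 when a polite prefix was recognized at `start`, else 0.
    next_start is the analysis start position for the next word extraction.
    """
    teinei = 0
    # The word extraction operation runs only for TYPE 3 or 4.
    if type_value in (3, 4) and text[start:start + 1] in HONORIFIC_PREFIXES:
        teinei = 1   # corresponds to setting the TEINEI register to "1"
        start += 1   # skip the prefix; no dictionary access is needed
    return teinei, start
```

For the example of Fig. 6, `detect_honorific("ごせいえいのことと", 0, 3)` returns `(1, 1)`, so the next extraction starts at 「せいえいのことと」. Gating on TYPE presumably keeps a leading 「お」 or 「ご」 from being split off when the head of the window already forms a reading of two or more characters.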

Thereafter, dictionary search is performed in the same way as in the conventional method, using the search strings created by the processing described above.

In the embodiments above, WINDOW is a buffer with a size of six characters, but it is not necessarily limited to six characters. Also, instead of a buffer such as WINDOW, one may provide a buffer in which the input character string is set, a plurality of pointers indicating positions in that buffer, and registers or the like in which the pointer values can be set.

Effects

As described above, according to the present invention, when the polite prefix 「お」 or 「ご」 is at the head of the input character string, that special word can be extracted by special word processing without accessing the dictionary. This has the notable effect of speeding up the extraction of words that contain kanji.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an embodiment of the present invention. Fig. 2 shows part of the contents of the kanji phonetic table. Fig. 3 shows an example of the contents of the input character string buffer. Fig. 4 shows the matching of the input character string against the contents of the kanji phonetic table. Fig. 5 is a processing flowchart. Fig. 6 shows a concrete processing example.

1: keyboard input unit, 2: input character string temporary storage unit, 3: kanji phonetic processing unit, 4: search string creation unit, 5: kanji phonetic table storage unit, 6: special word processing unit, 7: dictionary search unit, 8: word dictionary.

Claims

[Claims]

(1) A word extraction method for a kana-kanji conversion processing device having word dictionary storage means for storing a plurality of words in correspondence with character strings representing their readings, and means for temporarily storing an input kana character string, characterized in that the device is provided with: table storage means in which kanji phonetic readings of two or more sounds are registered; means for dividing the input kana character string using the kanji phonetic readings; means for classifying, by form, the result of the division by the dividing means; and means for recognizing that the first word of the input kana character string is 「お」 or 「ご」; and in that the polite prefixes 「お」 and 「ご」 are extracted on the basis of the classification result of the classifying means.