JP2000172706A

JP2000172706A - Character string classifying device

Info

Publication number: JP2000172706A
Application number: JP10347544A
Authority: JP
Inventors: Makoto Kosaka; 誠小坂
Original assignee: DIC JAPAN KK
Current assignee: DIC JAPAN KK
Priority date: 1998-12-07
Filing date: 1998-12-07
Publication date: 2000-06-23

Abstract

PROBLEM TO BE SOLVED: To provide a character string classifying device capable of automatically classifying an input character string. SOLUTION: This device has an OCR device (character string input means) 7 which inputs a character string, a disk drive (keyword storage means) 3 which stores predetermined keywords, a comparison counting means 10 which compares the character string input from the character string input means with the keywords stored in the keyword storage means and counts the number of keywords included in the character string, and a determination/classification means (first determination/classification means) 11 which determines the content represented by the character string according to the counted number of keywords and classifies the character string.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character string classification device that can be used by an integrated automobile registration processing device.

[0002]

[Prior Art] Conventionally, in the automobile registration agency business, the addresses and names written on resident records, seal registration certificates, automobile inspection certificates, land registry extracts, and the like are entered manually into a personal computer and printed in a predetermined format together with other information. When a large number of addresses and names must be entered, however, the labor involved is excessive. Besides manual entry, addresses, names, and the like can also be entered into the personal computer by using an input means such as an OCR (optical character reader, hereinafter OCR) to convert image data into text data.

[0003]

[Problem to Be Solved by the Invention] In this case, a printed document bearing addresses, names, and the like is read by OCR, and the image data is converted into text data. To store the converted text data in the appropriate file and print it, the personal computer must classify which character string is an address and which is a name. Conventionally, however, a personal computer could not be made to understand the content of a character string, so each input character string had to be classified manually as an address, a name, or the like. When a large amount of data had to be entered, the labor was therefore still excessive.

[0004] In view of the above circumstances, an object of the present invention is to provide a character string classification device that can automatically classify input character strings.

[0005]

[Means for Solving the Problem] The character string classification device according to claim 1 comprises: character string input means for inputting a character string; keyword storage means for storing predetermined keywords; comparison counting means for comparing the character string input by the character string input means with the keywords stored in the keyword storage means and counting the number of keywords contained in the character string; and first determination/classification means for determining the content represented by the character string according to the counted number of keywords and classifying the character string.

[0006] In this character string classification device, the content of an input character string is determined and classified based on the number of predetermined keywords it contains. For example, the keywords are set in advance to terms frequently used in a certain field, and if the input character string contains at least a specified number of these keywords, the string is determined to represent content in that field.
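The idea in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names, the sample keyword list, and the choice to count each distinct keyword once are assumptions made for the example.

```python
# Illustrative sketch only: names and the sample keyword list are assumptions.
ADDRESS_KEYWORDS = ["県", "市", "丁目", "番地"]


def count_keywords(text: str, keywords: list[str]) -> int:
    """Count how many of the given keywords occur in the text (each at most once)."""
    return sum(1 for kw in keywords if kw in text)


def belongs_to_field(text: str, keywords: list[str], min_count: int) -> bool:
    """Treat the text as belonging to the field if it contains enough keywords."""
    return count_keywords(text, keywords) >= min_count


if __name__ == "__main__":
    print(belongs_to_field("神奈川県横浜市中区1丁目2番地", ADDRESS_KEYWORDS, 2))
    print(belongs_to_field("小坂誠", ADDRESS_KEYWORDS, 2))
```

The threshold `min_count` corresponds to the "specified number" of keywords mentioned above; the value 2 used here is just one possible choice.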

[0007] The character string classification device according to claim 2 is the character string classification device according to claim 1, wherein the keywords consist of address terms, and the first determination/classification means determines the character string to be an address according to the counted number of keywords.

[0008] In this character string classification device, the keywords are set in advance to address terms such as "県" (prefecture), "市" (city), "丁目" (chome), and "番地" (lot number), and the character string is determined to be an address and classified according to the number of these keywords it contains. A character string containing at least a certain number of keywords may be treated as an address. Alternatively, the weight of the determination may vary by keyword, so that a string containing a word rarely used outside addresses, such as "番地", strengthens the determination that it is an address. Address terms are words frequently used in addresses; examples include the units and names of countries, municipalities, and administrative districts, the words "丁目" (chome), "マンション" (apartment building), and "号室" (room number), and the Arabic numerals used to express lot numbers.

[0009] The character string classification device according to claim 3 is the character string classification device according to claim 2, wherein the first determination/classification means determines the character string to be an address when the number of keywords in the character string counted by the comparison counting means is two or more.

[0010] In this character string classification device, when the character string contains two or more keywords consisting of the address terms, it is determined to be an address and classified accordingly. When it contains fewer than two, it is determined not to be an address.

[0011] The character string classification device according to claim 4 is the character string classification device according to any one of claims 1 to 3, wherein the character string is input by OCR.

[0012] Because this character string classification device inputs character strings by OCR, it can classify character strings written on any OCR-readable medium, such as resident records, seal registration certificates, land registry extracts, and automobile inspection certificates.

[0013] The character string classification device according to claim 5 is the character string classification device according to any one of claims 1 to 4, wherein the character string, after classification, is output to an FD (flexible disk, hereinafter FD).

[0014] With this character string classification device, the classified data can be used by other data processing devices via the FD. The classified data may be stored on the FD as is, or converted into predetermined format data and then stored. Thus, for example, the classified data itself, or data output in a predetermined format, can be accumulated in another data processing device and used as a database.

[0015] The character string classification device according to claim 6 comprises: identification mark detection means for detecting a predetermined identification mark in the content of an information medium; and second determination/classification means for determining, according to the identification mark, the content of a character string written near the identification mark and classifying that character string.

[0016] In this character string classification device, an identification mark written on the information medium serves as an indicator of the content of the nearby character string, and based on this mark the character string is determined to have predetermined content, such as an address or a name, and classified. For example, the characters "住所" (address) are detected in the text written on a sheet of paper, and the character string on the same line as those characters is determined to be an address and classified. The information medium may be made of any material on which characters can be printed: paper, plastic, vinyl, metal, and so on.

[0017]

[Embodiments of the Invention] An address classification device according to a first embodiment of the present invention is described below with reference to the drawings. The address classification device of this embodiment classifies addresses that include a prefecture name. In this case, it is effective to determine and classify a character string as an address when it contains three or more keywords.

[0018] FIG. 1 is a block diagram of the address classification device. In the figure, reference numeral 1 denotes a personal computer in which a disk device (keyword storage means) 3, a CPU (central processing unit) 4, a RAM (random access memory) 5, and a ROM (read-only memory) 6 are connected via a common bus 2. The ROM 6 holds a program that controls the basic operation of the personal computer 1. The disk device 3 stores an operating program file 3a and a keyword data file 3b, and the CPU 4 executes the operating program file 3a on the RAM 5. The keyword data file 3b stores a series of keywords consisting of address terms such as "都" (metropolis), "道" (circuit), "府" (urban prefecture), "県" (prefecture), "郡" (county), "市" (city), "町" (town), "村" (village), "区" (ward), "丁目" (chome), "大字" (oaza, a large village section), and "番地" (lot number). (Hereinafter these keywords are called the first keyword, the second keyword, and so on.) Also connected to the personal computer 1 are an OCR device (character string input means) 7, which includes a scanner (not shown), and an FD drive 9 holding an FD (flexible disk) 8.

[0019] Here, the CPU 4 and the RAM 5 constitute comparison counting means 10 and determination/classification means (first determination/classification means) 11.

[0020] Next, the operation of this address classification device is described. When the operating program is started, first, in step SP1 of the flowchart in FIG. 2, the program uses the scanner of the OCR device 7 to read a sheet of paper on which an address and a name are written. It is assumed that only the address and the name are printed on the paper, one line each. Here, "address" covers everything expressed as an address, including the registered domicile as well as the current address, and "name" covers the names of corporations and organizations as well as personal names. The OCR device 7 selects the portions recognized as character strings from the image data the scanner has read as an image and converts these character strings into text data (character codes such as JIS codes). This text data is stored in the RAM 5. At this stage the RAM 5 holds two character strings, one of which is the address and the other the name. Hereinafter these are called the first character string and the second character string.

[0021] Next, the process advances to step SP2. The CPU 4 and the RAM 5, acting as the comparison counting means 10, count the number of keywords contained in the first character string. First, a pointer P stored at a predetermined location in the RAM 5 is initialized to 1, and a counter C is initialized to 0.

[0022] Then, as shown in FIG. 3, the first keyword ("都") is compared with the P-th (= 1st) character of the first character string to detect whether they match. If they match, 1 is added to the counter C. If they do not match, the second, third, and subsequent keywords ("道", "府", ...) are compared in the same way. If none of the keywords matches, the counter is left unchanged (see FIG. 3(1)).

[0023] After this comparison with the keyword group, if the character string starting at the P-th character of the first character string matches none of the keywords, 1 is added to P and the comparison continues with the next character (see FIG. 3(2)). If it matches one of the keywords, the number of characters in the matching keyword is added to P and the comparison continues from the new P-th character (see FIG. 3(3) and FIG. 4). The above comparison is thus repeated for each increased value of P.

[0024] As a counting condition, when the same keyword appears more than once in the first character string, that keyword is not counted repeatedly. Likewise, when the first character string contains more than one of the characters "都", "道", "府", and "県", only one of them is counted. The same applies to "市", "町", and "村". In addition, the relative order of the keywords in the first character string is taken into account; for example, the characters "都", "道", "府", and "県" are counted only when they precede "郡", "市", "町", or "村".
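One way to read these counting conditions is sketched below. It assumes the matched keywords have already been extracted in order of appearance, and it interprets the order rule as "a prefecture-level keyword is skipped if a 郡/市/町/村 keyword has already appeared before it"; that interpretation, the group names, and the function name are assumptions, not something the description spells out.

```python
# Hedged sketch of the counting conditions; the rule interpretation is an assumption.
PREFECTURE_LEVEL = {"都", "道", "府", "県"}
MUNICIPALITY_LEVEL = {"市", "町", "村"}
LOWER_LEVEL = {"郡", "市", "町", "村"}  # levels a prefecture keyword must precede


def count_with_conditions(matches: list[str]) -> int:
    """Count matched keywords (given in order of appearance) under three rules."""
    counted = set()      # rule 1: a repeated keyword is counted only once
    groups_used = set()  # rule 2: one count per group (都道府県, 市町村)
    lower_seen = False   # rule 3: prefecture level must come before 郡/市/町/村
    count = 0
    for kw in matches:
        if kw in PREFECTURE_LEVEL:
            group = "prefecture"
        elif kw in MUNICIPALITY_LEVEL:
            group = "municipality"
        else:
            group = None
        ok = kw not in counted
        if group is not None and group in groups_used:
            ok = False
        if group == "prefecture" and lower_seen:
            ok = False
        if ok:
            count += 1
            counted.add(kw)
            if group is not None:
                groups_used.add(group)
        if kw in LOWER_LEVEL:
            lower_seen = True
    return count


if __name__ == "__main__":
    print(count_with_conditions(["県", "市", "区", "丁目", "番地"]))
    print(count_with_conditions(["町", "村", "市"]))
```

Under this reading, a string whose only keywords are "町", "村", and "市" contributes a count of 1, because all three fall into the same group.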

[0025] When this comparison has been repeated until the entire first character string has been compared, the counter C holds the number of keywords contained in the first character string. (Note that even if, for example, the keywords include both "大字" and "字" and the first character string contains "大字", the procedure above ensures that the "字" inside "大字" is not judged to be the keyword "字".)

[0026] The value of the counter C is used by the determination/classification means 11 as the criterion for deciding whether to classify the character string as an address. When the counter C is 3 or more, that is, when the first character string contains three or more kinds of keywords, the string is considered to be an address containing a prefecture name, a municipality name, a town name and lot number, and so on, as in FIG. 5(a). The first character string is therefore determined to be an address, and in step SP11 it is output to the address storage area on the FD 8 held in the FD drive 9. In this case the other string, the second character string, is determined by the determination/classification means 11 to be a name, and in step SP12 it is output to the name storage area on the FD 8. When the counter C is less than 3, that is, when the first character string contains only two or fewer kinds of keywords, as in FIG. 5(b) to (d), the CPU 4 determines that the first character string is a name rather than an address, and in step SP21 the first character string is output to the name storage area on the FD 8. In this case the other string, the second character string, is determined to be an address, and in step SP22 it is output to the address storage area on the FD 8. Even when the same keyword ("市") appears more than once, as in FIG. 5(d), it is not counted more than once. Furthermore, even when a string contains three kinds of keywords ("町", "村", "市"), as in FIG. 5(e), only one of them is counted, as described above, so the string is determined to be a name.
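The two-way decision described here (steps SP11/SP12 versus SP21/SP22) can be sketched as a small function. This is an illustrative sketch: the function names, the dictionary standing in for the storage areas on the FD, and the simplified counting function are all assumptions.

```python
# Illustrative sketch; the dict stands in for the storage areas on the FD.
KEYWORDS = ["都", "道", "府", "県", "郡", "市", "町", "村", "区",
            "丁目", "大字", "番地"]


def count_keywords(text: str) -> int:
    """Simplified count: number of distinct keywords present in the text."""
    return sum(1 for kw in KEYWORDS if kw in text)


def classify_pair(first: str, second: str, threshold: int = 3) -> dict[str, str]:
    """Decide which of two strings is the address by examining only the first."""
    if count_keywords(first) >= threshold:
        # corresponds to steps SP11 and SP12
        return {"address": first, "name": second}
    # corresponds to steps SP21 and SP22
    return {"address": second, "name": first}


if __name__ == "__main__":
    print(classify_pair("東京都港区芝1丁目2番地", "小坂誠"))
    print(classify_pair("小坂誠", "東京都港区芝1丁目2番地"))
```

Note that only the first string is examined; the second string is classified purely by elimination, as in the embodiment.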

[0027] Other necessary data is subsequently written to the FD 8, which is then passed to other processing means that use the recorded address and other information.

[0028] In this way, the character strings recorded on paper are determined to be an address or a name, classified, and automatically output to the FD 8, so there is no need to specify manually which string is the address, and labor is reduced.

[0029] A program for realizing the above operation may be recorded on a computer-readable recording medium, and the program recorded on that medium may be loaded into and executed by a computer.

[0030] In addition to the above embodiment, the following variations are possible.
(1) The interpretation and classification device according to the present invention can be combined with any device that requires the input of addresses and names, such as an integrated automobile registration processing device.
(2) In the above embodiment the second character string is classified without its content being verified. Instead, after the first character string has been evaluated as above, the second character string may be evaluated in the same way and the two results compared to perform the classification.
(3) In the above embodiment a string is determined to be an address when the counter C is 3 or more, but it may instead be determined to be an address when the counter C is 2 or more. This makes it possible to classify addresses that contain few keywords, for example because the prefecture name is omitted.
(4) The keywords described above are only examples and are not limiting. By changing the keywords to, for example, "財団法人" (incorporated foundation), names and titles can also be classified. By changing the language of the keywords, character strings in languages other than Japanese can be classified as well. The number of characters in a keyword is not limited to that of this embodiment; a keyword may be a word or a sentence.
(5) The identified address and name are output to the FD, but instead of or in addition to output to the FD, they may be printed in a predetermined format (for example, the form of any of the various applications that require an address and a name), recorded on another information recording medium, or stored temporarily in memory, subjected to further processing, and then recorded on an FD or the like. Examples of such further processing include detecting afresh whether a string classified as not being an address is in fact a name, and combining the result with other methods to improve determination accuracy.
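Variation (2) above, in which both strings are evaluated and the results compared, might look like the following sketch. The function names, the simplified counting function, and in particular the handling of a tie (returning `None` so the case can be handled some other way) are assumptions; the description does not say what to do when the counts are equal.

```python
# Sketch of variation (2); tie handling is an assumption.
from typing import Optional

KEYWORDS = ["都", "道", "府", "県", "郡", "市", "町", "村", "区",
            "丁目", "大字", "番地"]


def count_keywords(text: str) -> int:
    """Simplified count: number of distinct keywords present in the text."""
    return sum(1 for kw in KEYWORDS if kw in text)


def classify_by_comparison(first: str, second: str) -> Optional[dict[str, str]]:
    """Treat the string with more keywords as the address; None on a tie."""
    c1, c2 = count_keywords(first), count_keywords(second)
    if c1 == c2:
        return None  # undecided; left to a fallback such as manual review
    if c1 > c2:
        return {"address": first, "name": second}
    return {"address": second, "name": first}


if __name__ == "__main__":
    print(classify_by_comparison("小坂誠", "中区1丁目"))
```

Comparing the two counts avoids a fixed threshold, so a short address with only two keywords can still be separated from a name that has none.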

[0031] (6) In this embodiment the address and name printed on paper are input by OCR, but they may be handwritten rather than printed. Character strings may also be input not by OCR but via a communication device, a disk device, a keyboard, or the like. When a character string is entered on a keyboard, its content can be determined immediately to be an address or otherwise. The medium read by OCR may be anything on which the character strings to be classified are written: resident records, seal registration certificates, certificates relating to real estate such as land registry extracts, certificates relating to movable property such as automobile inspection certificates, business cards, postcards, and so on. In addition, the character strings as image data read by the scanner may be stored in memory together with the strings converted into text data, and output as needed. This gives access to the characters as originally recorded on paper alongside the text data read by OCR.
(7) In the above embodiment the counter C is increased by 1 for each keyword contained in the first character string, but the increment may instead depend on the kind of keyword, so that a keyword whose presence makes an address highly probable (for example "番地") increases the counter C by a larger amount.
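Variation (7) amounts to replacing the unit increment with a per-keyword weight. The sketch below illustrates this; the weight values, the threshold of 3, and the function names are assumptions chosen for the example, not values given in the description.

```python
# Sketch of weighted counting; the weights and threshold are assumed values.
WEIGHTS = {
    "県": 1,
    "市": 1,
    "区": 1,
    "丁目": 1,
    "番地": 2,  # rarely used outside addresses, so it counts for more
}


def weighted_score(text: str) -> int:
    """Sum the weights of the distinct keywords present in the text."""
    return sum(weight for kw, weight in WEIGHTS.items() if kw in text)


def is_address(text: str, threshold: int = 3) -> bool:
    """Treat the text as an address when the weighted score reaches the threshold."""
    return weighted_score(text) >= threshold


if __name__ == "__main__":
    print(weighted_score("中区2番地"), is_address("中区2番地"))
    print(weighted_score("小坂誠"), is_address("小坂誠"))
```

With these assumed weights, a short address containing only "区" and "番地" still reaches the threshold of 3, which it would not under unit increments.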

[0032] Next, a second embodiment of the present invention is described. FIG. 6 is a block diagram of an address classification device according to the second embodiment of the character string classification device of the present invention. In the figure, reference numeral 1 denotes a personal computer in which a disk device 3, a CPU (central processing unit) 4, a RAM (random access memory) 5, and a ROM (read-only memory) 6 are connected via a common bus 2. The ROM 6 holds a program that controls the basic operation of the personal computer 1. The disk device 3 stores an operating program file 3a and an identification mark file 3b, and the CPU 4 executes the operating program file 3a on the RAM 5.

[0033] The identification mark file 3b stores the characters "住所" (address) and "氏名" (name). Also connected to the personal computer 1 are an OCR device 7, which includes a scanner (not shown), and an FD drive 9 holding an FD (floppy disk) 8.

[0034] Here, the CPU 4 and the RAM 5 constitute identification mark detection means 12 and determination/classification means (second determination/classification means) 13.

[0035] Next, the operation of this address classification device is described. When the operating program is run, first, in step SP1 of the flowchart in FIG. 7, the information written on the paper is read by the scanner. As shown in FIG. 8, the address and the name are printed on this paper one line each, following the characters "住所" and "氏名" respectively. The OCR device 7 selects the portions recognized as character strings from the image data that was read and converts these character strings into text data (character codes such as JIS codes). The content of this text data is stored in the RAM 5 together with its position on the paper (the line on which it appears). At this stage the RAM 5 holds a plurality of character strings and their position information.

[0036] Next, it is detected whether the character strings include an identification mark stored in the identification mark file 3b. The character strings stored in the RAM 5 are searched for the characters "住所", which are an identification mark, and the line on which "住所" appears is detected (step SP2). Then, in step SP3, the determination/classification means 13 examines the position information of the other character strings stored in the RAM 5. A character string written on the same line as the identification mark "住所" is determined to be an address and is output to the address storage area on the FD 8 held in the FD drive 9. Similarly, for the characters "氏名", the determination/classification means 13 determines the character string written on the same line as "氏名" to be a name and outputs it to the address storage area on the FD 8 held in the FD drive 9 (steps SP4 and SP5). In this embodiment, the "address" that is determined and classified covers everything expressed as an address, including the registered domicile as well as the current address. "Name" likewise includes the names of corporations and organizations.
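The label-based classification of steps SP2 to SP5 can be sketched as follows. This is a minimal sketch: representing the OCR result as `(line_number, text)` pairs, the function name, and the returned dictionary (standing in for the storage areas on the FD) are assumptions made for illustration.

```python
# Minimal sketch of label-based classification; the data layout is an assumption.
LABELS = {"住所": "address", "氏名": "name"}  # identification marks and categories


def classify_by_label(fragments: list[tuple[int, str]]) -> dict[str, str]:
    """Classify each string by the identification mark found on the same line.

    fragments: (line_number, text) pairs as they might come from OCR.
    """
    # First pass: find the line on which each identification mark appears.
    label_lines = {}
    for line_no, text in fragments:
        if text in LABELS:
            label_lines[line_no] = LABELS[text]
    # Second pass: assign the other strings on those lines to the category.
    result = {}
    for line_no, text in fragments:
        if text in LABELS:
            continue
        category = label_lines.get(line_no)
        if category is not None:
            result[category] = text
    return result


if __name__ == "__main__":
    scanned = [
        (1, "住所"), (1, "東京都港区芝1丁目2番地"),
        (2, "氏名"), (2, "小坂誠"),
        (3, "備考"),
    ]
    print(classify_by_label(scanned))
```

Strings on lines that carry no identification mark, such as line 3 in the sample data, are simply left unclassified.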

[0037] In this way, the character strings recorded on paper are determined to be an address or a name, classified, and automatically output to the FD 8, so there is no need to specify manually which string is the address, and labor is reduced.

[0038] A program for realizing the above operation may be recorded on a computer-readable recording medium, and the program recorded on that medium may be loaded into and executed by a computer.

[0039] In addition to the above embodiment, the following variations are possible.
(1) The address classification device according to this embodiment can be combined with any device that requires the input of addresses and names, such as an integrated automobile registration processing device.
(2) In the above embodiment the character string on the same line as the predetermined identification mark is determined to be an address or a name, but it need not be on the same line so long as it is nearby. There may also be lines containing other character strings besides the lines bearing the address and the name.
(3) The identification mark has been described as characters, but it may instead be a symbol placed near the character string, an underline applied to the character string, an enclosing frame, or the like.

[0040] (4) The determined address and name are output to the FD in this embodiment. Instead of, or in addition to, output to the FD, they may be printed in a predetermined format (for example, the layout of any application form that requires an address and a name), recorded on another information recording medium, or stored temporarily in memory and output to an FD or the like after further processing. One example of such further processing is to have the address classification device of the first embodiment check again whether the string is an address.

[0041] (5) In this embodiment, an address and a name printed on paper are input by OCR, but they may be handwritten rather than printed. The character strings may also be input through a communication device, a disk device, a keyboard, or the like instead of OCR. When a character string is typed on a keyboard, it can be determined to be an address or the like as soon as it is entered. The medium read by OCR may be anything that carries the character strings to be classified: a resident's card, a seal registration certificate, a certificate relating to real estate such as an extract of the land register, a certificate relating to movable property such as an automobile inspection certificate, a business card, a postcard, and so on. Furthermore, the character strings may also be stored in memory as the image data read by the scanner, together with the strings converted to text data, and output as needed. This gives access to the characters as originally recorded on paper alongside the text data read by OCR.

【００４２】[0042]

[Effects of the Invention] As described above, the character string classification device according to the present invention classifies input character strings automatically, so labor can be greatly reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram of the address classification device shown as the first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the same address classification device.

FIG. 3 is a diagram showing the comparison counting process of the same address classification device.

FIG. 4 is a diagram showing the comparison counting of the same address classification device.

FIG. 5 is a diagram showing the comparison counting results of the same address classification device.

FIG. 6 is a block diagram of the address classification device shown as the second embodiment of the present invention.

FIG. 7 is a flowchart showing the operation of the same address classification device.

FIG. 8 is a diagram showing how the addresses and names classified by the same address classification device are written.

[Explanation of symbols]

3: keyword storage means
7: OCR device (character string input means)
8: FD
10: comparison counting means
11: determination/classification means (first determination/classification means)
12: identification mark detection means
13: determination/classification means (second determination/classification means)

【手続補正書】[Procedure amendment]

【提出日】[Submission date] December 27, 1999 (1999.12.27)

【手続補正１】[Procedure amendment 1]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】全文[Correction target item name] Full text

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【書類名】明細書[Document name] Specification

【発明の名称】文字列分類装置[Title of the Invention] Character string classification device

【特許請求の範囲】[Claims]

【発明の詳細な説明】DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【０００２】[0002]

【０００３】[0003]

[0004] In view of the above circumstances, an object of the present invention is to provide a character string classification device that can automatically classify input character strings.

【０００５】[0005]

[Means for Solving the Problems] The character string classification device according to claim 1 comprises: character string input means for inputting two character strings; keyword storage means for storing keywords consisting of address terms; comparison counting means for comparing the character strings input by the character string input means with the keywords stored in the keyword storage means and counting the number of keywords contained in a character string; and determination/classification means for classifying the two character strings based on the output of the comparison counting means. The determination/classification means classifies the two character strings by determining, when the number of keywords contained in one of the character strings is three or more, that this character string is an address and that the other is a name.

[0006] In this character string classification device, address terms such as 県 (prefecture), 市 (city), 丁目 (chome), and 番地 (lot number) are set as keywords in advance, and the two character strings are classified by determining, according to the number of keywords contained in a character string, that one string is an address and the other is a name. If one string contains three or more keywords consisting of address terms, it is determined to be an address; if it contains fewer than three, it is determined not to be an address, that is, to be a name. Address terms are words that appear frequently in addresses; examples include the units and names of countries, municipalities, and administrative districts, the words 丁目 (chome), マンション (apartment building), and 号室 (room number), and the Arabic numerals used to write lot numbers.
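As an illustration only, the three-keyword rule of [0006] could be sketched in Python as follows. The keyword list holds just the four terms quoted in the text, and the names `count_keywords` and `classify_pair` are hypothetical; the patent does not prescribe an implementation. Counting here is a plain substring test, which is an assumption made for brevity.

```python
# Address terms quoted in the text; a real keyword file would hold more.
ADDRESS_KEYWORDS = ["県", "市", "丁目", "番地"]


def count_keywords(text, keywords=ADDRESS_KEYWORDS):
    """Return how many distinct keywords occur somewhere in text."""
    return sum(1 for kw in keywords if kw in text)


def classify_pair(first, second, threshold=3):
    """Label one string as the address and the other as the name.

    Only the first string is examined: if it contains at least
    `threshold` keywords it is the address, otherwise it is the name.
    """
    if count_keywords(first) >= threshold:
        return {"address": first, "name": second}
    return {"address": second, "name": first}
```

For example, `classify_pair("神奈川県横浜市中区1丁目2番地", "山田太郎")` labels the first string as the address because it contains all four keywords.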

[0007] The character string classification device according to claim 2 is the device according to claim 1, wherein the character string input means is an optical character reader.

[0008] Because this device inputs character strings by OCR, it can classify character strings written on any OCR-readable medium, such as a resident's card, a seal registration certificate, an extract of the land register, or an automobile inspection certificate.

[0009] The character string classification device according to claim 3 is the device according to claim 2, wherein the information medium readable by the optical character reader carries the two character strings together with identification marks, each located near one of the strings and serving as an indicator of the address or the name, and wherein the determination/classification means determines each of the two character strings to be an address or a name and classifies it based on the identification marks and the output of the comparison counting means.

[0010] In this device, an identification mark written on the information medium indicates the content of the nearby character string, and the string is determined to be an address or a name and classified on the basis of that mark. For example, the characters "address" (住所) are detected in the text written on the paper, and the character string on the same line is determined to be an address. The comparison counting means then further checks whether that string is an address. The information medium may be made of any material on which characters can be printed: paper, plastic, vinyl, metal, and so on.

[0011] The character string classification device according to claim 4 is the device according to any one of claims 1 to 3, wherein the character strings are output to an FD (flexible disk, hereinafter FD) after classification.

[0012] In this device, the classified data can be used by another data processing device by way of the FD. The classified data may be stored on the FD as it is, or converted to a predetermined format before being stored. Thus, for example, the classified data itself, or data output in a predetermined format, can be accumulated in another data processing device and used as a database.

[0013] The character string classification device according to claim 5 is the device according to any one of claims 1 to 4, wherein the two classified character strings are used for automobile registration work.

[0014] This device reduces the labor of classifying addresses and names in automobile registration work.

【００１５】[0015]

[Embodiments of the Invention] An address classification device according to an embodiment of the present invention is described below with reference to the drawings. The device shown in this embodiment classifies addresses that include a prefecture name; in this case, it is effective to determine and classify a character string as an address when it contains three or more keywords.

[0016] FIG. 1 is a block diagram of the address classification device. In the figure, reference numeral 1 denotes a personal computer in which a disk device (keyword storage means) 3, a CPU (central processing unit) 4, a RAM (random access memory) 5, and a ROM (read-only memory) 6 are connected via a common bus 2. A program that controls the basic operation of the personal computer 1 is written in the ROM 6. The disk device 3 stores an operation program file 3a and a keyword data file 3b, and the CPU 4 executes the operation program file 3a on the RAM 5. The keyword data file 3b stores a series of keywords consisting of address terms such as 都, 道, 府, 県, 郡, 市, 町, 村, 区, 丁目, 大字, and 番地. (These keywords are hereinafter called the first keyword, the second keyword, and so on.) An OCR device (character string input means) 7 that includes a scanner (not shown) and an FD drive 9 holding an FD (flexible disk) 8 are also connected to the personal computer 1.

[0017] Here, the CPU 4 and the RAM 5 constitute the comparison counting means 10 and the determination/classification means 11.

[0018] Next, the operation of the address classification device is described. When the operation program is started, in step SP1 of the flowchart in FIG. 2 the program first reads, with the scanner of the OCR device 7, a sheet of paper on which an address and a name are written. The sheet is assumed to carry only the address and the name, each printed on one line. "Address" here covers the current address and everything else expressed as an address, such as a registered domicile, and "name" covers the names of corporations and organizations as well as personal names. From the image data read by the scanner, the OCR device 7 selects the parts recognized as character strings and converts them to text data (character codes such as JIS codes). This text data is stored in the RAM 5. At this stage the RAM 5 holds two character strings, one of which is an address and the other a name. These are hereinafter called the first character string and the second character string.

[0019] The process then proceeds to step SP2. The CPU 4 and the RAM 5, acting as the comparison counting means 10, count the number of keywords contained in the first character string. First, a pointer P stored at a predetermined location in the RAM 5 is initialized to 1 and a counter C is initialized to 0.

[0020] Then, as shown in FIG. 3, the first keyword (都) is compared with the P-th (= 1st) character of the first character string to detect whether they match. If they match, 1 is added to the counter C. If they do not, the comparison is repeated with the second, third, and subsequent keywords (道, 府, ...). If none of the keywords matches, the counter is left unchanged (see FIG. 3 (1)).

[0021] When the comparison against the keyword group shows that the character string starting at the P-th character of the first character string matches none of the keywords, 1 is added to P and comparison continues with the next character (see FIG. 3 (2)). When it matches one of the keywords, the number of characters in the matched keyword is added to P and comparison continues from the new P-th character (see FIG. 3 (3) and FIG. 4). The comparison is repeated in this way for each new value of P.
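The pointer-and-counter scan of steps [0019] to [0021] can be sketched roughly as below. This is a minimal Python illustration, not the patent's program: the function name `scan_keywords` is hypothetical, the keyword list is the one named for the keyword data file 3b, and the pointer is 0-based where the text uses a 1-based P. The sketch counts every match as it is found and applies no further counting conditions.

```python
KEYWORDS = ["都", "道", "府", "県", "郡", "市", "町", "村",
            "区", "丁目", "大字", "番地"]


def scan_keywords(text, keywords=KEYWORDS):
    """Scan text left to right and return (C, matched keywords in order)."""
    p = 0          # pointer P (0-based here; the text starts P at 1)
    c = 0          # counter C
    matches = []
    while p < len(text):
        for kw in keywords:
            # Compare each keyword with the substring starting at P.
            if text.startswith(kw, p):
                c += 1
                matches.append(kw)
                p += len(kw)   # skip past the matched keyword
                break
        else:
            p += 1             # no keyword matched: move to the next character
    return c, matches
```

Calling `scan_keywords("東京都港区芝1丁目")` walks over 東 and 京 without a match, then matches 都, 区, and 丁目 in turn.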

[0022] The following counting conditions apply. If the same keyword occurs more than once in the first character string, it is not counted more than once. If the first character string contains more than one of the characters 都, 道, 府, 県, only one of them is counted; the same applies to 市, 町, 村. The relative order of the keywords in the first character string is also taken into account: for example, the characters 都, 道, 府, 県 are counted only when they precede 郡, 市, 町, 村.
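One possible reading of these counting conditions is sketched below in Python. The function `count_with_conditions` is hypothetical and takes the keywords already found in a string, in order of appearance. Two points are assumptions rather than statements from the text: a repeated keyword is counted once (not zero times), and a prefecture-level keyword is skipped only if 郡, 市, 町, or 村 has already appeared before it.

```python
PREFECTURE = {"都", "道", "府", "県"}   # at most one of these is counted
MUNICIPAL = {"市", "町", "村"}          # at most one of these is counted
LOWER_UNITS = {"郡"} | MUNICIPAL        # units a prefecture keyword must precede


def count_with_conditions(matches):
    """Count keywords in `matches` (ordered) under the stated conditions."""
    counted = set()        # keywords or group labels already counted
    lower_seen = False     # True once 郡/市/町/村 has appeared
    count = 0
    for kw in matches:
        if kw in PREFECTURE:
            key, allowed = "PREFECTURE", not lower_seen
        elif kw in MUNICIPAL:
            key, allowed = "MUNICIPAL", True
        else:
            key, allowed = kw, True
        if kw in LOWER_UNITS:
            lower_seen = True
        if allowed and key not in counted:
            counted.add(key)
            count += 1
    return count
```

Under this reading, a string whose keywords are 町, 村, and 市 yields a count of 1, and a repeated 市 is counted once.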

[0023] When this comparison has been repeated over the whole of the first character string, the counter C holds the number of keywords contained in the first character string. (Note that even when the keywords include both 大字 and 字 and the first character string contains 大字, the procedure above ensures that the character 字 inside 大字 is not treated as the keyword 字.)
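The same non-overlapping behaviour can also be obtained with a regular-expression alternation, shown here as an alternative technique rather than the method of the embodiment. In this Python sketch the function `find_keywords` is hypothetical, and sorting the keywords longest first is an added assumption that keeps a short keyword from shadowing a longer one that starts at the same position.

```python
import re


def find_keywords(text, keywords):
    """Return the keywords found in text, left to right, without overlaps."""
    # Longest first, so that 大字 is tried before 字 at any given position.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(kw) for kw in ordered))
    return [m.group(0) for m in pattern.finditer(text)]
```

With the keywords 字 and 大字, the string 大字山田 yields only 大字, because the match consumes both characters before the search resumes.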

[0024] The value of the counter C is used by the determination/classification means 11 as the criterion for deciding whether to classify the character string as an address. When the counter C is 3 or more, that is, when the first character string contains three or more kinds of keyword, the string is considered to be an address containing a prefecture name, a municipality name, a town name and lot number, and so on, as in FIG. 5 (a). The first character string is therefore determined to be an address and, in step SP11, is output to the address storage area on the FD 8 in the FD drive 9. The second character string is then determined by the determination/classification means 11 to be a name and, in step SP12, is output to the name storage area on the FD 8. When the counter C is less than 3, that is, when the first character string contains only two or fewer kinds of keyword as in FIG. 5 (b) to (d), the CPU 4 determines that the first character string is a name rather than an address and, in step SP21, outputs it to the name storage area on the FD 8. The second character string is then determined to be an address and, in step SP22, is output to the address storage area on the FD 8. As in FIG. 5 (d), a keyword that occurs several times (市) is not counted more than once. Likewise, as in FIG. 5 (e), a string containing three kinds of keyword (町, 村, 市) is still determined to be a name, because only one of them is counted as described above.
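The branch on the counter C can be summarized in a short Python sketch. The function `route` is hypothetical, and the storage areas on the FD 8 are represented simply by the labels "address" and "name"; how the areas are laid out on the disk is not described in the text.

```python
def route(counter_c, first, second, threshold=3):
    """Return (step, storage area, string) tuples for the two strings."""
    if counter_c >= threshold:
        # First string is the address, second is the name.
        return [("SP11", "address", first), ("SP12", "name", second)]
    # First string is the name, second is the address.
    return [("SP21", "name", first), ("SP22", "address", second)]
```

A count of 3 for the first string sends it to the address area via SP11, while a count of 1, as in the 町, 村, 市 case, sends it to the name area via SP21.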

[0025] Other necessary data is subsequently written to the FD 8, which is then passed on to other processing means that use the recorded address and other information.

[0026] In this way, the character strings recorded on paper are determined to be an address or a name, classified, and automatically output to the FD 8. There is therefore no need to specify manually which character string is the address, and labor is reduced.

[0027] A program that implements the above operation may be recorded on a computer-readable recording medium, and the program recorded on that medium may be read into a computer and executed.

[0028] In addition to the above embodiment, the following embodiments are also possible.
(1) The classification device according to the present invention can be combined with any device that requires an address or a name as input, such as an integrated automobile registration processing device.
(2) In this embodiment, the second character string is classified without its content being checked. Alternatively, after the first character string has been evaluated as described above, the second character string may be evaluated in the same way and the two results compared to decide the classification.
(3) In this embodiment, a string is determined to be an address when the counter C is 3 or more, but the threshold may instead be 2 or more. This makes it possible to classify addresses that contain few keywords, for example because the prefecture name is omitted.
(4) The keywords given above are only examples and are not limiting. By changing the keywords, for example to 財団法人 (incorporated foundation), names can be classified as well. Character strings in languages other than Japanese can also be classified by changing the language of the keywords. The length of a keyword is not limited to that in this embodiment; a keyword may be a word or a sentence.
(5) The identified address and name are output to the FD in this embodiment. Instead of, or in addition to, output to the FD, they may be printed in a predetermined format (for example, the layout of any application form that requires an address and a name), recorded on another information recording medium, or stored temporarily in memory and recorded on an FD or the like after further processing. Examples of such further processing include checking again that a string classified as not being an address is in fact a name, and combining the result with other methods to improve the accuracy of the determination.
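Variations (2) and (3) can be combined in a small Python sketch: both strings are evaluated, the counts are compared, and the minimum count is lowered to 2. The names `keyword_kinds` and `classify_by_comparison` are hypothetical, the count is a simple substring test, and returning `None` when the counts tie or both fall below the minimum is an assumption; the text does not say what should happen in that case.

```python
KEYWORDS = ["都", "道", "府", "県", "郡", "市", "町", "村",
            "区", "丁目", "大字", "番地"]


def keyword_kinds(text):
    """Number of distinct keywords occurring in text."""
    return sum(1 for kw in KEYWORDS if kw in text)


def classify_by_comparison(first, second, min_count=2):
    """Evaluate both strings and label the one with more keywords as address."""
    c1, c2 = keyword_kinds(first), keyword_kinds(second)
    if max(c1, c2) < min_count or c1 == c2:
        return None   # undecided (assumed behaviour)
    if c1 > c2:
        return {"address": first, "name": second}
    return {"address": second, "name": first}
```

With this sketch, 横浜市中区 (two keywords, no prefecture name) is still labeled as the address when paired with 山田太郎.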

[0029] (6) In this embodiment, an address and a name printed on paper are input by OCR, but they may be handwritten rather than printed. The character strings may also be input through a communication device, a disk device, a keyboard, or the like instead of OCR. When a character string is typed on a keyboard, its content can be determined to be an address or the like as soon as it is entered. The medium read by OCR may be anything that carries the character strings to be classified: a resident's card, a seal registration certificate, a certificate relating to real estate such as an extract of the land register, a certificate relating to movable property such as an automobile inspection certificate, a business card, a postcard, and so on. Furthermore, the character strings may also be stored in memory as the image data read by the scanner, together with the strings converted to text data, and output as needed. This gives access to the characters as originally recorded on paper alongside the text data read by OCR.
(7) In this embodiment, the counter C is incremented by 1 for each keyword contained in the first character string. Alternatively, the increment may be varied according to the kind of keyword, so that keywords whose presence makes it highly probable that the string is an address (for example, 番地) increase the counter C by a larger amount.
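Variation (7) amounts to a weighted count. The Python sketch below is illustrative only: the function `weighted_count` is hypothetical and the weight of 2 for 番地 is an invented value, since the text gives no figures.

```python
# Hypothetical weights; keywords not listed here count as 1.
WEIGHTS = {"番地": 2}


def weighted_count(matches, weights=WEIGHTS, default=1):
    """Sum the per-keyword increments for the keywords found in a string."""
    return sum(weights.get(kw, default) for kw in matches)
```

With these assumed weights, a string containing 市 and 番地 reaches a count of 3 even though it holds only two keywords.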

[0030] The following configuration is also possible as a device for classifying character strings. FIG. 6 is a block diagram of an address classification device. In the figure, reference numeral 1 denotes a personal computer in which a disk device 3, a CPU (central processing unit) 4, a RAM (random access memory) 5, and a ROM (read-only memory) 6 are connected via a common bus 2. A program that controls the basic operation of the personal computer 1 is written in the ROM 6. The disk device 3 stores an operation program file 3a and an identification mark file 3b, and the CPU 4 executes the operation program file 3a on the RAM 5.

[0031] The identification mark file 3b stores the characters "address" (住所) and "name" (氏名). An OCR device 7 that includes a scanner (not shown) and an FD drive 9 holding an FD (floppy disk) 8 are also connected to the personal computer 1.

[0032] Here, the CPU 4 and the RAM 5 constitute the identification mark detection means 12 and the determination/classification means 13.

[0033] Next, the operation of this address classification device is described. When the operation program is run, in step SP1 of the flowchart in FIG. 7 the information written on a sheet of paper is first read by the scanner. As shown in FIG. 8, the address and the name are each printed on one line of the sheet, following the characters "address" (住所) and "name" (氏名) respectively. From the image data that was read, the OCR device 7 selects the parts recognized as character strings and converts them to text data (character codes such as JIS codes). The content of the text data is stored in the RAM 5 together with its position information on the paper (the line on which it is written). At this stage the RAM 5 holds a number of character strings and their position information.

[0034] Next, the device detects whether the character strings include an identification mark stored in the identification mark file 3b. The character strings stored in the RAM 5 are searched for the identification mark "address" (住所), and the line on which it appears is detected (step SP2). Subsequently, in step SP3, the determination/classification means 13 checks the position information of the other character strings stored in the RAM 5; a character string written on the same line as the identification mark "address" is determined to be the address and is output to the address storage area on the FD 8 in the FD drive 9. Similarly, for the characters "name" (氏名), the determination/classification means 13 determines the character string written on the same line as "name" to be the name and outputs it to the address storage area on the FD 8 in the FD drive 9 (steps SP4, SP5). Note that "address" as determined and classified in this embodiment covers the current address and everything else expressed as an address, such as a registered domicile. "Name" also covers the names of corporations and organizations.
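The identification-mark lookup of steps SP2 to SP5 can be sketched in Python as follows. The data layout is an assumption: each recognized string is modeled as a `(line_number, text)` pair standing in for the text data and position information held in the RAM 5, and the function name `classify_by_mark` is hypothetical.

```python
# Identification marks from file 3b mapped to the field they indicate.
MARKS = {"住所": "address", "氏名": "name"}


def classify_by_mark(strings):
    """strings: list of (line_number, text) pairs produced by OCR.

    Returns a dict mapping "address"/"name" to the string that shares
    a line with the corresponding identification mark.
    """
    # Find the line on which each identification mark appears.
    mark_lines = {text: line for line, text in strings if text in MARKS}
    result = {}
    for mark, field in MARKS.items():
        line = mark_lines.get(mark)
        if line is None:
            continue
        # Take the other string written on the same line as the mark.
        for other_line, text in strings:
            if other_line == line and text not in MARKS:
                result[field] = text
    return result
```

Given the pairs `(1, "住所")`, `(1, "東京都港区芝1丁目")`, `(2, "氏名")`, and `(2, "山田太郎")`, the sketch returns the first data string as the address and the second as the name.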

[0035] In this way, the character strings recorded on paper are determined to be an address or a name, classified, and automatically output to the FD 8. There is therefore no need to specify manually which character string is the address, and labor is reduced.

【００３６】なお、上記のような動作を実現するため
のプログラムをコンピュータに読み取り可能な記録媒体
に記録して、この記録媒体に記録されたプログラムをコ
ンピュータに読み込ませ、実行することとしてもよい。Note that a program for realizing the above-described operation may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read and executed by the computer.

【００３７】また、上記構成の他、以下の構成として
もよい。（１）上記住所地分類装置は、自動車登録業務総合処理
装置など、住所地や氏名を入力することが必要な装置と
組み合わせることが可能である。（２）上記構成においては、所定の識別標識と同一の行
の文字列を住所地または氏名と判定しているが、同一行
でなくとも、近傍であればよい。住所地、氏名が記載さ
れている行の他に、別の文字列が記載されている行があ
ってもよい。（３）また、識別標識は文字であるとしたが、文字以外
にも、文字列近傍に付された記号、文字列に付された下
線、囲み枠等でもよい。In addition to the above configuration, the following configurations may be adopted. (1) The above address location classification device can be combined with a device that requires input of address locations and names, such as an automobile registration business integrated processing device. (2) In the above configuration, a character string on the same line as the predetermined identification mark is determined to be the address location or the name, but the character string need not be on the same line as long as it is in the vicinity of the mark. In addition to the lines on which the address location and name are written, there may be lines on which other character strings are written. (3) Although the identification mark is described as characters, it may instead be a symbol placed near the character string, an underline attached to the character string, an enclosing frame, or the like.

【００３８】（４）判定された住所地、氏名をＦＤに
出力するようにしたが、ＦＤへの出力に代えて、あるい
は、ＦＤへの出力と共に、所定フォーマット（例えば、
住所地、氏名を記載する必要がある各種申請書の書式）
でのプリントを行ったり、その他の情報記録媒体への記
録や、一旦メモリに格納した後、別の処理を行ってから
ＦＤ等への出力を行ってもよい。別の処理としては、例
えば、上記本発明の実施形態における住所地分類装置に
より、改めて住所地か否かを検出する処理が挙げられ
る。この場合、既にＲＡＭ５に格納されてる文字列に対
し、図２のＳＰ２からの処理を行う。 (4) Although the determined address location and name are output to the FD, instead of or in addition to output to the FD, they may be printed in a predetermined format (for example, the format of various application forms on which an address location and name must be entered), recorded on another information recording medium, or temporarily stored in memory and output to the FD or the like after another process has been performed. One example of such another process is detecting again whether a character string is an address location, using the address location classification device of the above embodiment of the present invention. In this case, the processing from SP2 in FIG. 2 is performed on the character strings already stored in the RAM 5.

【００３９】（５）上記構成においては紙に印字され
ている住所地、氏名をＯＣＲにより入力することとした
が、印字ではなく手書きであってもよい。さらに、文字
列の入力をＯＣＲではなく通信装置やディスク装置、キ
ーボードなどによって入力することとしてもよい。キー
ボードで文字列を入力する場合、入力された文字列を即
座に住所地などと判断することができる。また、ＯＣＲ
により入力される媒体としては、住民票、印鑑登録証明
書、土地登記簿抄本等の不動産に関する証明書、自動車
検査証等の動産に関する証明書、名刺、葉書など、分類
対象となる文字列が記載されている媒体であればいかな
るものでもよい。更に、テキストデータに変換された文
字列と共に、スキャナで読み込んだイメージデータとし
ての文字列をもメモリに格納し、必要に応じて出力する
こととしてもよい。これにより、ＯＣＲが読み込んだテ
キストデータと併せて、もともと紙に記録されていた文
字にもアクセスすることができる。(5) In the above configuration, the address location and name printed on paper are input by OCR, but they may be handwritten rather than printed. Further, the character string may be input by a communication device, a disk device, a keyboard, or the like instead of by OCR. When a character string is input from a keyboard, the input character string can be determined immediately to be an address location or the like. The medium read by the OCR may be any medium on which the character strings to be classified are written, such as a resident's card, a seal registration certificate, a certificate relating to real property such as a land register excerpt, a certificate relating to movable property such as an automobile inspection certificate, a business card, or a postcard. Furthermore, the character string as image data read by the scanner may be stored in memory together with the character string converted into text data, and output as needed. This makes it possible to access the characters as originally recorded on paper together with the text data read by the OCR.
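Variation (2) above, in which a character string only needs to be near the identification mark rather than on the same line, could be sketched as follows. This is an illustrative Python sketch under stated assumptions: "vicinity" is taken to mean a line distance of at most `max_line_gap`, a threshold the source does not define, and the `OcrString` record and `MARKS` table are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record: recognized text plus its line number on the page.
@dataclass
class OcrString:
    text: str
    line: int

# Assumed identification marks and the category each one signals.
MARKS = {"住所": "address", "氏名": "name"}

def classify_near_mark(strings, max_line_gap=1):
    """Assign each string to the category of the nearest mark, if close enough."""
    marks = [(s.line, MARKS[s.text]) for s in strings if s.text in MARKS]
    result = {category: [] for category in MARKS.values()}
    for s in strings:
        if s.text in MARKS or not marks:
            continue
        # Pick the closest mark by line distance; on a tie the mark that
        # appears first in the input wins.
        mark_line, category = min(marks, key=lambda m: abs(m[0] - s.line))
        # "Vicinity" is modeled as a maximum line gap (an assumption).
        if abs(mark_line - s.line) <= max_line_gap:
            result[category].append(s.text)
    return result
```

With `max_line_gap=0` this reduces to the same-line rule of the main embodiment, while larger values tolerate layouts in which the value is printed on the line below its label.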

【００４０】[0040]

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明に係る実施形態として示した住所地分
類装置のブロック図である。FIG. 1 is a block diagram of an address location classification device shown as an embodiment according to the present invention.

【図６】他の例として示した住所地分類装置のブロッ
ク図である。FIG. 6 is a block diagram of an address location classification device shown as another example.

【符号の説明】３キーワード格納手段７ＯＣＲ装置（文字列入力手段）８ＦＤ１０比較カウント手段１１判定分類手段１２識別標識検出手段[Description of Reference Signs] 3: keyword storage means; 7: OCR device (character string input means); 8: FD; 10: comparison counting means; 11: determination/classification means; 12: identification mark detection means

Claims

1. A character string classification device comprising: character string input means for inputting a character string; keyword storage means for storing a predetermined keyword; comparison counting means for comparing the character string input by the character string input means with the keyword stored in the keyword storage means and counting the number of keywords included in the character string; and first determination/classification means for determining the content represented by the character string according to the counted number of keywords and classifying the character string.

2. The character string classification device according to claim 1, wherein the keyword is a term used in address locations, and the character string is determined to be an address location according to the counted number of keywords.

3. The character string classification device according to claim 2, wherein the first determination/classification means determines the character string to be an address location when the number of keywords in the character string counted by the comparison counting means is two or more.

4. The character string classification device according to claim 1, wherein the character string input means is an optical character reading device.

5. The character string classification device according to claim 1, wherein the classified character string is output to a flexible disk.

6. A character string classification device comprising: identification mark detection means for detecting a predetermined identification mark from the description on an information medium; and second determination/classification means for determining, according to the identification mark, the content of a character string described in the vicinity of the identification mark and classifying the character string.
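The keyword counting of claims 1 to 3 can be illustrated with a short Python sketch. It is a sketch under assumptions and not the claimed device itself: the keyword list `ADDRESS_KEYWORDS` is hypothetical (the claims say only that the keywords are address terms), and counting each distinct keyword once is one possible reading of "the number of keywords included in the character string". Only the threshold of two comes from claim 3.

```python
# Hypothetical keyword list standing in for the keyword storage means.
ADDRESS_KEYWORDS = ("都", "道", "府", "県", "市", "区", "町", "村", "丁目", "番地")

def count_keywords(text, keywords=ADDRESS_KEYWORDS):
    """Count how many distinct keywords occur in the text (assumed reading)."""
    return sum(1 for kw in keywords if kw in text)

def is_address(text, threshold=2):
    """Treat the text as an address location when it holds at least
    `threshold` keywords; claim 3 gives two as the threshold."""
    return count_keywords(text) >= threshold
```

Requiring at least two keywords rather than one reduces false positives from personal names that happen to contain a single address character such as 村 or 町.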