JP2000057261A

JP2000057261A - Character segmenting device

Info

Publication number: JP2000057261A
Application number: JP10229744A
Authority: JP
Inventors: Masahiro Sakurai; 雅寛櫻井; Kazuhiro Ishikawa; 和弘石川
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1998-08-14
Filing date: 1998-08-14
Publication date: 2000-02-25

Abstract

PROBLEM TO BE SOLVED: To provide a character segmenting device which improves a segmenting method for a character segmentation object pattern. SOLUTION: Image data Si supplied from a scanner, etc., are stored in an image storage part 1 and the image data S1 are outputted from the image storage part 1. Further, feature information S3 set by an input part or a control part, etc., is already stored in a feature storage part 3 and outputted to a character segmentation part 2. The character segmentation part 2 cuts characters out of the image data S1, one by one, through character segmentation corresponding to the feature information S3. The cutting result is outputted as the character cutting result S2 to the outside of the character segmenting device.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character segmentation device with an improved method of segmenting a target pattern composed of characters or symbols.

[0002]

[Prior Art] A conventional character segmentation device segments characters from an image input from a reading device such as a scanner. FIG. 2 shows an example of segmenting horizontally written characters in a conventional character segmentation device, and FIG. 3 is a flowchart explaining how the segmentation candidate points in FIG. 2 are obtained. The segmentation steps (a) to (d) of the conventional device are described below with reference to FIGS. 2 and 3.

[0003] (a) The projection distribution of the pixels of the character pattern (for example, black dots) in the vertical direction is measured, and the midpoint of each portion where the projection value is 0 is obtained as one of the segmentation candidate points 1 to 8. In this case, as shown in FIG. 2, the projection distribution is measured by counting the black dots in the direction perpendicular to the line (step ST1). The start coordinate is cleared (step ST2), and the processing coordinate X is set to the left end of the line (step ST3). X is moved to the right until the projection value is no longer 0 (step ST4). Whether the projection value at X is 0 is then determined (step ST5); if it is not 0, whether a start coordinate is stored is determined (step ST8). If no start coordinate is stored, X is moved one dot to the right (step ST6). Whether X is at the right end of the line is determined; if it is, processing ends, and if not, processing returns to step ST5 (step ST7).

[0004] If the projection value is 0 in step ST5, processing proceeds to step ST12. Whether a start coordinate is stored is determined (step ST12); if one is stored, processing proceeds to step ST6, and if not, the current processing coordinate X is stored as the start coordinate before proceeding to step ST6 (step ST11). If a start coordinate is stored in step ST8, processing proceeds to step ST9, where {(processing coordinate X - 1) + start coordinate} / 2 is taken as a segmentation candidate point (step ST9). The start coordinate is then cleared, and processing proceeds to step ST6 (step ST10).
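As an illustration only, the candidate-point search of steps ST1 to ST12 can be sketched in Python as follows. The function names, the list-of-rows image representation (1 for a black dot), and the choice to examine every column up to the right end of the line are assumptions of this sketch, not details given by the flowchart of FIG. 3.

```python
def projection_profile(rows):
    """ST1: count the black dots (1s) in each column of a binary line image."""
    width = len(rows[0])
    return [sum(row[x] for row in rows) for x in range(width)]


def candidate_points(profile):
    """Return the midpoint of every zero run that lies between two inked columns."""
    candidates = []
    start = None                                   # ST2: no start coordinate stored
    x = 0                                          # ST3: left end of the line
    while x < len(profile) and profile[x] == 0:    # ST4: skip the left margin
        x += 1
    while x < len(profile):                        # loop bound stands in for ST7
        if profile[x] == 0:                        # ST5 -> ST12
            if start is None:
                start = x                          # ST11: a gap begins here
        elif start is not None:                    # ST8 -> ST9
            candidates.append(((x - 1) + start) / 2)
            start = None                           # ST10
        x += 1                                     # ST6
    return candidates
```

For the profile `[0, 0, 3, 2, 0, 0, 0, 4, 1, 0, 5, 0, 0]` the sketch reports candidates at 5.0 and 9.0; the leading and trailing margins produce no candidate because they are not enclosed by ink on both sides.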

[0005] (b) A reference interval W is set from the intervals between adjacent segmentation candidate points 1 to 8, for example their maximum (in FIG. 2, the interval between candidate points 5 and 6).
(c) Candidate points whose interval is no more than a fixed fraction of the reference interval W (for example, 1/3 of W or less) are excluded from the candidates (in FIG. 2, candidate points 4 and 8).
(d) The remaining candidate points 1, 2, 3, 5, 6, and 7 become the final segmentation positions A, B, C, D, E, and F.
In FIG. 2 the maximum interval between candidate points 1 to 8 is used as the reference interval W, but this is only one way of determining it; for example, the average interval between candidate points 1 to 8 may be used instead.
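Steps (b) to (d) can be sketched as below. The text does not state which of the two points bounding a short interval is discarded; this sketch assumes the right-hand point is dropped when it lies too close to the last point that was kept. The function name and the `ratio` and `use_mean` parameters are hypothetical.

```python
def filter_candidates(candidates, ratio=1 / 3, use_mean=False):
    """Drop candidate points that follow the previously kept point too closely."""
    if len(candidates) < 2:
        return list(candidates)
    gaps = [b - a for a, b in zip(candidates, candidates[1:])]
    # (b) reference interval W: the maximum gap, or optionally the mean gap
    w = sum(gaps) / len(gaps) if use_mean else max(gaps)
    kept = [candidates[0]]
    for point in candidates[1:]:
        # (c) assumed rule: exclude a point whose distance to the last kept
        # point is ratio * W or less
        if point - kept[-1] > ratio * w:
            kept.append(point)
    return kept                                    # (d) final cut positions
```

With eight candidates at `[0, 30, 60, 68, 90, 130, 160, 166]`, W is 40 (the fifth to sixth point) and the fourth and eighth points are removed, which mirrors the pattern described for FIG. 2 under the stated assumption.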

[0006]

[Problems to Be Solved by the Invention] However, the conventional character segmentation device has the following problems. FIGS. 4(a), 4(b), and 4(c) show examples of problematic segmentation. In FIG. 4(a), "旧" ("old") and "1日" ("1st day") in line La have similar vertical projections of black dots, yet "旧" must be segmented as one character and "1日" as two. In FIG. 4(b), it must be decided whether a minute pattern in line Lb is segmented as a character (for example, a period) or treated as noise and not segmented. In FIG. 4(c), it must be decided whether a pattern in the lower part of line Lc is segmented as a character (for example, a period) or treated as noise. Making these decisions correctly for arbitrary images is difficult, however. As a result, either the segmentation rate drops because one character is mistakenly segmented as two, or unnecessary characters are produced because noise is segmented as a character.

[0007]

[Means for Solving the Problems] To solve these problems, the invention according to claim 1 is a character segmentation device comprising: image storage means for storing image data of a form on which a segmentation target pattern composed of symbols or characters is written; feature information storage means for storing feature information of the segmentation target pattern; and character segmentation means for segmenting characters from the stored image data on the basis of the feature information. With this configuration, the image data of the form is stored in the image storage means, and the feature information of the segmentation target pattern is stored in the feature information storage means. The character segmentation means segments characters from the image data stored in the image storage means on the basis of the feature information.

[0008] In the invention according to claim 2, the character segmentation device of claim 1 further comprises target character storage means for storing the segmentation target pattern, and character feature determination means for receiving the segmentation target pattern and outputting the feature information; the feature information output by the character feature determination means is stored in the feature information storage means. With this configuration, the segmentation target pattern is stored in the target character storage means and input to the character feature determination means, which outputs the feature information. The feature information output from the character feature determination means is stored in the feature information storage means. In the invention according to claim 3, in the device of claim 1 or 2, the feature information represents features specific to the segmentation target pattern. In the invention according to claim 4, in the device of claim 1, 2, or 3, the character segmentation means segments characters on the basis of the feature information output by the feature information storage means.

[0009] In the invention according to claim 5, in the device of claim 2, 3, or 4, the target character storage means stores the character codes of the segmentation target pattern. In the invention according to claim 6, in the device of claim 2, 3, 4, or 5, the character feature determination means includes feature information table storage means that stores the feature information associated in advance with the character codes of the segmentation target pattern. In the invention according to claim 7, in the device of claim 6, the feature information table storage means stores each character code in association with the feature information for that character code. In the invention according to claim 8, in the device of claim 6 or 7, the character feature determination means obtains, from the feature information table storage means, the feature information corresponding to the character code output by the target character storage means, and outputs it to the feature information storage means.

[0010] In the invention according to claim 9, in the device of any one of claims 1 to 8, the features specific to the segmentation target pattern indicate at least one of the following for that pattern: the presence or absence of left-right separable characters, of top-bottom separable characters, of minute characters, of subscript characters, of superscript characters, or of characters other than digits. In the invention according to claim 10, in the device of any one of claims 1 to 9, when the feature information output by the feature information storage means indicates that no left-right separable character is present, the character segmentation means, in segmenting horizontal writing, uses every section in which the value of the vertical projection distribution is at or below a set value to segment characters; or, when no top-bottom separable character is present, the character segmentation means, in segmenting vertical writing, uses every section in which the value of the horizontal projection distribution is at or below a set value to segment characters.

[0011] In the invention according to claim 11, in the device of any one of claims 1 to 9, the character segmentation means acts on the feature information output by the feature information storage means as follows: when no minute character smaller than a normal character is present, it does not segment as a character any pattern whose size is at or below a predetermined threshold; when no subscript character is present, it does not segment as a character any pattern whose upper edge lies at or below a predetermined threshold in the lower part of the line; or, when no superscript character is present, it does not segment as a character any pattern whose lower edge lies at or above a predetermined threshold in the upper part of the line. In the invention according to claim 12, in the device of any one of claims 1 to 9, when the feature information output by the feature information storage means indicates that no character other than digits is present and the interval between characters is at or above a reference interval, the character segmentation means cuts within that interval according to its length.

[0012]

[Embodiments of the Invention] First Embodiment
FIG. 1 is a block diagram of a character segmentation device according to a first embodiment of the present invention. The device has image storage means (for example, an image storage unit) 1 that stores image data Si of a form on which a segmentation target pattern composed of characters or symbols is written horizontally or vertically. The image storage unit 1 is a memory that can hold the image data Si in a format from which the two-dimensional coordinates of the segmentation target pattern on the form can be reproduced. The device also has storage means (for example, a feature information storage unit) 3 for feature information specific to the segmentation pattern. The feature information storage unit 3 is a memory that stores feature information S3 representing features specific to the segmentation target pattern. Character segmentation means (for example, a character segmentation unit) 2 is connected to the output side of the feature information storage unit 3. The character segmentation unit 2 takes as input the image data S1 output from the image storage unit 1 and the feature information S3, segments the characters, and outputs the segmentation result S2 to the outside of the device.

[0013] Next, the operation of the device in FIG. 1 is described. Image data Si supplied from a scanner or the like (not shown) is stored in the image storage unit 1, which outputs image data S1. Feature information S3 set by an input unit, a control unit, or the like (not shown) is stored in the feature information storage unit 3 and output to the character segmentation unit 2. The character segmentation unit 2 segments the image data S1 character by character, using the segmentation method that corresponds to the feature information S3. The result is output from the device as the segmentation result S2. The feature information S3 may contain, for example, the following settings (a) to (f).
(a) No left-right separable character is present.
(b) No top-bottom separable character is present.
(c) No minute character smaller than a normal character is present.
(d) No superscript character is present.
(e) No subscript character is present.
(f) Only digits (0 to 9) are present.
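One possible in-memory form of the feature information S3 with settings (a) to (f) is a record of six flags, sketched below. The class name, the field names, and the conservative defaults (every kind of character may be present) are assumptions made for illustration; the text does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureInfo:
    """Hypothetical record for feature information S3."""
    has_lr_separable: bool = True   # False corresponds to setting (a)
    has_tb_separable: bool = True   # False corresponds to setting (b)
    has_minute: bool = True         # False corresponds to setting (c)
    has_superscript: bool = True    # False corresponds to setting (d)
    has_subscript: bool = True      # False corresponds to setting (e)
    digits_only: bool = False       # True corresponds to setting (f)


# Example: a numeric field on a form, where settings (a) to (f) all apply.
numeric_field = FeatureInfo(
    has_lr_separable=False,
    has_tb_separable=False,
    has_minute=False,
    has_superscript=False,
    has_subscript=False,
    digits_only=True,
)
```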

[0014] FIGS. 5(a), 5(b), and 5(c) illustrate left-right separable and non-separable characters for setting (a). As shown in FIG. 5(a), a left-right separable character is one that contains, within the single character, a portion where the vertical projection of black dots is 0. A left-right non-separable character, as shown in FIG. 5(b), has no portion where the projection value is 0. As shown in FIG. 5(c), some characters are not left-right separable in certain fonts, but any character that is left-right separable in at least one font is treated as left-right separable. Similarly, FIGS. 6(a) and 6(b) illustrate top-bottom separable and non-separable characters: FIG. 6(a) shows top-bottom separable characters and FIG. 6(b) shows non-separable ones. As with FIG. 5, any character that is top-bottom separable in at least one font is treated as top-bottom separable.

[0015] A minute character in setting (c) is a character, such as a comma (,) or a single quotation mark ('), whose height and width are smaller than those of other ordinary characters (for example, zero "0"), for example no more than 1/3 of the ordinary character size in each dimension. A superscript character in setting (d) is a character, such as a single quotation mark (') or a handakuten (゜), whose lower edge lies in the upper part of the line (for example, above the center of the line). A subscript character in setting (e) is a character, such as a comma (,) or the small kana "っ" and "ゅ" used for geminate consonants and contracted sounds, whose upper edge lies in the lower part of the line (for example, below the center of the line).

[0016] The character segmentation unit 2 operates on settings (a) to (f) as described in (1-1) to (1-6) below.
(1-1) Operation based on setting (a)
FIG. 7 illustrates the operation of the character segmentation unit 2 when line L contains no left-right separable character. When the feature information storage unit 3 stores setting (a) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 uses every segmentation candidate as a segmentation position. For "2月11日" ("February 11") in line L of FIG. 7, all of the candidates therefore become segmentation positions, and "2", "月", "1", "1", and "日" are segmented correctly. In the conventional method, by contrast, the reference interval W is the interval between one pair of candidates, and the interval between another pair is no more than half of W, so a candidate is excluded. As a result, "1日" is mistakenly segmented as a single character (that is, as "旧").

[0017] (1-2) Operation based on setting (b)
FIG. 8 illustrates the operation of the character segmentation unit 2 when line L contains no top-bottom separable character. When the feature information storage unit 3 stores setting (b) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 uses every segmentation candidate as a segmentation position, as in (1-1). For "あれ、それ" in line L of FIG. 8, all of the candidates therefore become segmentation positions, and "あ", "れ", "、", "そ", and "れ" are segmented correctly. In the conventional method, by contrast, the reference interval W is the interval between one pair of candidates, and the interval between another pair is no more than half of W, so candidate 3 is excluded. As a result, "、そ" is mistakenly segmented as a single character.

[0018] (1-3) Operation based on setting (c)
FIG. 9 illustrates the operation of the character segmentation unit 2 when line L contains no minute character. When the feature information storage unit 3 stores setting (c) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 does not segment as a character any pattern whose size is at or below a preset threshold. If the threshold is, for example, 1/2 of the line height, noise patterns A to E in FIG. 9 are not segmented as characters, which improves segmentation accuracy.
(1-4) Operation based on setting (d)
FIG. 10 illustrates the operation of the character segmentation unit 2 when line L contains no superscript character. The line in this figure contains digits, commas (,), and noise patterns A to C. When the feature information storage unit 3 stores setting (d) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 does not segment as a character any pattern whose lower edge lies at or above a preset threshold in the upper part of line L (for example, above the center of line L). The conventional method segments noise patterns A to C in line L as characters, whereas in this embodiment noise pattern A is not segmented as a character because its lower edge is higher than the center of line L. Segmentation accuracy therefore improves.
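The size rule of (1-3) might be sketched as follows. The text speaks only of "pattern size", so treating a pattern as too small when both its width and its height are at or below the threshold is an assumption of this sketch, as are the `Box` type and the function name. The default `ratio` of 0.5 follows the 1/2-of-line-height example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box of one connected pattern; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def drop_small_patterns(boxes, line_height, ratio=0.5):
    """Keep only patterns that exceed the size limit in width or in height."""
    limit = line_height * ratio
    return [
        b for b in boxes
        if (b.right - b.left) > limit or (b.bottom - b.top) > limit
    ]
```

Requiring both dimensions to be small keeps a narrow but tall glyph such as the digit "1" from being discarded together with the specks.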

[0019] (1-5) Operation based on setting (e)
FIG. 11 illustrates the operation of the character segmentation unit 2 when line L contains no subscript character. The line in this figure contains digits, single quotation marks ('), double quotation marks ("), and noise pattern A. When the feature information storage unit 3 stores setting (e) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 does not segment as a character any pattern whose upper edge lies at or below a preset threshold in the lower part of line L (for example, below the center of line L). The conventional method segments noise pattern A in line L as a character, whereas in this embodiment it is not segmented as a character because its upper edge is lower than the center of line L. Segmentation accuracy therefore improves.
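The position rules of (1-4) and (1-5) can be combined in one filter, sketched here under the assumptions that y coordinates grow downward, that each pattern is given as an inclusive (top, bottom) row pair, and that the threshold is the center of the line as in the examples. The function and parameter names are hypothetical.

```python
def drop_misplaced_patterns(spans, line_top, line_bottom,
                            has_superscript, has_subscript):
    """Discard patterns lying wholly in the upper or lower half of the line
    when the feature information says no such characters are expected."""
    center = (line_top + line_bottom) / 2
    kept = []
    for top, bottom in spans:
        lower_edge_in_upper_half = bottom <= center   # rule of (1-4)
        upper_edge_in_lower_half = top >= center      # rule of (1-5)
        if lower_edge_in_upper_half and not has_superscript:
            continue
        if upper_edge_in_lower_half and not has_subscript:
            continue
        kept.append((top, bottom))
    return kept
```

For a line spanning rows 0 to 40 that may contain commas but no superscripts, a speck at rows 2 to 8 is discarded while a comma at rows 30 to 38 is kept.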

[0020] (1-6) Operation based on setting (f)
FIG. 12 illustrates the operation of the character segmentation unit 2 when line L contains only digits. When the feature information storage unit 3 stores setting (f) and outputs it to the character segmentation unit 2 as feature information S3, the character segmentation unit 2 segments characters by steps (i) to (iv) below.
(i) The projection distribution of black dots in the vertical direction is measured, and segmentation candidates are obtained.
(ii) The minimum value min of the intervals between adjacent segmentation candidates is obtained.
(iii) When the interval between adjacent candidates is n times the minimum value min, new candidates are created that divide the interval into equal parts according to the value of n. For example, the interval is not divided if n < 1.5, is divided into two equal parts if 1.5 ≤ n < 2.5, and is divided into three equal parts if 2.5 ≤ n.

[0021] (iv) All of the resulting segmentation candidates are used as segmentation positions.
As shown in FIG. 12, when two adjacent "4"s touch, the conventional method obtains the segmentation candidate points but misses one cut. In this embodiment, if the minimum interval min lies between one pair of adjacent candidate points, the distance between another pair of candidate points is about twice min, while the remaining intervals do not exceed 1.5 times min. The interval between that pair is therefore divided into two equal parts to create a new candidate, and the final segmentation positions A, B, C, D, E, and F are obtained. Characters are thus segmented correctly even when they touch.
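Steps (ii) to (iv) for digit-only lines can be sketched as follows, using exactly the three ranges of n given as the example in step (iii). The function name is hypothetical, and the candidates are assumed to be strictly increasing so that the minimum interval is positive.

```python
def split_wide_intervals(candidates):
    """Insert evenly spaced cut points into intervals much wider than the narrowest."""
    if len(candidates) < 2:
        return list(candidates)
    gaps = [b - a for a, b in zip(candidates, candidates[1:])]
    smallest = min(gaps)                           # (ii) minimum interval min
    result = [candidates[0]]
    for left, right in zip(candidates, candidates[1:]):
        n = (right - left) / smallest              # (iii) ratio to min
        parts = 1 if n < 1.5 else 2 if n < 2.5 else 3
        step = (right - left) / parts
        result.extend(left + step * k for k in range(1, parts))
        result.append(right)
    return result                                  # (iv) all become cut positions
```

With five candidates at `[0, 20, 40, 82, 102]`, the 42-dot interval is 2.1 times the minimum of 20, so one point is added at 61 and six cut positions result, the same count as positions A to F in the touching-digits example.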

[0022] Second Embodiment
FIG. 13 is a block diagram of a character segmentation device according to a second embodiment of the present invention; elements common to FIG. 1 of the first embodiment carry the same reference numerals. This device has a target character storage unit 5 and a character feature determination unit 4 that includes a feature information table storage unit 4A. In FIG. 13, the target character storage unit 5 stores the character codes (for example, JIS codes) S5 of the segmentation targets and outputs the character codes S5 to the character feature determination unit 4. The feature information table storage unit 4A is a table in which feature information is associated in advance with the character codes of the segmentation targets. The character feature determination unit 4 outputs feature information S4 corresponding to the character codes S5. The feature information S4 is input to the feature information storage unit 3 and stored as the feature information S3. The rest of the configuration is the same as in FIG. 1.

【００２３】[0023]

The operation of this character extracting apparatus differs from that of the apparatus of FIG. 1 in the following points. The target character storage unit 5 outputs the character code S5 to the character feature determination unit 4. The character feature determination unit 4 retrieves the feature information corresponding to the character code S5 from the feature information table storage unit 4A and outputs it to the feature information storage unit 3 as feature information S4. For example, when the characters to be extracted (character codes) are the digits “0” to “9” and the comma “,”, the character feature determination unit 4 outputs to the feature information storage unit 3 the feature information S4 associated with the digit “0”: (1) it cannot be separated into left and right parts, (2) it cannot be separated into upper and lower parts, (3) it is not a minute character, (4) it is not a superscript, (5) it is not a subscript, and (6) it is a digit. The digits “1” to “9” are associated with the same feature information as “0”, so for each of them the unit likewise outputs feature information S4 stating that (1) it cannot be separated into left and right parts, (2) it cannot be separated into upper and lower parts, (3) it is not a minute character, (4) it is not a superscript, (5) it is not a subscript, and (6) it is a digit. In addition, the unit outputs the feature information S4 associated with the comma “,”: (1) it cannot be separated into left and right parts, (2) it cannot be separated into upper and lower parts, (3) it is a minute character, (4) it is not a superscript, (5) it is a subscript, and (6) it is not a digit.
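The table lookup performed by the character feature determination unit 4 might look like the following sketch. The names `CharFeatures`, `FEATURE_TABLE`, and `determine_features` are hypothetical, and the table is keyed by the characters themselves rather than by JIS codes, purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharFeatures:
    """One S4 record: the six per-character features (1) to (6)."""
    lr_separable: bool   # (1) can be separated into left and right parts
    ud_separable: bool   # (2) can be separated into upper and lower parts
    minute: bool         # (3) minute character
    superscript: bool    # (4) superscript
    subscript: bool      # (5) subscript
    digit: bool          # (6) digit


# Feature values taken from the example in the text: digits and the comma.
DIGIT_FEATURES = CharFeatures(False, False, False, False, False, True)
COMMA_FEATURES = CharFeatures(False, False, True, False, True, False)

# Stand-in for the feature information table storage unit 4A.
# Assumption: keys are characters; a real table would use character codes.
FEATURE_TABLE = {str(d): DIGIT_FEATURES for d in range(10)}
FEATURE_TABLE[","] = COMMA_FEATURES


def determine_features(char_codes):
    """Return the S4 record for each target character code, in order."""
    return [FEATURE_TABLE[code] for code in char_codes]
```

Calling `determine_features("0,")` yields the digit record followed by the comma record, which differ only in the minute-character, subscript, and digit flags.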

【００２４】[0024]

The feature information storage unit 3 integrates the above feature information S4 and stores feature information S3 such as the following: (A) there is no character that can be separated into left and right parts, (B) there is no character that can be separated into upper and lower parts, (C) there is a minute character smaller than normal characters, (D) there is no superscript, (E) there is a subscript, and (F) there is a character other than the digits (0 to 9). Items (A), (B), and (D) coincide with the feature information (a), (b), and (d) described in the first embodiment. Accordingly, character extraction using the feature information S3 is the combination of processes (1-1), (1-2), and (1-4) of the first embodiment, or a combination of one or more of them. As described above, in the second embodiment the target character storage unit 5 stores the character code S5, the character feature determination unit 4 outputs the feature information S4 obtained by converting the character code S5 through the feature information table storage unit 4A, and the feature information storage unit 3 stores the feature information S3 derived from the feature information S4. Therefore, in addition to the advantages of the first embodiment, character extraction can be performed by a simpler procedure, because uniquely defined character codes are used instead of feature information that must be specified through cumbersome settings. The present invention is not limited to the above embodiments, and various modifications are possible. For example, the thresholds for the settings (c) to (e) in the first embodiment may be set relative to the line height, or may be set as a number of dots of the character size.
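The integration of the per-character records S4 into the set-level feature information S3 can be sketched as a logical OR over the target characters. The function name `integrate_features` and the dictionary keys are hypothetical; the text states only the resulting items (A) to (F), not how the integration is implemented.

```python
def integrate_features(s4_records):
    """Combine per-character S4 records into one S3 record.

    Each S4 record is a dict with boolean keys 'lr_separable',
    'ud_separable', 'minute', 'superscript', 'subscript', and 'digit'.
    Assumption: a set-level flag is true when any character has it.
    """
    names = ("lr_separable", "ud_separable", "minute",
             "superscript", "subscript")
    s3 = {name: any(rec[name] for rec in s4_records) for name in names}
    # Item (F): at least one target character is not a digit.
    s3["non_digit"] = any(not rec["digit"] for rec in s4_records)
    return s3


# S4 records for a digit and for the comma, as in the example above.
digit = {"lr_separable": False, "ud_separable": False, "minute": False,
         "superscript": False, "subscript": False, "digit": True}
comma = {"lr_separable": False, "ud_separable": False, "minute": True,
         "superscript": False, "subscript": True, "digit": False}

s3 = integrate_features([digit] * 10 + [comma])
```

For the digits and the comma, `s3` reports no separable characters and no superscript, but it does report a minute character, a subscript, and a non-digit character, matching items (A) to (F).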

【００２５】[0025]

[Effects of the Invention]

As described in detail above, according to the invention of claim 1, the feature information of the character extraction target pattern is stored in the feature information storage means, and the character extracting means cuts out characters from the image data stored in the image storage means on the basis of that feature information, so that character extraction accuracy can be improved. According to the invention of claim 2, the character extraction target pattern is stored in the target character storage means and the feature information is output from the character extraction target pattern, so that, in addition to the effect of the invention of claim 1, character extraction can be performed by a simpler procedure. According to the invention of claim 3, the feature information represents features unique to the character extraction target pattern, so that character extraction accuracy can be improved. According to the invention of claim 4, the character extracting means cuts out characters on the basis of the feature information output by the feature information storage means, so that character extraction accuracy can be improved.

【００２６】[0026]

According to the invention of claim 5, the target character storage means stores the character codes of the character extraction target pattern, so that character extraction accuracy can be improved. According to the invention of claim 6, the character feature determination means includes feature information table storage means that stores feature information associated in advance with the character codes of the character extraction target pattern, so that character extraction accuracy can be improved. According to the invention of claim 7, the feature information table storage means stores each character code in association with the feature information of that character code, so that character extraction accuracy can be improved. According to the invention of claim 8, the character feature determination means acquires, from the feature information table storage means, the feature information corresponding to the character code output by the target character storage means and outputs it to the feature information storage means, so that character extraction accuracy can be improved. According to the invention of claim 9, the features unique to the character extraction target pattern indicate at least one of the following for that pattern: the presence or absence of left-right separable characters, of vertically separable characters, of minute characters, of subscripts, of superscripts, or of characters other than digits. Character extraction accuracy can therefore be improved.

【００２７】[0027]

According to the invention of claim 10, when the feature information output by the feature information storage means indicates that no left-right separable character exists, the character extracting means, in extracting horizontally written characters, cuts out as characters all sections delimited where the value of the vertical projection distribution is at or below a set value. Alternatively, when no vertically separable character exists, it cuts out as characters, in extracting vertically written characters, all sections delimited where the value of the horizontal projection distribution is at or below a set value. Character extraction accuracy can therefore be improved. According to the invention of claim 11, when the feature information output by the feature information storage means indicates that no minute character smaller than normal characters exists, the character extracting means does not cut out as a character any pattern whose size is at or below a predetermined threshold. When no subscript exists, it does not cut out as a character any pattern whose upper end lies at or below a predetermined threshold in the lower part of the line. When no superscript exists, it does not cut out as a character any pattern whose lower end lies at or above a predetermined threshold in the upper part of the line. Character extraction accuracy can therefore be improved. According to the invention of claim 12, when the feature information output by the feature information storage means indicates that no character other than digits exists and the interval between characters is equal to or greater than a reference interval, the character extracting means cuts within that interval according to its length, so that character extraction accuracy can be improved.
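The projection-based cutting for horizontally written text with no left-right separable characters can be sketched as follows. This is a minimal illustration, not the device's implementation: the function names, the representation of the image as rows of 0/1 pixels, and the default threshold of 0 are assumptions.

```python
def vertical_projection(image):
    """Count the black pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def cut_at_gaps(image, threshold=0):
    """Return (start, end) column ranges, end exclusive, of cut-out characters.

    Every run of columns whose projection value is at or below `threshold`
    is treated as a boundary, which is only safe when no target character
    consists of separate left and right parts.
    """
    projection = vertical_projection(image)
    segments = []
    start = None
    for x, value in enumerate(projection):
        if value > threshold and start is None:
            start = x                      # a character begins here
        elif value <= threshold and start is not None:
            segments.append((start, x))    # a gap closes the character
            start = None
    if start is not None:
        segments.append((start, len(projection)))
    return segments
```

For vertically written text the same routine could be applied to the transposed image, so that the horizontal projection is used instead.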

[Brief description of the drawings]

FIG. 1 is a block diagram of a character extracting apparatus according to the first embodiment of the present invention.

FIG. 2 is a diagram showing an example of conventional character extraction processing.

FIG. 3 is a flowchart of the processing in FIG. 2.

FIG. 4 is a diagram showing an example of problematic character extraction.

FIG. 5 is a diagram illustrating characters that can and cannot be separated into left and right parts.

FIG. 6 is a diagram illustrating characters that can and cannot be separated into upper and lower parts.

FIG. 7 is a diagram illustrating the operation when no left-right separable character exists.

FIG. 8 is a diagram illustrating the operation when no vertically separable character exists.

FIG. 9 is a diagram illustrating the operation when no minute character exists.

FIG. 10 is a diagram illustrating the operation when no superscript exists.

FIG. 11 is a diagram illustrating the operation when no subscript exists.

FIG. 12 is a diagram illustrating the operation when only a string of digits exists.

FIG. 13 is a block diagram of a character extracting apparatus according to the second embodiment of the present invention.

[Explanation of symbols]

1 image storage unit
2 character extraction unit
3 feature information storage unit
4 character feature determination unit
4A feature information table storage unit
5 target character storage unit

Claims

[Claims]

1. A character extracting device comprising: image storage means for storing image data of a form on which a character extraction target pattern consisting of symbols or characters is written; feature information storage means for storing feature information of the character extraction target pattern; and character extracting means for extracting characters from the stored image data based on the feature information.

2. The character extracting device according to claim 1, further comprising: target character storage means for storing the character extraction target pattern; and character feature determination means for receiving the character extraction target pattern and outputting the feature information, wherein the feature information output by the character feature determination means is stored in the feature information storage means.

3. The character extracting device according to claim 1, wherein the feature information represents a feature unique to the character extraction target pattern.

4. The character extracting device according to claim 1, wherein the character extracting means extracts characters based on the feature information output by the feature information storage means.

5. The character extracting device according to claim 2, wherein the target character storage means stores a character code of the character extraction target pattern.

6. The character extracting device according to claim 2, 4 or 5, wherein the character feature determination means comprises feature information table storage means for storing the feature information associated in advance with a character code of the character extraction target pattern.

7. The character extracting device according to claim 6, wherein the feature information table storage means stores the character code and the feature information of the character code in association with each other.

8. The character extracting device according to claim 6, wherein the character feature determination means acquires, from the feature information table storage means, the feature information corresponding to the character code output by the target character storage means and outputs the acquired feature information to the feature information storage means.

9. The character extracting device according to claim 1, wherein the feature unique to the character extraction target pattern indicates, for the character extraction target pattern, at least one of: the presence or absence of left-right separable characters, the presence or absence of vertically separable characters, the presence or absence of minute characters, the presence or absence of subscripts, the presence or absence of superscripts, and the presence or absence of characters other than digits.

10. The character extracting device according to claim 1, 2, 3, 4, 5, 6, 7, 8 or 9, wherein, when the feature information output by the feature information storage means indicates that no left-right separable character exists, the character extracting means, in extracting horizontally written characters, cuts out as characters all sections delimited where the value of a vertical projection distribution is at or below a set value, or, when no vertically separable character exists, the character extracting means, in extracting vertically written characters, cuts out as characters all sections delimited where the value of a horizontal projection distribution is at or below a set value.

11. The character extracting device according to claim 1 or 9, wherein, when the feature information output by the feature information storage means indicates that no minute character smaller than normal characters exists, the character extracting means does not cut out as a character a pattern whose size is at or below a predetermined threshold; when no subscript exists, the character extracting means does not cut out as a character a pattern whose upper end lies at or below a predetermined threshold in the lower part of the line; or, when no superscript exists, the character extracting means does not cut out as a character a pattern whose lower end lies at or above a predetermined threshold in the upper part of the line.

12. The character extracting device according to claim 1, 4, 5, 6, 7, 8 or 9, wherein, when the feature information output by the feature information storage means indicates that no character other than digits exists and the interval between characters is equal to or greater than a reference interval, the character extracting means cuts within the character interval according to the length of the character interval.