JPH10334188A - Type face identification device, type face identification method and information storage medium

Info

Publication number: JPH10334188A
Application number: JP9154545A
Authority: JP
Inventors: Tei Abe; 悌阿部
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1997-05-28
Filing date: 1997-05-28
Publication date: 1998-12-18

Abstract

PROBLEM TO BE SOLVED: To easily and accurately identify the typeface of a character, even for character images in various typefaces, by comparing the image of the stroke tip portion extracted from the character with stroke tip shape models prepared in advance for each typeface. SOLUTION: First, a document containing the characters whose typeface is to be identified is read by an image input part 1 and stored in a memory 2 as a document image. A character segmentation processing part 3 then performs character rectangle segmentation, which cuts out only the character images from the document image and obtains the coordinates of their circumscribed rectangular areas. The segmented character images are numbered in order. A number (i) used to step through the character images is then initialized to 1, and the typeface of the (i)-th character is judged. After the judgment, the number (i) is incremented by 1 and the presence of an (i)-th character is checked; if one is present, the same processing is performed for that character.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a typeface identification device, a typeface identification method, and an information storage medium for identifying the typeface (font) of a character.

[0002]

[Prior Art] Conventionally, for example, Japanese Unexamined Patent Publication No. 6-208649 discloses a typeface identification technique that estimates the stroke widths of a character in the vertical and horizontal directions and, from the ratio of these widths, identifies whether the typeface of the character is Mincho or Gothic. More specifically, this technique estimates the horizontal and vertical stroke widths from the modes (most frequent values) of the horizontal and vertical run-length histograms of the character image, and identifies whether the typeface is Mincho or Gothic from the ratio of these widths.
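The run-length approach of the prior art can be illustrated with a minimal Python sketch. It is not taken from the cited publication: the function names, the list-of-lists binary image (1 = black), and the way the ratio is formed are assumptions made purely for illustration.

```python
from collections import Counter


def run_lengths(line):
    """Return the lengths of the consecutive runs of black (1) pixels in one scan line."""
    runs, count = [], 0
    for pixel in line:
        if pixel:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def mode_run_length(lines):
    """Return the most frequent black run length over all scan lines (0 if there are none)."""
    histogram = Counter(r for line in lines for r in run_lengths(line))
    return histogram.most_common(1)[0][0] if histogram else 0


def stroke_width_ratio(image):
    """Estimate (vertical-stroke width) / (horizontal-stroke width) for a binary image.

    Horizontal runs mostly cross vertical strokes, so their mode estimates the
    vertical-stroke width; vertical runs do the same for horizontal strokes.
    """
    columns = [list(col) for col in zip(*image)]
    vertical_stroke_width = mode_run_length(image)
    horizontal_stroke_width = mode_run_length(columns)
    if horizontal_stroke_width == 0:
        return None
    return vertical_stroke_width / horizontal_stroke_width


# Hypothetical test glyph: a cross with a 3-pixel vertical bar and a 1-pixel horizontal bar.
cross = [[1 if 4 <= x <= 6 or y == 5 else 0 for x in range(11)] for y in range(11)]
print(stroke_width_ratio(cross))  # prints 3.0
```

A ratio well above 1 would point to thick vertical and thin horizontal strokes, as in Mincho, while a ratio near 1 would point to Gothic. The decision threshold is not given in the source and would have to be chosen separately.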

[0003]

[Problems to Be Solved by the Invention] However, the conventional typeface identification technique described above can identify the typeface well only when most of the strokes making up the character are horizontal or vertical straight lines, as in "中" or "田", and the image is free of noise. Characters in most fonts contain diagonal strokes, and when a character has diagonal strokes, the peak (mode) of the run-length histogram appears at the wrong position and the correct stroke width cannot be detected. The typeface of most characters therefore cannot be identified accurately, which makes the technique unsuitable for practical use.

[0004] An object of the present invention is to provide a typeface identification device, a typeface identification method, and an information storage medium that can easily and accurately identify the typeface of a character even for character images containing diagonal strokes, character images with noise, blotted (crushed) strokes, or faint strokes, and character images in various typefaces (fonts).

[0005]

[Means for Solving the Problems] To achieve the above object, the inventions of claims 1 to 5 extract an image of the stroke tip portion of a character from a character image, and identify the typeface of the character by comparing the extracted stroke tip image with stroke tip shape models prepared in advance for each typeface. This makes it possible to identify the typeface (font) of a character in a character image with high accuracy.

[0006]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a configuration example of a typeface identification device according to the present invention. Referring to FIG. 1, the device has an image input unit 1 that reads a document as, for example, a binary image (an image of black and white pixels); a memory 2 that stores the document image read by the image input unit 1 and other data; a character segmentation unit 3 that extracts character images from the document image; a typeface identification unit 4 that identifies the typeface (font) of each character image cut out by the character segmentation unit 3; a control unit 5 that controls the whole device; and a result output unit 6 that outputs the typeface identification result produced by the typeface identification unit 4.

[0007] The character segmentation unit 3 cuts out individual character images from the document image, for example as shown in FIG. 2. In the example of FIG. 2, one character image (here the character "K") is cut out as the circumscribed rectangular area AR of the character.
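As a minimal sketch of what a circumscribed rectangular area means for a binary image, the Python function below returns the tightest rectangle enclosing the black pixels. The function name and the list-of-lists image (1 = black) are assumptions for illustration; the source does not say how the character segmentation unit 3 computes the area AR.

```python
def circumscribed_rectangle(image):
    """Return (top, left, bottom, right), inclusive, of the black pixels, or None if there are none."""
    rows = [y for y, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


# Hypothetical 4 x 5 image with two black pixels.
sample = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(circumscribed_rectangle(sample))  # prints (1, 2, 2, 3)
```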

[0008] The typeface identification unit 4 has stroke tip extraction means 8, which extracts an image of the stroke tip portion of a character from the character image, and identification means 9, which identifies the typeface of the character by comparing the stroke tip image extracted by the stroke tip extraction means 8 with stroke tip shape models prepared in advance for each typeface.

[0009] FIG. 3 shows a first configuration example of the stroke tip extraction means 8 of the typeface identification unit 4 in FIG. 1. In the example of FIG. 3, the stroke tip extraction means 8 has a thinning unit 11 that thins the character image, an endpoint extraction unit 12 that extracts endpoints from the thinned image, and a stroke tip extraction unit 13 that extracts, as a stroke tip portion, the part of the character image lying within a circle of radius r centered on each endpoint extracted by the endpoint extraction unit 12.
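The endpoint extraction step can be sketched in Python as follows. The source does not define how an endpoint is detected; the sketch assumes the common convention that an endpoint is a skeleton pixel with exactly one 8-connected skeleton neighbour, and it assumes the thinned image is already available as a list of lists of 0/1 values. The function name is hypothetical.

```python
def skeleton_endpoints(skeleton):
    """Return (row, col) of skeleton pixels that have exactly one 8-connected neighbour.

    Assumption: `skeleton` is a one-pixel-wide thinned image given as rows of 0/1 ints.
    """
    height, width = len(skeleton), len(skeleton[0])
    endpoints = []
    for y in range(height):
        for x in range(width):
            if not skeleton[y][x]:
                continue
            # Count black pixels in the 3 x 3 window, excluding the centre pixel.
            neighbours = sum(
                skeleton[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 1:
                endpoints.append((y, x))
    return endpoints


# Hypothetical L-shaped skeleton: its two free ends are the endpoints.
l_shape = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(skeleton_endpoints(l_shape))  # prints [(0, 0), (2, 2)]
```

Endpoints are returned in raster order, which matches the idea of numbering them in sequence before stepping through them one by one.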

[0010] As a first extraction example, the endpoint extraction unit 12 and the stroke tip extraction unit 13 can extract the endpoints and tip portions of all the strokes making up the character.

[0011] Alternatively, as a second extraction example, the endpoint extraction unit 12 and the stroke tip extraction unit 13 can extract only the endpoint and tip portion of a specific stroke (for example, one stroke having a predetermined inclination) among the strokes making up the character.

[0012] Next, the processing operation of the typeface identification device configured as in FIG. 1 is described with reference to the flowcharts of FIGS. 4 and 5. FIG. 4 is a flowchart of the overall processing operation. FIG. 5 is a flowchart showing an example of the typeface (font) determination process (the process that determines the typeface in the processing operation of FIG. 4) when the stroke tip extraction means 8 of the typeface identification unit 4 is configured as in FIG. 3.

[0013] Referring to FIG. 4, first, in step S101, the image input unit 1 reads a document (for example, a manuscript) containing the characters whose typeface is to be identified, and stores it in the memory 2 as a document image. Next, in step S102, the character segmentation unit 3 performs character rectangle segmentation, which cuts out only the character images from the document image and obtains the coordinates of their circumscribed rectangular areas. Each character image contained in the document image is cut out in this way, and the cut-out character images are numbered in order. Next, in step S103, a number (character counter) i used to step through the character images is initialized to 1.

[0014] Then, in step S104, the typeface (font) of the i-th character is determined.

[0015] The typeface (font) determination in step S104 is performed, for example, as shown in FIG. 5. The processing example of FIG. 5 follows the first extraction example described above and extracts the tip portions of all the strokes making up the character. Referring to FIG. 5, first, in step S201, the character image is thinned. Next, in step S202, endpoints are extracted from the thinned character image (skeleton image), and all the endpoints are stored in the memory 2, numbered in order. Next, in step S203, a number j used to step through the endpoints is initialized to 1.

[0016] Next, in step S204, the black pixels lying within a circle of radius r centered on the j-th endpoint are obtained as a stroke tip portion.
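Step S204 can be sketched in Python as below. The representation is an assumption made for illustration: the character image is a list of lists of 0/1 values, the endpoint is a (row, col) pair, r is an integer, and the tip is returned as a set of offsets relative to the endpoint so that tips from different positions can be compared. The function name is hypothetical.

```python
def stroke_tip(image, endpoint, r):
    """Return the black pixels within Euclidean distance r of `endpoint`, as (dy, dx) offsets."""
    cy, cx = endpoint
    tip = set()
    # Scan only the bounding square of the circle, clipped to the image.
    for y in range(max(0, cy - r), min(len(image), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(image[0]), cx + r + 1)):
            if image[y][x] and (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                tip.add((y - cy, x - cx))
    return tip


# Hypothetical all-black 5 x 5 image: the tip is simply the discrete disc of radius r.
full = [[1] * 5 for _ in range(5)]
print(len(stroke_tip(full, (2, 2), 2)))  # prints 13
```

The radius r is the predetermined parameter mentioned in the text; the source does not state its value, so it would have to be tuned to the character size.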

[0017] Then, in step S205, the number j is incremented by 1. In step S206, it is determined whether a j-th endpoint exists; if it does, the process returns to step S204 and the same processing as above (extracting the tip portion of one stroke in the character) is performed for the next endpoint.

[0018] In this way, the stroke tip portions corresponding to all the endpoints stored in the memory 2 in step S202 are obtained one after another. When step S206 finds that no j-th endpoint exists (that is, all endpoints have been processed), step S207 compares the stroke tip portions extracted from this one character image with the tip shapes of all the fonts prepared in advance, and identifies the typeface (font) of the i-th character. For example, it determines whether the typeface of the i-th character is Mincho, round Gothic, or square Gothic.

[0019] After the typeface (font) of the i-th character has been determined in step S104 of FIG. 4, for example by steps S201 to S207 of FIG. 5, the number i is incremented by 1 in step S105 of FIG. 4. Next, in step S106, it is determined whether an i-th character exists; if it does, the process returns to step S104 and the same processing as above (determining the typeface of the character) is performed for the next character.

[0020] In this way, the typeface (font) determination is performed in turn for each character image contained in the document image input in step S101. When step S106 finds that no i-th character exists (that is, the typeface has been determined for all character images), all processing ends.
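The control flow of steps S103 to S106 can be sketched in Python as follows. The function and parameter names are hypothetical, and `judge_typeface` stands in for whatever determination process step S104 uses. One deliberate difference from the flowchart is labeled in the comments: the sketch tests for the i-th character before judging it, so an empty document is handled without error.

```python
def identify_all(character_images, judge_typeface):
    """Apply `judge_typeface` to each character image in order and collect the results."""
    results = []
    i = 1                                   # S103: initialize the character counter
    # S106 asks whether an i-th character exists. Assumption: the check is made
    # before the first judgment as well, unlike the flowchart, to cover empty input.
    while i <= len(character_images):
        results.append(judge_typeface(character_images[i - 1]))  # S104
        i += 1                              # S105: increment the counter
    return results


# Hypothetical stand-in judge that just labels each "image" by its length.
print(identify_all(["ab", "cde"], len))  # prints [2, 3]
```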

[0021] In the example of FIG. 5, the typeface (font) is determined using the tip portions of all strokes, in accordance with the first extraction example. It is also possible to determine the typeface using only the tip portions of strokes in a predetermined specific direction among the strokes making up the character.

[0022] FIGS. 6(a), 6(b), and 6(c) show the stroke tip portion of a certain character when the typeface (font) is Mincho, round Gothic, and square Gothic, respectively. In FIGS. 6(a), 6(b), and 6(c), the circle is the range extracted around an endpoint of the thinned image. As these figures show, the shape of the stroke tip portion differs among Mincho, round Gothic, and square Gothic, so the typeface (font) can be identified from the difference in the shape of the stroke tip portion.

[0023] FIG. 7 shows a second configuration example of the stroke tip extraction means 8 of the typeface identification unit 4 in FIG. 1. In the example of FIG. 7, the stroke tip extraction means 8 has a contour tracing unit 21 that traces the contour of the character image, a curvature calculation unit 22 that calculates the curvature of the contour, and a stroke tip extraction unit 23 that extracts stroke tip portions based on the curvature calculated by the curvature calculation unit 22.

[0024] As a first extraction example, the stroke tip extraction unit 23 can extract the tip portions of all the strokes making up the character.

[0025] Alternatively, as a second extraction example, the stroke tip extraction unit 23 can extract only the tip portion of a specific stroke (for example, one stroke) among the strokes making up the character.

[0026] The curvature calculation unit 22 calculates the curvature, for example, as follows. When the contour line of the character image is expressed as y = f(x), the curvature R of the contour line is given by the following equation.

[0027]

[Equation 1]
$$R = \frac{d^{2}y/dx^{2}}{\left\{1 + (dy/dx)^{2}\right\}^{3/2}}$$

[0028] To calculate the curvature by approximating this equation on a discrete polyline, the concept of averaging can be introduced. That is, as shown in FIG. 8, for the contour point $(x_{0}, y_{0})$ at which the curvature is to be obtained, k points are taken on each side of it, and values such as $d_{-}$, $d_{+}$, and $d_{\pm}$ are calculated from them as in the following Equation 2.

[0029]

[Equation 2] (the equation is not reproduced in this text)

[0030] Alternatively, these values can be obtained more simply by the following equations.

[0031]

[Equation 3]
$$d_{-} = \frac{y_{0} - y_{-k}}{x_{0} - x_{-k}}, \qquad d_{+} = \frac{y_{k} - y_{0}}{x_{k} - x_{0}}, \qquad d_{\pm} = \frac{y_{k/2} - y_{-k/2}}{x_{k/2} - x_{-k/2}}$$

[0032] From the values $d_{-}$, $d_{+}$, and $d_{\pm}$ obtained in this way, $d^{2}y/dx^{2}$ and $dy/dx$ are obtained as follows.

[0033]

[Equation 4]
$$\frac{d^{2}y}{dx^{2}} = d_{+} - d_{-}, \qquad \frac{dy}{dx} = d_{\pm}$$

[0034] Substituting $d^{2}y/dx^{2}$ and $dy/dx$ obtained from Equation 4 into Equation 1 gives the curvature R at the point $(x_{0}, y_{0})$.
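The chain from Equation 3 through Equation 4 to Equation 1 can be sketched in Python as follows. Several points are assumptions not stated in the source: the contour is a list of (x, y) points treated as closed (indices wrap around), k is even so that k/2 is an integer, and the function returns None where an x-difference is zero, because y = f(x) is not defined there. The function name is hypothetical.

```python
def discrete_curvature(contour, index, k):
    """Approximate R at contour[index] using Equations 3, 4 and 1.

    contour: list of (x, y) points along a closed contour; k: even offset.
    Returns None when a denominator in Equation 3 would be zero (an added
    guard; the source does not discuss this case).
    """
    n = len(contour)
    x0, y0 = contour[index]
    xm, ym = contour[(index - k) % n]            # (x_{-k}, y_{-k})
    xp, yp = contour[(index + k) % n]            # (x_k, y_k)
    xmh, ymh = contour[(index - k // 2) % n]     # (x_{-k/2}, y_{-k/2})
    xph, yph = contour[(index + k // 2) % n]     # (x_{k/2}, y_{k/2})
    if x0 == xm or xp == x0 or xph == xmh:
        return None
    d_minus = (y0 - ym) / (x0 - xm)              # Equation 3
    d_plus = (yp - y0) / (xp - x0)
    d_pm = (yph - ymh) / (xph - xmh)
    second = d_plus - d_minus                    # Equation 4: d2y/dx2
    first = d_pm                                 # Equation 4: dy/dx
    return second / (1 + first ** 2) ** 1.5      # Equation 1


# Hypothetical sample: points on the parabola y = x^2, evaluated at its vertex.
parabola = [(x, x * x) for x in range(-4, 5)]
print(discrete_curvature(parabola, 4, 2))  # prints 4.0
```

Because Equation 4 takes the plain difference d₊ − d₋ without dividing by the point spacing, the result is a relative measure that depends on k rather than a geometric curvature. That is enough for locating points where the value changes sharply.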

[0035] When the contour of the character image has been traced and the curvature calculated for each contour point in this way, the stroke tip extraction unit 23 can extract, as a stroke tip portion, the pixels near a contour point where the curvature changes greatly (more specifically, for example, the contour pixels before and after that contour point).
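A minimal Python sketch of this selection is given below. The source does not say how "changes greatly" is measured or how many neighbouring contour pixels are taken, so the absolute difference between consecutive curvature values, the `threshold` parameter, and the `span` parameter are all assumptions, as are the function names. The contour is treated as closed, so indices wrap around.

```python
def large_curvature_change_points(curvatures, threshold):
    """Return indices whose curvature differs from the previous point's by more than `threshold`.

    Index 0 is compared with the last point because the contour is assumed closed.
    """
    return [
        i for i in range(len(curvatures))
        if abs(curvatures[i] - curvatures[i - 1]) > threshold
    ]


def contour_neighbourhood(contour, index, span):
    """Return the contour pixels from `span` before to `span` after contour[index]."""
    n = len(contour)
    return [contour[(index + offset) % n] for offset in range(-span, span + 1)]


# Hypothetical curvature profile with one sharp spike at index 2.
profile = [0.0, 0.0, 5.0, 0.0, 0.0]
print(large_curvature_change_points(profile, 1.0))  # prints [2, 3]
print(contour_neighbourhood(list("abcdef"), 0, 1))  # prints ['f', 'a', 'b']
```

A single spike is reported at two consecutive indices (where the value jumps up and where it drops back), so a practical version might merge adjacent hits into one tip.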

[0036] FIG. 9 is a flowchart showing an example of the typeface (font) determination process (the process that determines the typeface in the processing operation of FIG. 4) in the typeface identification device configured as in FIG. 1 when the stroke tip extraction means 8 of the typeface identification unit 4 is configured as in FIG. 7.

[0037] That is, when the stroke tip extraction means 8 of the typeface identification unit 4 is configured as in FIG. 7, the typeface (font) determination in step S104 of FIG. 4 is performed, for example, as shown in FIG. 9. The processing example of FIG. 9 follows the first extraction example described above and extracts the tip portions of all the strokes making up the character. Referring to FIG. 9, first, in step S301, the contour of the i-th character image is traced. In step S302, the curvature of each contour of the character image is calculated. In step S303, contour points where the curvature changes greatly are extracted, and all the extracted contour points are stored in the memory 2, numbered in order. Next, in step S304, a number j used to step through the contour points is initialized to 1.

[0038] Then, in step S305, the contour pixels before and after the j-th contour point are extracted and taken as a stroke tip portion.

[0039] Then, in step S306, the number j is incremented by 1. In step S307, it is determined whether a j-th contour point exists; if it does, the process returns to step S305 and the same processing as above (extracting the tip portion of one stroke in the character) is performed for the next contour point.

[0040] In this way, the stroke tip portions corresponding to all the contour points stored in the memory 2 in step S303 are obtained one after another. When step S307 finds that no j-th contour point exists (that is, all contour points have been processed), step S308 compares the stroke tip portions extracted from this one character image with the tip shapes of all the fonts prepared in advance, and identifies the typeface (font) of the i-th character. For example, it determines whether the typeface of the i-th character is Mincho, round Gothic, or square Gothic.

[0041] In the configuration example of FIG. 7, after the typeface (font) of the i-th character has been determined in step S104 of FIG. 4, for example by steps S301 to S308 of FIG. 9, the number i is incremented by 1 in step S105 of FIG. 4. Next, in step S106, it is determined whether an i-th character exists; if it does, the process returns to step S104 and the same processing as above (determining the typeface of the character) is performed for the next character.

[0042] In this way, the typeface (font) determination is performed in turn for each character image contained in the document image input in step S101. When step S106 finds that no i-th character exists (that is, the typeface has been determined for all character images), all processing ends.

[0043] In the example of FIG. 9, the typeface (font) is determined using the tip portions of all strokes, in accordance with the first extraction example. It is also possible to determine the typeface using only the tip portions of strokes in a predetermined specific direction among the strokes making up the character.

[0044] FIGS. 10(a), 10(b), and 10(c) show the stroke tip portion of a certain character when the typeface (font) is Mincho, round Gothic, and square Gothic, respectively. As these figures show, the shape of the stroke tip portion (the shape formed by the contour pixels before and after a contour point where the curvature changes greatly) differs among Mincho, round Gothic, and square Gothic, so the typeface (font) can be identified from the difference in the shape of the stroke tip portion.

[0045] As described above, the present invention extracts an image of the stroke tip portion of a character from the character image and identifies the typeface by comparing the extracted image with the stroke tip shape models prepared in advance for each typeface. The typeface (font) of a character can therefore be identified easily and accurately with a small program size.

[0046] Because the shape of the stroke tip is used as the feature, the typeface (font) can be identified very accurately and efficiently. The characteristics of a typeface appear prominently in the shape of the stroke tip, so even fonts that conventional methods cannot tell apart, such as Gothic and round Gothic, can be distinguished.

[0047] In the present invention, registering in advance the stroke tip shape models of various characters in various fonts makes it possible to identify various fonts for various characters. That is, shape models of the stroke tip portions are prepared in advance for every typeface (font) to be identified, and the typeface of the target character is identified by measuring the degree of match (or dissimilarity) between its stroke tip shapes and these models.
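One way to measure such a degree of dissimilarity is sketched below in Python. The source does not specify the measure or how several tips are combined, so everything here is an assumption: tips and models are sets of (dy, dx) pixel offsets, dissimilarity is the Jaccard distance between two sets, each tip is scored against its best-matching model of a typeface, and the typeface with the lowest total wins. All names and the toy models are hypothetical.

```python
def dissimilarity(tip, model):
    """Jaccard distance between two pixel-offset sets: 0.0 = identical, 1.0 = disjoint."""
    union = tip | model
    return len(tip ^ model) / len(union) if union else 0.0


def identify_typeface(tips, models):
    """Return the typeface whose models are, in total, least dissimilar to the tips.

    models: dict mapping a typeface name to a non-empty list of model shapes.
    """
    totals = {
        name: sum(min(dissimilarity(tip, shape) for shape in shapes) for tip in tips)
        for name, shapes in models.items()
    }
    return min(totals, key=totals.get)


# Toy models (not real typeface data): one tip shape per typeface.
toy_models = {
    "mincho": [{(0, 0), (0, 1), (1, 1)}],
    "round_gothic": [{(0, 0), (0, 1), (0, -1)}],
}
observed = [{(0, 0), (0, 1), (1, 1)}]
print(identify_typeface(observed, toy_models))  # prints mincho
```

Summing per-tip scores lets every stroke tip of the character contribute to the decision; a majority vote over tips would be an equally plausible alternative.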

[0048] In the example of FIG. 3, the stroke tip portion is extracted by first thinning the character image, detecting its endpoints, and extracting the image within a predetermined radius r of each detected endpoint as the stroke tip portion. This allows the stroke tip portion to be extracted stably.

[0049] In the example of FIG. 7, the stroke tip portion is extracted by first tracing the contour of the character image, regarding a part where the curvature of the contour changes greatly as a stroke tip, and extracting the image of that part as the stroke tip portion. This allows the stroke tip portion to be extracted stably.

[0050] The examples above identify the typeface as one of Mincho, round Gothic, and square Gothic. The present invention can of course also identify fonts other than Mincho, round Gothic, and square Gothic, and can identify further fonts in addition to these three.

[0051] As described above, the present invention makes it possible to identify the typeface (font) of characters in a character image with high accuracy, and the identification result obtained in this way is useful, for example, for reproducing a document image.

[0052] FIG. 11 shows a hardware configuration example of the typeface identification device of FIG. 1. Referring to FIG. 11, the device is implemented with, for example, a personal computer, and has a CPU 31 that controls the whole device; a ROM 32 that stores the control program of the CPU 31 and other data; a RAM 33 used as the work area of the CPU 31 and the like; a scanner 34 that reads a document as a document image; a document image file 35 in which the document images read by the scanner 34 are stored, for example page by page; and a result output device 36 (for example, a display or a printer) that outputs the results of typeface identification for each character image contained in the document image.

[0053] The scanner 34, the document image file 35, and the result output device 36 correspond to the image input unit 1, the memory 2, and the result output unit 6 of FIG. 1, respectively. The CPU 31 performs the functions of the control unit 5, the character segmentation unit 3, and the typeface identification unit 4 of FIG. 1.

[0054] The functions of the CPU 31 as the control unit 5, the character segmentation unit 3, the typeface identification unit 4, and so on can be provided, for example, as a software package (specifically, on an information storage medium such as a CD-ROM). For this reason, the example of FIG. 11 includes a medium drive device 41 that drives the information storage medium 40 when it is loaded.
The information storage medium can be provided in the form of an information storage medium such as a CD-ROM). For this reason, in the example of FIG. 11, when the information storage medium 40 is set, a medium drive device 41 for driving the information storage medium 40 is provided.

[0055] In other words, the typeface identification device of the present invention can also be implemented in a configuration in which a general-purpose computer system equipped with an image scanner, a display, and the like reads program code recorded on an information storage medium such as a CD-ROM, and the microprocessor of that computer system executes the typeface identification process. In this case, the information storage medium that stores the typeface identification program of the present invention is not limited to a CD-ROM; a ROM, a RAM, an FD (floppy disk), or the like may be used.

[0056]

[Effects of the Invention] As described above, according to the inventions of claims 1, 4, and 5, an image of the stroke tip portion of a character is extracted from a character image, and the typeface of the character is identified by comparing the extracted stroke tip image with stroke tip shape models prepared in advance for each typeface. The typeface (font) of a character in a character image can therefore be identified easily and accurately.

[0057] In the invention of claim 2, to extract the stroke tip, the character image is first thinned and the end points of the thinned image are detected; the image within a predetermined radius r centered on each detected end point is then extracted as the stroke tip. This allows the stroke tip to be extracted stably.
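The end point detection and radius-r extraction described above can be sketched as follows. The sketch assumes that the thinned image (skeleton) has already been computed by some thinning algorithm and is given as a binary grid; it treats a skeleton pixel with exactly one 8-connected neighbour as an end point, which is a common convention rather than something this passage specifies. The names `find_end_points` and `extract_tip` and the sample images are hypothetical.

```python
def find_end_points(skeleton):
    """Return (row, col) of skeleton pixels with exactly one 8-connected neighbour."""
    h, w = len(skeleton), len(skeleton[0])
    ends = []
    for y in range(h):
        for x in range(w):
            if not skeleton[y][x]:
                continue
            # Count set pixels in the 3x3 neighbourhood, excluding the centre.
            neighbours = sum(
                skeleton[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 1:
                ends.append((y, x))
    return ends


def extract_tip(image, centre, r):
    """Copy the pixels of image within distance r of centre into a (2r+1) x (2r+1) patch."""
    cy, cx = centre
    h, w = len(image), len(image[0])
    patch = [[0] * (2 * r + 1) for _ in range(2 * r + 1)]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = cy + dy, cx + dx
            # Keep only pixels inside the disc and inside the image bounds.
            if dy * dy + dx * dx <= r * r and 0 <= y < h and 0 <= x < w:
                patch[dy + r][dx + r] = image[y][x]
    return patch


# A 3-pixel-thick horizontal stroke and a hand-made skeleton of it (illustrative data).
image = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 1, 1, 1, 0],
         [0, 1, 1, 1, 1, 1, 1, 1, 0],
         [0, 1, 1, 1, 1, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0]]
skeleton = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 1, 1, 1, 1, 1, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0]]

ends = find_end_points(skeleton)
tips = [extract_tip(image, end, r=1) for end in ends]
```

Note that the patch is cut from the original character image, not from the skeleton, so the shape of the tip is preserved for the later comparison.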

[0058] In the invention of claim 3, to extract the stroke tip, the contour of the character image is first traced; a portion where the curvature of the contour changes greatly is regarded as the stroke tip, and the image of that portion is extracted as the stroke tip. This allows the stroke tip to be extracted stably.
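Locating high-curvature portions of a traced contour can be sketched as follows. The sketch assumes the contour has already been traced and is given as an ordered, closed list of (row, col) points. As a stand-in for curvature it uses the turning angle between the vectors to the k-th previous and k-th next contour point; this measure, the step `k`, the `threshold` value, and the function names are assumptions made for illustration and are not the calculation of FIG. 8.

```python
import math


def turning_angles(contour, k=1):
    """Absolute turning angle in degrees at each point of a closed contour."""
    n = len(contour)
    if n <= 2 * k:
        raise ValueError("contour is too short for the chosen k")
    angles = []
    for i in range(n):
        y0, x0 = contour[i - k]          # negative indices wrap around the closed contour
        y1, x1 = contour[i]
        y2, x2 = contour[(i + k) % n]
        incoming = math.atan2(y1 - y0, x1 - x0)
        outgoing = math.atan2(y2 - y1, x2 - x1)
        diff = math.degrees(outgoing - incoming)
        diff = (diff + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
        angles.append(abs(diff))
    return angles


def high_curvature_points(contour, k=1, threshold=60.0):
    """Return the contour points whose turning angle is at least threshold degrees."""
    return [p for p, a in zip(contour, turning_angles(contour, k)) if a >= threshold]


# Contour of a small rectangle, listed in tracing order (illustrative data).
rect = ([(0, x) for x in range(5)]
        + [(y, 4) for y in (1, 2)]
        + [(2, x) for x in range(3, -1, -1)]
        + [(1, 0)])

corners = high_curvature_points(rect, k=1, threshold=60.0)
```

For the rectangle, only the four corners are flagged; on a real character contour, the flagged points would mark candidate stroke tips around which an image region is then cut out.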

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of a typeface identification device according to the present invention.

FIG. 2 is a diagram showing an example of a single character image.

FIG. 3 is a diagram showing one configuration example of the stroke tip extracting means of the typeface identification unit in FIG. 1.

FIG. 4 is a flowchart for explaining the processing operation of the typeface identification device of FIG. 1.

FIG. 5 is a flowchart showing an example of the process of determining the typeface (font) (the typeface determination step in the processing operation of FIG. 4) in the typeface identification device configured as in FIG. 1, when the stroke tip extracting means of the typeface identification unit 4 is configured as in FIG. 3.

FIG. 6 is a diagram showing the stroke tip of a certain character when the typeface (font) is Mincho, round Gothic, and square Gothic, respectively.

FIG. 7 is a diagram showing another configuration example of the stroke tip extracting means of the typeface identification unit in FIG. 1.

FIG. 8 is a diagram for explaining an example of calculating curvature.

FIG. 9 is a flowchart showing an example of the process of determining the typeface (font) (the typeface determination step in the processing operation of FIG. 4) in the typeface identification device configured as in FIG. 1, when the stroke tip extracting means of the typeface identification unit 4 is configured as in FIG. 7.

FIG. 10 is a diagram showing the stroke tip of a certain character when the typeface (font) is Mincho, round Gothic, and square Gothic, respectively.

FIG. 11 is a diagram showing an example of the hardware configuration of the typeface identification device of FIG. 1.

[Explanation of symbols]

1: Image input unit
2: Memory
3: Character cutout processing unit
4: Typeface identification unit
5: Control unit
6: Result output unit
8: Stroke tip extracting means
9: Identification means
11: Thinning unit
12: End point extraction unit
13, 23: Stroke tip extraction unit
21: Contour tracing unit
22: Curvature calculation unit
31: CPU
32: ROM
33: RAM
34: Scanner
35: Document image file
36: Result output device
40: Information storage medium
41: Medium drive device

Claims

[Claims]

1. A typeface identification device comprising stroke tip extracting means for extracting an image of a stroke tip of a character in a character image, wherein the image of the stroke tip of the character extracted by the stroke tip extracting means is compared with stroke tip shape models prepared in advance for each typeface to identify the typeface of the character.

2. The typeface identification device according to claim 1, wherein the stroke tip extracting means thins the character image, extracts end points from the thinned image, and extracts an image of the stroke tip from the character image based on the positions of the extracted end points.

3. The typeface identification device according to claim 1, wherein the stroke tip extracting means traces the contour of the character image and extracts an image of the stroke tip from the character image based on the curvature of the contour.

4. A typeface identification method comprising extracting an image of a stroke tip of a character in a character image, and comparing the extracted image of the stroke tip with stroke tip shape models prepared in advance for each typeface to identify the typeface of the character.

5. An information storage medium storing a program for extracting an image of a stroke tip of a character in a character image, and comparing the extracted image of the stroke tip with stroke tip shape models prepared in advance for each typeface to identify the typeface of the character.