JPS58222384A

JPS58222384A - Discriminating system of font

Info

Publication number: JPS58222384A
Application number: JP57105175A
Authority: JP
Inventors: Mamoru Maeda; 護前田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1982-06-18
Filing date: 1982-06-18
Publication date: 1983-12-24

Abstract

PURPOSE: To discriminate the font of a character pattern by extracting the contour position, as viewed from the lateral direction, of a character pattern cut out one character at a time, and by extracting a feature parameter related to the state of change of that contour position. CONSTITUTION: A reading section 100 optically scans an original (not shown), decomposes the density information on the original into pixels, reads them, and sends them out as an analog image signal. A pre-processing section 102 converts the image signal into binary black-and-white image data and also performs noise elimination. A character pickup section 104, which receives the binary image data, cuts out individual character patterns from it. Each character pattern is sent to a character recognizing section 112 and a font recognizing section 105. The font recognizing section 105 consists of a left profile extracting section 106, an italic number extracting section 108 and a discriminating section 110, and recognizes the font (typeface) of the cut-out character pattern.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a technique for identifying the font (typeface) of a character pattern in an optical character reader or similar device.

Prior Art

When an optical character reader performs character recognition on a character pattern read from a document, knowing the font of that character pattern in advance is advantageous both for improving the recognition rate and for making the recognition process more efficient.

If all characters in a document share one font, these benefits can be obtained by specifying the font manually before the document is fed to the optical character reader, or by writing font designation information at a specific position on the document and having the device read it. For a document that mixes different fonts, however, such methods are not practical. With such a document, the benefits cannot be obtained unless the font can be identified per individual character or per group of characters such as a word. No satisfactory font identification technique for this purpose has been known so far.

Object

Accordingly, an object of the present invention is to provide a font identification method that can identify the font in units of individual characters or of character groups such as words. The unit of identification is essentially arbitrary, however, and it goes without saying that the method can also identify the font of an entire document.

Summary

The present invention determines the font by focusing on the features of the contour of a character pattern as viewed from the lateral direction.

Specifically, the font identification method according to the present invention is characterized by extracting the contour position, as viewed from the lateral direction, of a character pattern cut out one character at a time, extracting a feature parameter related to the state of change of that contour position, and determining the font of the character pattern from the feature parameter.

When the font is identified in units such as words or lines, the feature parameters extracted for a plurality of character patterns are statistically processed, and the font is determined from the result.

Embodiment

FIG. 1 is a block diagram of an optical character reader to which the present invention is applied.

The reading section 100 consists of a known CCD scanner or the like. It optically scans a document (not shown), decomposes the density information on the document into pixels, reads it, and sends it out as an analog image signal. The pre-processing section 102 converts this analog image signal into binary black-and-white image data and also applies noise removal.

The character cutout section 104, which receives this binary image data, cuts out individual character patterns from the input image data using a known technique such as the projection method. FIG. 2 shows examples of the character patterns cut out here. The examples are patterns of the character "O": the left pattern is italic and the right pattern is roman. Such cut-out patterns are sent to the character recognition section 112 and the font identification section 105.

The font identification section 105 identifies the font (typeface) of the cut-out character pattern. This embodiment considers two fonts, italic and roman, and the font identification section 105 consists of a left contour extraction section 106, an italic number extraction section 108, and a determination section 110.

The left contour extraction section 106 scans each horizontal line of the input character pattern from left to right, extracts the position of the first black pixel detected as the left contour position of that horizontal line, and sends it to the italic number extraction section 108.
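The left contour extraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the pattern is represented as a list of strings in which `#` marks a black pixel, and the function name `left_contour` and the sample pattern are hypothetical, not taken from the patent.

```python
from typing import List, Optional


def left_contour(pattern: List[str], black: str = "#") -> List[Optional[int]]:
    """For each horizontal line, return the column of the first black pixel
    found when scanning from left to right (None if the line has none)."""
    positions: List[Optional[int]] = []
    for row in pattern:
        idx = row.find(black)  # str.find returns -1 when nothing is found
        positions.append(idx if idx >= 0 else None)
    return positions


# Hypothetical 5-line pattern of a stroke rising to the right.
stroke = [
    "...##",
    "...##",
    "..##.",
    "..##.",
    ".##..",
]
print(left_contour(stroke))  # [3, 3, 2, 2, 1]
```

Returning `None` for blank lines is a design choice of this sketch; the patent text does not say how lines without black pixels are handled.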

The italic number extraction section 108 stores the left contour positions sent from the left contour extraction section 106 in an internal memory (not shown) and extracts the italic number, a feature parameter related to the state of change of the left contour position. Specifically, it traces the left contour position in order from the top horizontal line toward the bottom, and for each line obtains the difference (a) in left contour position from the preceding line and the distance (b) from the previous change line. It then performs the same operation from bottom to top, obtaining the difference (d) in left contour position, which corresponds to a above, and the distance (e) from the previous change line, which corresponds to b above.

Next, italic points are detected from these values, and the italic number (I) of each is obtained.

Here, an italic point is a change line for which the contour position difference a (or d) is +1 or -1 and/or the distance b (or e) from the previous change line is 3 or more. The italic number I of an italic point is +1 if its contour position difference a (or d) is +1, and -1 if the difference is -1. An italic point with I = -1 marks a change in left contour position typical of italic type (a stroke rising to the right), and an italic point with I = +1 contradicts it. In other words, an italic character pattern yields many italic points with I = -1, so the sum of the italic numbers over all italic points can serve as one indicator of whether the character pattern is italic. The italic number extraction section 108 therefore calculates the sum ΣI of the italic numbers of the italic points obtained by the above procedure and sends it to the determination section 110.

Only change lines with a small contour position difference (a, d) are extracted as italic points, as described above, in order to exclude the influence of the diagonal strokes inherently present in the curved parts of character patterns.
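The top-to-bottom pass can be sketched as follows. Several points are assumptions of this sketch rather than statements of the patent: both conditions (|a| = 1 and b ≥ 3) are required, the difference is taken as current line minus preceding line with positions increasing to the right, and the distance for the first change line is counted from line 0. The function names are hypothetical. The bottom-to-top pass is omitted because the sign convention for d cannot be recovered from the text.

```python
from typing import List, Tuple


def downward_italic_points(contour: List[int], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Trace the left contour from top to bottom and return (line, I) pairs.

    a = contour[i] - contour[i - 1]   difference from the preceding line
    b = i - (previous change line)    assumed to start from line 0
    A change line is kept as an italic point when |a| == 1 and b >= min_gap;
    its italic number I is then equal to a.
    """
    points: List[Tuple[int, int]] = []
    last_change = 0
    for i in range(1, len(contour)):
        a = contour[i] - contour[i - 1]
        if a == 0:
            continue  # not a change line
        b = i - last_change
        if abs(a) == 1 and b >= min_gap:
            points.append((i, a))
        last_change = i  # every change line resets the distance
    return points


def italic_sum(points: List[Tuple[int, int]]) -> int:
    """Sum of the italic numbers I over the detected italic points."""
    return sum(i_value for _, i_value in points)


# Hypothetical contour of a gently slanted stroke: moves left by 1 every few lines.
slanted = [5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2]
pts = downward_italic_points(slanted)
print(pts)              # [(3, -1), (6, -1), (10, -1)]
print(italic_sum(pts))  # -3
```

A steep contour such as `[5, 3, 1]` produces no italic points in this sketch, which mirrors the stated intent of ignoring the diagonal strokes found in curved parts.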

Consider, as an example, the italic pattern of the character "O" shown in FIG. 2. FIG. 3 is an enlarged view of this italic pattern, with the line numbers and horizontal positions indicated as shown.

The results of the above processing for this character pattern are listed in FIG. 4.

That is, when the pattern is examined from top to bottom, lines No. 9, No. 14, and one further line (number illegible in the source) are extracted as italic points (downward italic points), and their italic numbers are as shown in column c. When it is examined from bottom to top, lines No. 5, No. 8, and No. 13 are extracted as italic points (upward italic points), and their italic numbers are as shown in column f. For this example pattern, the sum ΣI of the italic numbers is therefore -4.

The determination section 110 compares the italic number sum ΣI supplied by the italic number extraction section 108 with an appropriate decision threshold to determine whether the character pattern is italic or roman. By this determination, the font of the character pattern of FIG. 3 (ΣI = -4), for example, is identified as italic.
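The threshold comparison can be illustrated with a one-line rule. The patent only says "an appropriate decision threshold", so the default value of 0 and the function name `classify_font` are assumptions made here for illustration.

```python
def classify_font(italic_number_sum: int, threshold: int = 0) -> str:
    """Return "italic" when the italic number sum falls below the threshold,
    otherwise "roman". The threshold of 0 is an assumed value."""
    return "italic" if italic_number_sum < threshold else "roman"


# The FIG. 3 pattern has an italic number sum of -4.
print(classify_font(-4))  # italic
```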

The character recognition section 112 performs character recognition on the character pattern cut out by the character cutout section 104, and in doing so it can narrow down the fonts to be recognized according to the font identification result described above. The recognition rate can therefore be improved and the recognition processing time shortened compared with the case where the font is unknown.

The description so far has identified the font per individual character, but the font can also be identified per character string, such as a word, or per line. In that case, the determination section 110 statistically processes the italic number sums obtained by the above procedure for the one or more character patterns in the identification unit, and determines the font by comparing the result with a decision threshold.

For example, consider identifying the font of the character string (word) "January" set in italic and in roman. The italic number sum of each character pattern in the string, and the grand total of those sums, are as shown in Table 1 for each font.

As Table 1 shows, the grand total is negative (-9) for italic and positive (+2) for roman, so the fonts can be clearly distinguished.

When such statistical processing is performed, character patterns whose shape tends to cause misjudgment, such as the character "y", can also be excluded from the statistics.

This can be done, for example, by excluding italic number sums of +1 or more when compiling the statistics. In the "January" example above, the ΣI value for the pattern of the character "y" is excluded, and the grand total then becomes the value shown in parentheses in the total column of Table 1.
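A minimal sketch of the string-level statistic, using plain summation with the optional exclusion rule, might look like this. The function name `string_italic_total`, the `exclude_from` parameter, and the sample values are hypothetical; the per-character values are made up for illustration and are not the figures of Table 1.

```python
from typing import Iterable, Optional


def string_italic_total(char_sums: Iterable[int], exclude_from: Optional[int] = None) -> int:
    """Add up the per-character italic number sums of one identification unit.

    If exclude_from is given, sums greater than or equal to it are left out,
    e.g. exclude_from=1 drops every sum of +1 or more.
    """
    return sum(s for s in char_sums if exclude_from is None or s < exclude_from)


# Hypothetical per-character sums for a five-character word.
sums = [-2, -1, 0, -3, 2]
print(string_italic_total(sums))                  # -4
print(string_italic_total(sums, exclude_from=1))  # -6
```

The resulting total would then be compared with a decision threshold in the same way as the single-character sum.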

One embodiment has been described in detail above. Each block in FIG. 1 can readily be implemented with known technical knowledge on the basis of the above description, so the specific configurations are not shown. Needless to say, each block may be implemented in hardware or by a combination of hardware and software. The blocks are divided purely by function, and physically some or all of them may be combined into a single block.

Effects

As described above, the present invention makes it possible to identify fonts in arbitrary units such as individual character patterns, character strings, or lines. Fonts can therefore be identified even in documents that mix different fonts. Consequently, applying the present invention to an optical character reader, for example, can improve the recognition rate and recognition speed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an optical character reader that is one embodiment of the present invention. FIG. 2 shows examples of italic and roman character patterns. FIG. 3 is an enlarged view of an example italic character pattern. FIG. 4 shows the details of the font identification processing for the character pattern of FIG. 3.

104: character cutout section; 105: font identification section; 106: left contour extraction section; 108: italic number extraction section; 110: determination section.

Claims

[Claims]

(1) A font identification method characterized by extracting the contour position, as viewed from the lateral direction, of a character pattern cut out one character at a time, extracting a feature parameter related to the state of change of the contour position, and determining the font of the character pattern from the extracted feature parameter.

(2) The font identification method according to claim 1, wherein the feature parameters extracted for a plurality of character patterns are statistically processed and the font is determined on the basis of the result.