JPS60164879A - Character separating device

Info

Publication number: JPS60164879A
Application number: JP59020300A
Authority: JP
Inventors: Yoshitake Tsuji; 辻　善丈
Original assignee: NEC Corp; Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1984-02-07
Filing date: 1984-02-07
Publication date: 1985-08-27

Abstract

PURPOSE: To separate a character line image into individual characters by calculating the variances of the distances between character blocks and computing a variance evaluation value, formed as a maximum-likelihood linear sum of those variances, with which the character pitch is identified. CONSTITUTION: A scanning device 1 optically scans a character line image written on paper, converts it into an electrical signal, binarizes it, and writes it into a character line image memory 2. A character block extraction device 3 sequentially extracts character blocks from the image and stores the start position and size of each block in a character block register 4. A control device 7 calculates the distances between character blocks from the transferred start and end positions and increments the frequency count held at the address of a frequency distribution table 6 that corresponds to each distance. In this way, the frequency distribution of the distances is built up in table 6, which is implemented as a memory.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a character separation device, and in particular to an adaptive character separation device that separates a character line image written on paper into individual characters and identifies the character pitch.

(Prior Art) When a device that optically reads printed characters and the like (hereinafter, OCR) recognizes a series of characters, each character must be separated one by one and fed to the character recognition unit. The information needed to separate the characters one by one includes the character pitch, which can be supplied in advance if the size and type of the printed matter to be read by the OCR are restricted.

However, as OCR has recently come to be applied to a wide range of material, such as mail and documents whose character pitch is not specified, the character pitch can no longer be known in advance and must be estimated from the character line image on the paper. Conventionally, an average value or the like has been used to estimate the character pitch. However, when the width of individual characters varies greatly with font and character category, as in printed English text, or when the number of touching characters increases, the error between this average value and the actual character pitch that arises when separating a single character can no longer be ignored. Consequently, when the average value is used to separate a character line image containing many touching characters, the number of touching characters may be misjudged, or characters may be cut at incorrect positions. Furthermore, under the OCR reading conditions described above, the input usually contains not only fixed-pitch printed data but also variable-pitch printed data, and sometimes handwritten characters. A second problem is therefore that, when characters touch in, for example, variable-pitch printed European text, it becomes difficult to apply the same form of character separation processing as for fixed-pitch printed data.

This means that the character separation process differs between printed data sets typified by variable-pitch European text and fixed-pitch printed data sets.

It is therefore important to identify the character pitch, as described above, before performing the usual character-by-character separation. That is, a character separation device is required to detect not only the character pitch used to separate individual characters but also a variance evaluation value indicating how much the character images in the target line image deviate from the estimated character pitch. Once such a variance evaluation value has been calculated, it can be used, for example with the "character separation device" of the invention filed by the same applicant on December 20, 1983 (hereinafter, the prior invention), to determine the parameters for separating characters one by one from the character pitch and the features of the character pattern image. Moreover, even when data sets for which pitch-based character separation is difficult are mixed in, such as variable-pitch characters printed by a European-language typewriter or handwritten characters written without regard to character frames, they can be identified using the variance evaluation value, so that a character separation device applicable to more general OCR reading targets can be realized.

(Object of the Invention) The present invention has been made to solve the above problems. Its object is to provide adaptive character separation processing by detecting the best character pitch and, at the same time, calculating a variance evaluation measure that indicates how much the character line image deviates from the detected pitch, thereby grasping the nature of the character pitch. Another object of the invention is to provide a character separation device that can detect the best character pitch stably under a variety of reading conditions, such as when characters touch or when one character splits into two or more blocks.

(Constitution of the Invention) According to the present invention, there is provided a character separation device that scans a series of character line images and separates them into individual characters, comprising: means for sequentially detecting a plurality of character blocks from the series of character line images; means for calculating inter-block distances from the plurality of character blocks and storing a frequency distribution of the inter-block distances; means for predicting the interval in which the character pitch exists; means for dividing the frequency distribution into a plurality of regions and detecting the candidate character pitch, within the existence interval, that minimizes an estimation error evaluation measure formed as a maximum-likelihood linear sum of the character pitch estimation errors in the plurality of regions; means for calculating the variance of the inter-block distances contained in each of the plurality of regions divided using the character pitch, and calculating a variance evaluation value formed as a maximum-likelihood linear sum of the plurality of variances; and means for identifying the character pitch on the basis of the variance evaluation value and separating the character line image into individual characters.

(Embodiment) The present invention will now be described with reference to the drawings.

FIG. 1 shows an example of a character line image, used to explain the inter-block distances employed by the character pitch detection of the present invention. In the figure, each character image that can be separated by white background, that is, each character block, is hatched and enclosed in a broken-line rectangle. The reference symbol U_{i,i+1} (i = 1, 2, ..., 6) denotes the inter-block distance measured from the leading edge of the i-th character block to the leading edge of the (i+1)-th character block, and U'_{i,i+1} (i = 1, 2, ..., 6) denotes the distance measured from the trailing edge of the i-th character block to the trailing edge of the (i+1)-th character block.

All of these can be used as observations making up the frequency distribution of inter-block distances described later. Similarly, the distances U_{i,j} (i = 1, 2, ..., 6; i < j) between the leading edges of the i-th and j-th character blocks and the distances U'_{i,j} between their trailing edges (only some of which, such as U_{1,3} and U'_{1,3}, are shown in the figure) can also be used as observations for the frequency distribution, since estimates are generally more stable statistically when more observations are available.

The inter-block distances used as observations can be restricted, for example, to U_{i,j} and U'_{i,j} with j = i+1, i+2, i+3. Character images such as periods (.) and commas (,) (neither is shown in the figure) may be removed from the observations by checking the width and height of each block against, for example, the average height H_m of the character blocks. Likewise, when a blank that is large compared with the average height H_m is detected, the inter-block distances spanning that blank need not be used as observations.
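As a rough illustration of how the observations described above could be gathered in software, the following Python sketch builds the frequency distribution from leading-edge and trailing-edge distances. The function name `distance_histogram`, the `(start, end)` tuple layout, and the `max_skip` parameter (set to 3 to mirror the j = i+1, i+2, i+3 restriction) are assumptions made for this example; the patent describes a hardware table, and the filtering of punctuation and wide blanks is omitted here.

```python
from collections import Counter


def distance_histogram(blocks, max_skip=3):
    """Count inter-block distances for one text line.

    blocks   -- list of (start, end) horizontal positions, left to right
    max_skip -- use pairs (i, j) with i < j <= i + max_skip
    """
    hist = Counter()
    for i in range(len(blocks)):
        for j in range(i + 1, min(i + max_skip + 1, len(blocks))):
            hist[blocks[j][0] - blocks[i][0]] += 1  # leading edge to leading edge
            hist[blocks[j][1] - blocks[i][1]] += 1  # trailing edge to trailing edge
    return hist


# Three evenly spaced blocks, 8 units wide with a pitch of 10 (made-up data).
blocks = [(0, 8), (10, 18), (20, 28)]
print(sorted(distance_histogram(blocks).items()))
```

Using both edge types and several neighbours per block multiplies the number of observations obtained from a short line, which is the statistical motivation given in the text.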

FIG. 2 shows an example of the frequency distribution of a series of inter-block distances for fixed-pitch data such as that of FIG. 1. The horizontal axis U is the inter-block distance U_{i,j}, and the vertical axis NUM is the frequency at each inter-block distance.

FIGS. 3(A) and 3(B) show first and second examples of the frequency distribution of a series of inter-block distances: FIG. 3(A) for fixed-pitch data and FIG. 3(B) for variable-pitch European-text data.

The principle of the invention will now be explained with reference to FIGS. 2 and 3. The average height H_m is calculated from the heights of the character blocks, and the existence interval of the character pitch, (α1·H_m, α2·H_m), is set using coefficients α1 and α2 (α1 < α2). Every inter-block distance within this interval could be taken as a candidate character pitch P_i, but the processing described below reduces the number of candidates and shortens the processing time. Within the interval (α1·H_m, α2·H_m), the inter-block distance U(1) that is the mode of the frequency distribution under a fixed tolerance width Δτ (Δτ ≥ 1) is calculated. The lower limit of the existence interval is then set to MAX(α1·H_m, (1 - α3)·U(1)) and the upper limit to MIN(α2·H_m, (1 + α3)·U(1)), where the coefficient α3 satisfies 0 ≤ α3 ≤ 1. In FIG. 2, interval C1 is the existence interval of the character pitch, and the inter-block distances contained in C1 are the candidate character pitches P_i, which satisfy MAX(α1·H_m, (1 - α3)·U(1)) ≤ P_i ≤ MIN(α2·H_m, (1 + α3)·U(1)).
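The narrowing of the existence interval can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's circuit: distances are taken to be integers (pixel counts), the histogram is a plain `{distance: frequency}` mapping, and "mode under tolerance width Δτ" is read as the start of the window of width `dtau` with the largest total frequency. The names `pitch_interval`, `h_mean`, `a1`, `a2`, `a3` and `dtau` are hypothetical stand-ins for H_m, α1, α2, α3 and Δτ. When no distance falls inside the initial interval, the sketch falls back to the average height for U(1), as the embodiment does.

```python
import math


def pitch_interval(hist, h_mean, a1, a2, a3, dtau=1):
    """Return (p_low, p_high, u1) for a histogram {distance: frequency}."""
    lo, hi = a1 * h_mean, a2 * h_mean
    u1, best = None, 0
    for u in range(math.ceil(lo), math.floor(hi) + 1):
        # Total frequency in a window of width dtau starting at u
        # (one possible reading of the tolerance width; an assumption).
        count = sum(hist.get(u + d, 0) for d in range(dtau))
        if count > best:
            u1, best = u, count
    if u1 is None:
        u1 = h_mean  # nothing observed inside the interval: use the mean height
    p_low = max(lo, (1 - a3) * u1)
    p_high = min(hi, (1 + a3) * u1)
    return p_low, p_high, u1


# Made-up histogram and coefficients, for illustration only.
hist = {9: 1, 10: 4, 20: 2}
print(pitch_interval(hist, h_mean=10, a1=0.5, a2=1.5, a3=0.2))
```

With a small α3 the candidate range shrinks to a narrow band around the mode, which is what reduces the number of candidate pitches that have to be evaluated.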

Next, taking an arbitrary candidate character pitch P_i as the reference, the frequency distribution is divided into regions f'_1, f'_2, ..., f'_n, as indicated by the broken lines in the figure. The boundary S(f'_{k-1}, f'_k) of each region f'_k (k = 1, 2, ..., n) lies between the center (k - 1)·P_i of region f'_{k-1} and the center k·P_i of region f'_k, and the boundary S(f'_k, f'_{k+1}) lies between the center k·P_i of region f'_k and the center (k + 1)·P_i of region f'_{k+1}. For example, S(f'_{k-1}, f'_k) can be placed at the midpoint between (k - 1)·P_i and k·P_i, and S(f'_k, f'_{k+1}) at the midpoint between k·P_i and (k + 1)·P_i.

Next, the mean P(k, n_k) of the n_k (n_k ≥ 0) inter-block distances lying in region f'_k (k = 1, 2, ..., n) is calculated and divided by the integer k. This yields the quantity corresponding to the character pitch observed from region f'_k when the candidate pitch P_i is taken as the reference. It is therefore effective to apply a maximum-likelihood linear estimation method in which the candidate pitch P_i that minimizes the deviation between (1/k)·P(k, n_k) and P_i over all regions f'_k is taken as the estimated character pitch P. For this purpose, for example, the estimation error evaluation measure

$$T = \sum_{k=1}^{n} C(k, n_k)\left(\frac{1}{k}\,P(k, n_k) - P_i\right)^{2}$$

can be used as the criterion for the maximum-likelihood estimation. Here the coefficient C(k, n_k) is a function of the number of samples n_k and the integer k (k = 1, 2, ..., n) and satisfies

$$\sum_{k=1}^{n} C(k, n_k) = 1.$$

A concrete example of the coefficient is

$$C(k, n_k) = \frac{k^{2}\, n_k}{\sum_{k=1}^{n} k^{2}\, n_k}.$$

The measure T above is a variance of the estimation error, but a criterion based on the absolute deviation |(1/k)·P(k, n_k) - P_i| can also be used. In this way the optimal estimated character pitch P is obtained, and the frequency distribution of inter-block distances is clustered at the same time.
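The candidate search described above can be illustrated with a short Python sketch. It assumes midpoint region boundaries (the example given in the text), the weight C(k, n_k) = k²·n_k / Σ k²·n_k, integer candidate pitches, and a histogram stored as `{distance: frequency}`. The names `estimation_error`, `best_pitch` and `n_regions` are invented for the example, and the squared-deviation form of T is a reconstruction from the surrounding description rather than a quoted formula.

```python
import math


def estimation_error(hist, pitch, n_regions):
    """Estimation error measure T for one candidate pitch."""
    sums = [0.0] * (n_regions + 1)
    counts = [0] * (n_regions + 1)
    for u, freq in hist.items():
        k = int(u / pitch + 0.5)  # nearest multiple of pitch: midpoint boundaries
        if 1 <= k <= n_regions:
            sums[k] += u * freq
            counts[k] += freq
    denom = sum(k * k * counts[k] for k in range(1, n_regions + 1))
    if denom == 0:
        return math.inf  # no observation falls in any region
    t = 0.0
    for k in range(1, n_regions + 1):
        if counts[k]:
            mean = sums[k] / counts[k]          # P(k, n_k)
            weight = k * k * counts[k] / denom  # C(k, n_k); the weights sum to 1
            t += weight * (mean / k - pitch) ** 2
    return t


def best_pitch(hist, p_low, p_high, n_regions=3):
    """Scan integer candidates in [p_low, p_high] and keep the one with minimum T."""
    candidates = range(math.ceil(p_low), math.floor(p_high) + 1)
    return min(candidates, key=lambda p: estimation_error(hist, p, n_regions))


# Made-up histogram: most distances sit near multiples of 10.
hist = {10: 4, 11: 1, 20: 2, 30: 1}
print(best_pitch(hist, 8, 12))
```

Because regions at larger k receive the weight k²·n_k, distances spanning several characters, whose per-character error is averaged down by the division by k, influence the estimate more strongly than single-pitch distances.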

Next, the character pitch identification method of the invention is explained with reference to FIGS. 3(A) and 3(B). As these figures show, the frequency distribution of inter-block distances differs between fixed-pitch printed data and variable-pitch printed data. The difference can be identified by evaluating, over all regions f_1, f_2, ..., the variance σ²(f_k) of the inter-block distances about k times the estimated character pitch P in each region f_k (k = 1, 2, ..., n) clustered by the optimal character pitch P, as shown in FIG. 2.

A variance evaluation value formed as a linear sum of the variances σ²(f_k) of the clustered regions f_k (k = 1, 2, ..., n),

$$\varepsilon^{2} = \sum_{k=1}^{n} C(k, n_k)\,\sigma^{2}(f_k),$$

can therefore be used. The coefficient C(k, n_k) in the variance evaluation value ε² is a function of the number of inter-block distance samples n_k and the integer k, and satisfies

$$\sum_{k=1}^{n} C(k, n_k) = 1.$$

A concrete example of the coefficient is

$$C(k, n_k) = \frac{k^{2}\, n_k}{\sum_{k=1}^{n} k^{2}\, n_k}.$$

Needless to say, the error evaluation value ε can be used in place of the variance evaluation value ε².
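A minimal Python sketch of the variance evaluation value follows. It assumes the same midpoint region boundaries and the weight C(k, n_k) = k²·n_k / Σ k²·n_k, and it measures the spread of each region about k·P rather than about the region mean, following the description of σ²(f_k). The names `variance_evaluation` and `n_regions` are hypothetical, and returning infinity for an empty histogram is a choice made for this example only.

```python
import math


def variance_evaluation(hist, pitch, n_regions=3):
    """Variance evaluation value (epsilon squared) for an estimated pitch."""
    sq_dev = [0.0] * (n_regions + 1)
    counts = [0] * (n_regions + 1)
    for u, freq in hist.items():
        k = int(u / pitch + 0.5)  # region index: nearest multiple of pitch
        if 1 <= k <= n_regions:
            sq_dev[k] += freq * (u - k * pitch) ** 2
            counts[k] += freq
    denom = sum(k * k * counts[k] for k in range(1, n_regions + 1))
    if denom == 0:
        return math.inf  # no observations (example-only convention)
    eps2 = 0.0
    for k in range(1, n_regions + 1):
        if counts[k]:
            variance = sq_dev[k] / counts[k]    # sigma^2(f_k), spread about k * pitch
            weight = k * k * counts[k] / denom  # C(k, n_k)
            eps2 += weight * variance
    return eps2


# Made-up data: a tight fixed-pitch histogram and a spread-out one.
fixed = {10: 4, 11: 1, 20: 2, 30: 1}
spread = {7: 2, 10: 1, 13: 2}
print(variance_evaluation(fixed, 10), variance_evaluation(spread, 10))
```

In this toy data the fixed-pitch histogram gives a value well below 1 while the spread-out histogram gives a value above 7, which is the kind of separation a threshold can exploit.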

Next, if the variance evaluation value ε² (or the error evaluation value ε) is smaller than a preset threshold α4, character separation based on the estimated character pitch P is possible, that is, the data is fixed-pitch printed data. If it is larger than α4, separation based on the estimated pitch cannot be applied, that is, the data is variable-pitch European printed data. The nature of the character pitch can thus be identified. Once it has been identified, and pitch-based separation is possible, characters can be separated stably using, for example, the "character separation device" of the prior invention, or another known character separation technique. When pitch-based separation cannot be applied, characters can be separated using, for example, the easily implemented known technique of separating at the blanks in the character line image.

FIG. 4 is a logic block diagram of a specific embodiment of the character separation device of the invention. A scanning device 1 optically scans the character line image written on paper, converts it into an electrical signal, binarizes it, and writes it into a character line image memory 2. A character block extraction device 3 sequentially extracts character blocks from the character line images stored in the character line image memory 2 and stores the start position and size of each block in a character block register 4. The size of a character block here means its width and height. Such a character block extraction device can use, for example, the technique disclosed in the specification of Japanese Patent Application No. Sho 56-27512 by the same applicant.

The control device 7 calculates inter-block distances from the leading-edge and trailing-edge positions of the character blocks transferred in sequence from the character block register 4, and increments the frequency count held at the address of the frequency distribution table 6 that corresponds to each distance.

In this way, a frequency distribution of inter-block distances such as that shown in FIG. 2 is generated in the frequency distribution table 6, which is implemented as a memory. The table is assumed to have been initialized to 0 beforehand. Next, the control device 7 calculates the average height H_m from the heights of the character blocks stored in the character block register 4 and transfers it to the existence interval detection unit 50. The constant register 19 stores the constants introduced with FIG. 2: the coefficients α1 and α2 (α1 < α2), the coefficient α3, and the fixed tolerance width Δτ (Δτ ≥ 1). The existence interval detection unit 50 first reads α1 and α2 from the constant register 19 and sets α1·H_m and α2·H_m as the lower and upper limits of the existence interval of the character pitch. It then reads from the frequency distribution table 6, via the control device 7, the frequency values of the inter-block distances lying between the lower limit α1·H_m and the upper limit α2·H_m, and calculates the inter-block distance U(1) that is the mode under the tolerance width Δτ transferred from the constant register 19. The existence interval calculation unit 51 uses U(1) obtained by unit 50 and the coefficient α3 read from the constant register 19 to calculate (1 - α3)·U(1) and (1 + α3)·U(1). It then stores MAX(α1·H_m, (1 - α3)·U(1)) as the lower limit P_L and MIN(α2·H_m, (1 + α3)·U(1)) as the upper limit P_U of the existence interval in the existence interval register 8. If U(1) is not detected by the existence interval detection unit 50, the average height H_m is set as U(1) in that unit.

The control device 7 then sets the lower limit P_L stored in the existence interval register 8 in the counter 14. The counter 14 is incremented from P_L up to P_U, each time after the operations described below have finished, and transfers its count value P_i (P_L ≤ P_i ≤ P_U) to the mean calculation unit 9. Taking the count value transferred from the counter 14, that is, the candidate character pitch P_i, as the reference, the mean calculation unit 9 determines the n region ranges of the frequency distribution, that is, the regions f_k (k = 1, 2, ..., n), and by referring to the frequency distribution table 6 calculates the number n_k of inter-block distances belonging to region f_k and their mean P(k, n_k). This processing is carried out over the n regions, and the candidate pitch P_i, the counts n_k (k = 1, 2, ..., n), and the means P(k, n_k) (k = 1, 2, ..., n) are transferred to the estimation error evaluation value calculation unit 10. Unit 10 computes the estimation error evaluation value T described above from the information transferred from the mean calculation unit 9.

The evaluation value register 12 holds the smallest estimation error evaluation value T found so far. It is assumed that a very large value has been set in the evaluation value register 12 in advance as the initial estimation error evaluation value. The comparison unit 11 compares the value T output from unit 10 with the content of the evaluation value register 12. If the output of unit 10 is smaller than the content of register 12, the output of unit 10 is written to the evaluation value register 12, the value of the candidate pitch P_i is written to the maximum-likelihood character pitch register 15, and the counter 14 is incremented by one via the control device 7. If, on the other hand, the output of unit 10 is larger than the content of register 12, the counter 14 is simply incremented by one via the control device 7.

以上の動作をカウンタ１４の値が文字ピッチの上限値Ｐ
Ｕに達するまで行うことによって、最適な文字ピッチＰ
が最尤文字ピッチレジスタ１５に格納されるととになる
。分散評価値演算部１３ゆ、最尤文字ピッチレジスタ１
号に最適な推定文字ピッチＰがセットされると、推定文
字ピッチルｔ最尤文字ピッチレジスタ１５か１８− ら読み出し、推定文字ピッチＰ？基量として頻度分布の
ｎ個の領域範囲、即ち領域ｆｋ（但しに＝１゜テーブル
６を参照することによって領域ｆｋに属する文字塊間距
離のサンプル数ｎｋ及び領域ｆ、に属する文字塊間距離
の値に−Ｐにおける分散σ２（ｆｋ）を算出し、前述し
たよう表分散評価値６２＝分散評価値演算部１３におい
て検出される分散σ３（ｆｋ）は平均値算出部９におい
て算出するようにすることも可能である。分散評価値演
算部１３において算出された分散評価値Ｃ２は分散評価
値レジスタ１６に格納される。閾値レジスタ１８には前
記文字ピッチの識別を行うための閾値α４ｔ−記憶する
。比較部１７は前記分散評価値レジスタ１６に格納され
た分散評価値ε２と閾値レジスタ１８に格納された閾値
α４どを比較することＫよりて文字ピッチの性質を識別
する。すなわち、分散評価値ε２が閾値α４よシも大き
くなれば推定文字ピッチＰを主体とした文字分離位置の
決定方法が適用できないと判定され、分散評価値ε２が
閾値α４よυも小さければ推定文字ピッチＰを主体とし
た文字分離位置の決定方法が適用できると判定される。The above operation is performed until the value of the counter 14 is the upper limit value P of the character pitch.
By repeating until reaching U, the optimum character pitch P
is stored in the maximum likelihood character pitch register 15. When the most suitable estimated character pitch P has been set in the maximum likelihood character pitch register 15, the variance evaluation value calculation unit 13 reads the estimated character pitch P from that register. Taking P as the base quantity, it refers to n area ranges of the frequency distribution, that is, areas f_k (k = 1, ..., n) of table 6, and obtains from table 6 the number of samples n_k of the distances between character blocks belonging to area f_k and the variance σ²(f_k) of those distances about the value k·P. From the variances σ²(f_k) obtained from table 6, the variance evaluation value ε² is calculated as described above, with reference to the average value calculation unit 9. The variance evaluation value ε² calculated by the variance evaluation value calculation unit 13 is stored in the variance evaluation value register 16.

The threshold register 18 stores a threshold α4 for identifying the character pitch. The comparison unit 17 identifies the nature of the character pitch by comparing the variance evaluation value ε² stored in the variance evaluation value register 16 with the threshold α4 stored in the threshold register 18. If ε² is larger than α4, it is determined that the character separation position determination method based mainly on the estimated character pitch P cannot be applied; if ε² is smaller than α4, it is determined that this method is applicable.
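The variance evaluation and threshold comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the embodiment: the function names, the `half_width` parameter that bounds each area f_k around k·P, and the weighting of each σ²(f_k) by its sample count n_k are all assumptions made for the example, because this passage does not state the exact form of ε².

```python
def variance_evaluation(freq, pitch, n_regions, half_width):
    """Return an assumed variance evaluation value (eps^2).

    freq       : dict mapping inter-block distance -> count
                 (a stand-in for frequency distribution table 6)
    pitch      : estimated character pitch P
    n_regions  : number of areas f_k, k = 1..n
    half_width : assumed half-width of each area around k*P (hypothetical)
    """
    total_n = 0
    weighted_sum = 0.0
    for k in range(1, n_regions + 1):
        center = k * pitch
        # Area f_k: distances within half_width of k*P (assumed shape).
        samples = {d: c for d, c in freq.items() if abs(d - center) <= half_width}
        n_k = sum(samples.values())
        if n_k == 0:
            continue
        # Variance of the distances in f_k about k*P.
        var_k = sum(c * (d - center) ** 2 for d, c in samples.items()) / n_k
        # Assumption: combine the variances as a sum weighted by n_k.
        weighted_sum += n_k * var_k
        total_n += n_k
    if total_n == 0:
        return float("inf")  # no samples: treat the pitch as unusable
    return weighted_sum / total_n


def pitch_based_separation_applicable(eps2, alpha):
    """Role of comparison unit 17: applicable only if eps^2 is below the threshold."""
    return eps2 < alpha
```

With `half_width` no larger than half the pitch, the areas do not overlap, so each distance contributes to at most one σ²(f_k).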

Reference numeral 20 denotes a single-character separation position determination unit, which can be implemented with the known techniques described above or with a combination of them. For example, the "character separation device" of the prior application can be used together with character separation based on blank spaces, which is the easiest to realize.
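Character separation based on blank spaces can be illustrated with a short sketch. It assumes a binary line image held as a list of pixel rows (1 = black) and cuts wherever a column contains no black pixel; the function name and the data layout are hypothetical and are not taken from the patent.

```python
def split_on_blank_columns(image):
    """Split a binary line image into character blocks at blank columns.

    image : list of rows, each a list of 0/1 pixels (1 = black); assumed layout
    Returns a list of (start, end) column spans, end exclusive.
    """
    if not image:
        return []
    width = len(image[0])
    spans = []
    start = None
    for x in range(width):
        has_ink = any(row[x] for row in image)
        if has_ink and start is None:
            start = x                  # a block begins at the first inked column
        elif not has_ink and start is not None:
            spans.append((start, x))   # a blank column closes the block
            start = None
    if start is not None:
        spans.append((start, width))   # block touching the right edge
    return spans
```

Touching characters produce a single span under this rule, which is the case where the pitch-based method is needed.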

That is, when the comparison unit 17 determines from the variance evaluation value ε² that character separation based mainly on the estimated character pitch P is applicable, the control device 7 transfers the estimated character pitch P stored in the maximum likelihood character pitch register 15 and the variance evaluation value ε² stored in the variance evaluation value register 16 to the single-character separation position determination unit 20, which is configured, for example, as the "character separation device" of the prior application. On the other hand, when the comparison unit 17 determines that character separation based mainly on the estimated character pitch P cannot be applied, each of the plurality of character block images stored in the character block register 4 is regarded as one character and is used as the output of the single-character separation position determination unit 20. Needless to say, the single-character separation position determination unit 20 can also be constructed with known techniques other than those of the embodiment described above.
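The branching between the two cases can be sketched as below. This is only an illustration: `pitch_based_separator` is a hypothetical callable standing in for the prior application's "character separation device", whose internals are not described in this passage, and character blocks are represented here as assumed `(start, width)` tuples.

```python
def separate_line(blocks, pitch, eps2, alpha, pitch_based_separator):
    """Choose the separation method from the variance evaluation value.

    blocks                : character blocks, e.g. (start, width) tuples (assumed form)
    pitch                 : estimated character pitch P
    eps2                  : variance evaluation value
    alpha                 : threshold held in threshold register 18
    pitch_based_separator : hypothetical stand-in for unit 20's pitch-based method
    """
    if eps2 < alpha:
        # Pitch-based separation is applicable: hand over P and eps^2.
        return pitch_based_separator(blocks, pitch, eps2)
    # Otherwise each extracted character block is treated as one character.
    return list(blocks)
```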

(Effects of the Invention) As described above, according to the character separation device of the present invention, even when the character pitch is not known in advance, and even when the character images include touching character blocks or separation between characters, the character pitch can be measured accurately and its nature identified, so that a highly adaptable character separation device can be realized easily.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing an example of a character line image and the distances between character blocks used in the character separation device of the present invention. FIG. 2 is a diagram showing an example of the frequency distribution of a series of distances between character blocks in equal-pitch data. FIGS. 3(a) and 3(b) are diagrams showing first and second examples of the frequency distribution of a series of distances between character blocks in equal-pitch and variable-pitch data, respectively. FIG. 4 is a logical block diagram showing a specific embodiment of the character separation device of the present invention.

In the figures: 1: scanning device; 2: character line image memory; 3: character block extraction device; 4: character block register; 50: existence interval detection unit; 51: existence interval calculation unit; 6: frequency distribution table; 7: control device; 8: existence interval register; 9: average value calculation unit; 10: estimation error evaluation value calculation unit; 11, 17: comparison units; 12: evaluation value register; 13: variance evaluation value calculation unit; 14: counter; 15: maximum likelihood character pitch register; 16: variance evaluation value register; 18: threshold register; 19: constant register; 20: single-character separation position determination unit.

Claims

[Claims]

A character separation device that scans a series of character line images and separates them into single characters, comprising: means for sequentially detecting a plurality of character blocks from the series of character line images; means for calculating the distances between character blocks from the plurality of character blocks and storing the frequency distribution of those distances; means for predicting the existence interval of the character pitch; means for dividing the frequency distribution into a plurality of regions and detecting, within the existence interval, the candidate character pitch that minimizes an estimation error evaluation measure constituted by the maximum likelihood linear sum of the character pitch estimation errors in the plurality of regions; means for calculating the variance of the distances between the character blocks and calculating a variance evaluation value constituted by the maximum likelihood linear sum of a plurality of said variances; and means for identifying the character pitch based on the variance evaluation value and separating the character line image into single characters.