JPH0244486A - Document picture processing method

Info

Publication number: JPH0244486A
Application number: JP63195884A
Authority: JP
Inventors: Sueji Miyahara; 末治宮原; Teruo Akiyama; 秋山　照雄
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1988-08-05
Filing date: 1988-08-05
Publication date: 1990-02-14
Anticipated expiration: 2012-10-29
Also published as: JP2669661B2

Abstract

PURPOSE: To detect the inclination of an entire document with high accuracy, even when the document contains few character/figures or character/figures of multiple sizes, by detecting the inclination of a character/figure string from the positional relationship among character/figure areas. CONSTITUTION: A character/figure area detecting mechanism 2 extracts character/figure areas by detecting connected black pixels, forcibly divided black pixels, or features obtained by projecting those pixels, and a character/figure area selecting mechanism 3 extracts a group of character/figure areas with similar properties. Next, a character/figure string detecting mechanism 4 performs character/figure string detection or character/figure string region detection from the positional relationship of the selected or the original character/figure areas. A character/figure string processing mechanism 5 then connects, with a straight line, the inclination-detection feature points of those character/figure areas in a detected string that have similar features (vertical width, vertical and horizontal width, arrangement and change points of black pixels, identification result of the rectangular pattern, etc.), and takes the inclination of that line as the inclination of the document. In this way the inclination of a document image containing few character strings can be found with high accuracy.

Description

DETAILED DESCRIPTION OF THE INVENTION
(1) Technical field to which the invention pertains
The present invention relates to a document image processing method for document reading devices, document filing devices, and the like. The method performs, simply and with high accuracy, processing such as detecting character/figure strings in a document image, detecting the arrangement direction of character strings (vertical or horizontal, for example), detecting the extent of a single character/figure string, and detecting the inclination of a character/figure string.

(2) Prior art
One known method of detecting character strings in a document image is "A method of extracting page components without relying on format information", IEICE Transactions (D), Vol. J66-D, No. 1 (1983) [Reference 1]. This method computes the projection of the black pixels in the document image and detects the inclination of the character strings or of the document from the periodicity of the projection function. It gives good results when a document contains many character strings, but sufficient accuracy is hard to obtain when there are few character strings or when the positions of the character strings vary between the columns of a multi-column layout.

Known methods of detecting the skew of a document image include "A method for skew correction of document images", IEICE Transactions (D), Vol. J69-D, No. 11 (1986) [Reference 2], and "A study on skew detection and word segmentation for English documents", 1987 (Showa 62) IEICE National Convention, No. 1510 (1987) [Reference 3].

The method of Reference 2 extracts the contours of the black pixels in the document image and obtains their circumscribed rectangles. While rotating the document image in fixed angular steps, it then projects a feature of the circumscribed rectangles (for example the length of the base) in a fixed direction, and detects as the inclination of the document the angle at which the projection shows a steep peak. Processing is sped up by using the features of the circumscribed rectangles rather than the input document image itself. With a fine angular resolution the method detects the inclination of the document accurately, but a fine angular resolution increases the amount of processing and lowers the processing speed.

The method of Reference 3, on the other hand, obtains circumscribed rectangles from the blobs of black pixels in the document image and then detects rectangles of equal size. By connecting a rectangle of interest to the adjacent rectangles above, below, left, and right of it, the method obtains the local direction between characters. This is done over the whole body-text area of the document image, and the mean of the individual inclinations detected is regarded as the inclination of the document. The method is effective when many rectangles are present, but its inclination detection accuracy drops when rectangles are few.

As shown above, these methods assume that the document being processed contains many character/figures that can be extracted as circumscribed rectangles or projections. They therefore have the drawback that sufficient inclination detection accuracy is hard to obtain for documents in which figures or images occupy a large area and character strings are few, or for documents made up of characters of multiple sizes.

(3) Purpose of the invention
The purpose of the present invention is to solve the problems of the conventional document image processing methods, which were devised on the premise that a document contains many characters. It provides a method that accurately detects the extent and direction of character/figure strings, or the inclination of the entire document, not only for documents containing many character/figures but also for documents with few character/figures and documents containing character/figures of multiple sizes.

(4) Constitution of the invention
(4-1) Features of the invention and differences from the prior art
The present invention rests on the following observation. Within a limited section of a character string or figure string on a document, characters or figures of roughly equal size are arranged in a particular direction (for example vertically or horizontally), and their mutual relationship is preserved even when the document is skewed. The principal feature of the invention is that character/figure strings, or character/figure string regions, are predicted and extracted from the blobs of black pixels in the document image or from their marginal distribution, and that the extent and inclination of the character/figure strings in the document image are detected from the mutual positional relationship of the character/figures within a string or string region. The invention therefore differs from the prior art in that it processes the figures in a document selectively. This distinguishes it from the method of Reference 1, which obtains the inclination of a character string or document from the projection of the black pixels, from the method of Reference 2, which obtains the inclination of the document using only the statistical properties of the circumscribed rectangles, and from the method of Reference 3, which obtains the inclination of the whole document by statistically processing the local inclinations found from the positions of the rectangles adjacent to a character of interest.

(4-2) Embodiments
[Embodiment 1]
FIG. 1 is a block diagram illustrating the processing method of the present invention. In the figure, 1 is an input terminal for the document image; 2 is a character/figure area detection mechanism that detects connected black pixels, forcibly divided black pixels, or features obtained by projecting them, and extracts character/figure areas; 3 is a character/figure area selection mechanism; 4 is a character/figure string detection mechanism that detects character/figure strings and character/figure string regions; 5 is a character/figure string processing mechanism; 6 is an output terminal; and 7 is a control unit.

Binary document image data expressed with "0" and "1" (white and black, for example) is input from input terminal 1. In the input data, the character/figure area detection mechanism 2 detects character/figure areas M? that enclose blobs of black pixels, in one of two ways: (a) based on the result of detecting the connectivity of the black pixels, it encloses each disconnected part [the case of FIG. 2(a)]; or (b) when the connectivity detection finds that the black pixels undesirably have no breaks, it performs forced division to create the enclosures, dividing either at fixed intervals [the cases of FIG. 2(b) and (c)] or in light of changes in the marginal distribution [the case of FIG. 2(d)]. It then obtains the position information of each area, namely the coordinates Y_i1, Y_i2, X_i1, X_i2 of the top, bottom, left, and right sides, and sends it to the next processing mechanism.

Instead of the circumscribed rectangle M?, a short-interval marginal distribution B? may be obtained (meaning that the horizontal addresses Y_i1, Y_i2 or the vertical addresses X_i1, X_i2 are obtained as the region or position information). The subsequent processing can then be realized approximately.

Next, the character/figure area selection mechanism 3 extracts a group M_i of character/figure areas with similar properties.

The group M_i is obtained, for example, as follows. From the position information of the character/figure areas M?, histograms H(y) and H(x) of the vertical width MT_i and the horizontal width MW_i of the areas are computed. Areas whose vertical width MT_i (horizontal width MW_i) is approximately equal to the representative vertical width MT (horizontal width MW) found from the histogram are selected and treated together as a group (expressions (1) and (1)'). For figures such as line segments, or where individual characters are joined together, the features computed for character/figure areas forcibly divided at fixed intervals are used.

For the vertical width: $MT \cdot (1-\alpha) - \beta < MT_i < MT \cdot (1+\alpha) + \beta$   ... (1)
For the horizontal width: $MW \cdot (1-\alpha) - \beta < MW_i < MW \cdot (1+\alpha) + \beta$   ... (1)'

α and β are determined by the image resolution of the data; at 8 pixels/mm, α may be chosen as 0.05 to 0.10 and β as 1 to 2. In selecting the character/figure areas, the rectangle occupancy functions G(y), G(x), obtained by multiplying the histogram by the rectangle width, may be used instead of the histograms H(y), H(x). These give a value roughly proportional to the area that the character/figure areas of each size occupy in the document image, so that character/figure areas suited to detecting the inclination of character strings can be found. FIG. 3(a) shows document image data skewed by about 5 degrees. FIG. 3(b) shows, for the document image data of FIG. 3(a), the histogram H(y) of the vertical widths of the character/figure areas M? formed by enclosing each blob in a rectangle (for example 「株」, 「式」, and the other characters illustrated), together with its rectangle occupancy function G(y). FIG. 3(b) shows that many character/figure areas have the vertical widths MT1, MT2, and MT3.
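
As a minimal sketch of the selection by expression (1), the following Python fragment takes the most frequent vertical width as the representative value MT and keeps the areas inside the tolerance band. The box layout `(x1, y1, x2, y2)`, the function names, and the choice of the histogram mode as the representative value are assumptions made for illustration; the source only requires a representative value taken from the histogram.

```python
from collections import Counter

def representative_height(heights):
    """Mode of the height histogram H(y); ties resolve to the smallest height (an assumption)."""
    counts = Counter(heights)
    best = max(counts.values())
    return min(h for h, c in counts.items() if c == best)

def select_similar(boxes, alpha=0.1, beta=1):
    """Keep boxes (x1, y1, x2, y2) whose vertical width MT_i satisfies expression (1)."""
    heights = [y2 - y1 for (_, y1, _, y2) in boxes]
    mt = representative_height(heights)
    lo = mt * (1 - alpha) - beta   # MT*(1-alpha) - beta
    hi = mt * (1 + alpha) + beta   # MT*(1+alpha) + beta
    return [b for b, h in zip(boxes, heights) if lo < h < hi]
```

With three boxes about 20 pixels tall and one 60-pixel figure, only the three character-sized boxes would be kept.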

Next, the character/figure string detection mechanism 4 performs character/figure string detection, or character/figure string region detection, from the positional relationship of the selected character/figure areas M_i or of the original character/figure areas M?. Because string detection is the same process in the horizontal and vertical directions, the horizontal case (using vertical widths) is described here; where vertical-direction information is used, it is indicated by distinguishing the subscripts of the symbols.

FIG. 4 shows how a character/figure string is detected. It illustrates the extraction process for the case where the character/figure area of interest M_i (or M?) lies in the character/figure string of a given line. As the figure shows, the area of interest is projected horizontally across the document image, and among the character/figure areas that the projection overlaps, the one closest to the area of interest is detected. If it lies on the right, it is taken as M_{i+1} (or M?_{i+1}).

Attention then moves to that neighboring area, its projection to the right is taken in the same way, and the nearest character/figure area is again detected. This is repeated (the process is hereinafter called propagation processing).

The same propagation processing is repeated on the left side of the area M_i (or M?). In this way the character/figure string of that line is detected.
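
The propagation processing can be sketched as follows. This is an illustrative reading under stated assumptions, not the patent's implementation: boxes are `(x1, y1, x2, y2)` tuples with positive width, "projection overlap" is taken to mean overlapping y ranges, and all function names are hypothetical.

```python
def overlaps_vertically(a, b):
    """True when the horizontal projections (y ranges) of boxes a and b overlap."""
    return a[1] < b[3] and b[1] < a[3]

def nearest_neighbor(box, boxes, direction):
    """Closest overlapping box to the right (direction > 0) or left (direction < 0), or None."""
    if direction > 0:
        cands = [b for b in boxes if b[0] >= box[2] and overlaps_vertically(box, b)]
        return min(cands, key=lambda b: b[0] - box[2], default=None)
    cands = [b for b in boxes if b[2] <= box[0] and overlaps_vertically(box, b)]
    return min(cands, key=lambda b: box[0] - b[2], default=None)

def trace_string(seed, boxes):
    """Propagate right and then left from the seed box; return the string left to right."""
    right, cur = [], seed
    while (nxt := nearest_neighbor(cur, boxes, +1)) is not None:
        right.append(nxt)
        cur = nxt
    left, cur = [], seed
    while (nxt := nearest_neighbor(cur, boxes, -1)) is not None:
        left.append(nxt)
        cur = nxt
    return left[::-1] + [seed] + right
```

Because each step only compares a box with its immediate neighbor, a slowly rising or falling line is still followed even though its first and last boxes no longer overlap in y.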

Next, the character/figure string processing mechanism 5 operates as follows.

(a) Inclination detection. In the detected character/figure string, the inclination-detection feature points of character/figure areas with equal features are connected with a straight line, and the inclination θ? of that line is regarded as the inclination of the document. The features compared are the vertical width, the vertical and horizontal widths, the arrangement or change points of the black pixels, the identification result of the rectangle pattern, and so on. The feature points are, for example, (1) the address of the center point of the character/figure area, (2) the address of the center point of its base, or (3) the address of the center point of its top side. Possible ways of obtaining the inclination include the following. One is to take the inclination θQ of the line connecting the feature points of the two character/figure areas that have equal features and are farthest apart within a limited section of the same string. Another, shown in FIG. 5, is to obtain the inclinations θ? between the feature points of character/figure areas that belong to the same string and have equal features, and then to take as the inclination of the document image either the mean inclination of the set with small variance or the inclination at a sharp peak of the histogram of inclination values.
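
A minimal sketch of the first variant (the line through the two farthest feature points) is given below. It assumes the box centers are used as feature points and that the string runs roughly horizontally, so that "farthest apart" reduces to the leftmost and rightmost centers; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def center(box):
    """Center point of a box (x1, y1, x2, y2), used here as the feature point."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def farthest_pair_angle(boxes):
    """Angle in degrees of the line joining the leftmost and rightmost feature points."""
    pts = sorted(center(b) for b in boxes)
    (xa, ya), (xb, yb) = pts[0], pts[-1]
    return math.degrees(math.atan2(yb - ya, xb - xa))
```

Using the most distant pair makes the estimate less sensitive to one-pixel quantization of the box coordinates than an estimate from adjacent boxes.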

Alternatively, when the succession of the character/figure string is represented by a straight line, the reference line that minimizes the scatter (squared error, for example) of the inclination-detection feature points can be found by least-squares approximation. The direction of that reference line is then taken as the arrangement direction of the character/figure string, that is, as its inclination θy (θx).
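
The least-squares variant can be sketched with the ordinary closed-form fit of y = a*x + b to the feature points. The function name and the choice to regress y on x (suitable for roughly horizontal strings) are assumptions for illustration.

```python
import math

def fit_line_angle(points):
    """Least-squares fit of y = a*x + b to (x, y) points; returns the line angle in degrees."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        # All points share one x value; a vertical string needs x fitted against y instead.
        raise ValueError("points are vertically aligned")
    return math.degrees(math.atan(sxy / sxx))
```

For a vertical string the same function can be called with the coordinates swapped, which gives the angle relative to the vertical axis.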

(b) Detection of the arrangement direction. The variances of the character/figure string inclinations θ? obtained in several directions (vertical and horizontal, for example), or the sharpness of the peaks of their histograms, are compared, and the direction with the smaller variance or the sharper peak is taken as the arrangement direction of the character/figure string. Alternatively, for the character string formed by the character/figure area of interest, the number of character/figure areas appearing within a fixed section can be counted in each direction, and the direction with the larger count taken as the arrangement direction.
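
The variance comparison in (b) can be sketched as below. The function name and the tie-breaking toward "horizontal" are assumptions; the inputs are the local inclinations measured when strings are traced horizontally and vertically.

```python
from statistics import pvariance

def arrangement_direction(horizontal_angles, vertical_angles):
    """Pick the direction whose local inclinations have the smaller population variance."""
    vh = pvariance(horizontal_angles)
    vv = pvariance(vertical_angles)
    return "horizontal" if vh <= vv else "vertical"
```

Inclinations measured along the true writing direction cluster tightly, while those measured across it scatter, which is why the smaller variance indicates the arrangement direction.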

(c) The inclination of the character string selected in item (b), or for example the mean of the inclinations of several character strings, is regarded as the inclination θ of the document and is output to output terminal 6.

FIG. 6 shows the result of detecting the inclination of the document image of FIG. 3(a) in this way and correcting the skew using the detected value.

The control unit 7 communicates to each processing mechanism which processing has been selected.

By using the processing described above, the present invention can obtain the inclination accurately and quickly not only for document images with many character strings but also for document images with few. It is therefore effective for fast, accurate format recognition of skewed document images and for storing good-quality images in document filing.

[Embodiment 2]
FIG. 7 shows an embodiment in which the character/figure area selection mechanism 3 of Embodiment 1 is omitted, the processing of the character/figure string detection mechanism 4 is restricted to a character/figure string region as shown in FIG. 8, and the character/figure area of interest is predicted by the following method. The character/figure string detection mechanism 4 counts the number of character/figure areas M? (or M_i) along the horizontal direction of the document, as illustrated in FIG. 8, to obtain the projection function F(y). The maximum (local maximum) of F(y) is taken as the center position P_i of a character/figure string, and the interval P_i ± r*MT_i is taken as the "character/figure string region", as illustrated. For inclination detection, the feature points of the area of interest M_i (M?) and of the corresponding character/figure areas lying within a fixed section (δ*MT_i) a fixed distance away from it are connected to one another with straight lines to obtain several inclinations θ?. The histogram of the inclinations θ? is computed, and its largest value is detected and taken as the inclination candidate θh of the document image.
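
The construction of the projection function F(y) and of the string region around its peak can be sketched as follows. This assumes boxes `(x1, y1, x2, y2)` in pixel coordinates, counts for every row the boxes covering it, and uses the first row attaining the maximum as the center position; the function name and that tie-breaking rule are assumptions.

```python
def string_band(boxes, height, mt, r=1.0):
    """Build F(y) by counting boxes per row, then return the band peak +/- r*mt."""
    f = [0] * height
    for _, y1, _, y2 in boxes:
        for y in range(max(y1, 0), min(y2, height)):
            f[y] += 1
    # First row at which F(y) reaches its maximum (a plateau midpoint would also be reasonable).
    peak = max(range(height), key=lambda y: f[y])
    return peak - r * mt, peak + r * mt
```

Restricting later steps to this band is what lets the selection mechanism 3 be omitted: boxes outside the band are simply never paired.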

As the inclination θ of the document image, the inclination with the smallest scatter is chosen from among those obtained for each type of feature point ((1) the address of the center point of the character/figure area, (2) the address of the center point of its base, (3) the address of the center point of its top side, and so on).

Because this method omits or simplifies part of the processing, it allows faster processing and a smaller processing scale. It can also detect the inclination of character/figure strings in which the projections of the black-pixel blobs do not overlap.

[Embodiment 3]
FIG. 9 illustrates a technique that, within the processing of the character/figure string processing mechanism 5 of Embodiment 1, extends character/figure string detection to detecting the extent of a single character/figure string. The asterisks in the figure stand for either "0" or "1". The processing is the same up to the point where the character/figure string is obtained from the area of interest M_i (or M?); the difference is that, when the inclination θ? is obtained, it is obtained separately for each direction.

For example, when the inclination θ? is obtained, the inclinations in the rightward (forward) direction and in the leftward (backward) direction are obtained separately as angles and tabulated, as shown in FIG. 10. The character string can be regarded as broken between M_i (or M?_i) and M_{i+1} (or M?_{i+1}) when two conditions hold: (i) the rightward inclination Rθ? and the leftward inclination Lθ? from the area of interest M_i (or M?) differ by at least the threshold θTH; and (ii) the predicted leftward inclination Lθ? of the area of interest and the predicted rightward inclination Rθ?_{i+1} of the area M_{i+1} (or M?_{i+1}) adjacent to it on the right differ by no more than the threshold θTH.
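
One way to read the two break conditions is sketched below, working on feature points ordered left to right. The threshold values `th1` and `th2`, the function names, and the convention of measuring every inclination left to right are assumptions; the source gives the thresholds only symbolically.

```python
import math

def angle(p, q):
    """Inclination in degrees of the line from point p to point q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def find_breaks(points, th1=10.0, th2=3.0):
    """Return indices i such that the string is judged to break between i and i+1."""
    breaks = []
    for i in range(1, len(points) - 2):
        l_i = angle(points[i - 1], points[i])         # leftward inclination at i
        r_i = angle(points[i], points[i + 1])         # rightward inclination at i
        r_next = angle(points[i + 1], points[i + 2])  # rightward inclination at i+1
        # (i) the two sides of i disagree, (ii) the strings on either side of the gap agree.
        if abs(r_i - l_i) >= th1 and abs(l_i - r_next) <= th2:
            breaks.append(i)
    return breaks
```

Condition (ii) is what separates a displaced but parallel string (a break) from a genuine change of direction, where the inclinations before and after the gap would differ.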

Because this method detects the extent of each character/figure string before obtaining the inclination of the document image, it detects the skew of the document accurately even when character/figure strings are displaced by a multi-column layout or by a mixture of text, charts, and photographs, and even when the center or width of a character/figure string varies because large and small characters are mixed. Furthermore, applying the method to continuous line segments makes it possible to distinguish straight lines from curves.

Adding this method to conventional document structure recognition thus enables more accurate document structure recognition.

[Embodiment 4]
FIGS. 11 and 12 illustrate a method that applies marginal distributions to character/figure string detection and inclination detection. First the document is divided into strip-shaped regions of height h (FIG. 11(a)), and the marginal distribution of each strip is obtained by projecting the black pixels vertically (the semicircular lumps in FIG. 11(b) represent the projections). If the input image is multi-valued, the pixel densities may simply be accumulated. Next, the intervals in which the marginal distribution has a value of at least a fixed value ε over a width of at least ζ (x1 to x2 in FIG. 11(b)) are detected and taken as the projection components.

Here ε and ζ may be set according to the size of the noise to be removed. Each projection component can be considered to reflect directly the position of a character string. If the document contains charts or the like, the fact that a chart is larger than the characters in the document can be exploited: the projection components whose width is at most a fixed value η are selected as projection components due to character strings. The value of η must be set according to the character string width. It may be set in advance as a parameter or, when the document contains many character/figures, it may be set from the representative value of a histogram of projection component widths.
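
Extracting the projection components of one strip can be sketched as below. The strip is assumed to be a list of equal-length rows of 0/1 pixel values and ε is assumed to be at least 1; the function name and data layout are illustrative only.

```python
def projection_components(strip, eps=1, zeta=2):
    """Return (x1, x2) runs where the column sums are >= eps over a width >= zeta."""
    width = len(strip[0])
    profile = [sum(row[x] for row in strip) for x in range(width)]  # vertical projection
    runs, start = [], None
    for x, v in enumerate(profile + [0]):  # trailing 0 closes a run at the right edge
        if v >= eps and start is None:
            start = x
        elif v < eps and start is not None:
            if x - start >= zeta:          # discard runs narrower than zeta as noise
                runs.append((start, x - 1))
            start = None
    return runs
```

A further filter keeping only runs with `x2 - x1 + 1 <= eta` would implement the chart rejection described above.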

FIG. 12 shows the process of clustering the projection components obtained by the method of FIG. 11 and assigning a different label to each character/figure string. First, in FIG. 12, a common label is given to the projection components that overlap one another when viewed in the vertical direction (FIG. 12(a), step 1). Next, among the components given the same label, those that overlap only slightly when viewed in the vertical direction are given a different label (FIG. 12(b), step 2). The projection components carrying the same label are each taken as a candidate character/figure string, and the direction of each candidate is obtained. The direction may be obtained by finding, by least-squares approximation, the straight line passing near the midpoints of the projection components, or by obtaining the direction between adjacent projection components for all of the components and averaging. The methods described for the character/figure string processing mechanism of Embodiment 1 may also be used. Performing this processing for every character/figure string that has been given a label and averaging the directions of all the strings yields the inclination of the document. Next, based on the inclination obtained in step 2, the positions of successive projection components are predicted, and components that deviate greatly from the predicted position are given a new label (FIG. 12(c), step 3). The direction of the character/figure strings is then obtained by the same processing as in step 2, giving a more accurate document inclination than that obtained in step 2.

With this processing the skew of the document can be detected even when the character strings are positioned differently in each column. The projection components given the same label directly reflect the individual character strings, so the character strings in the document are extracted at the same time as the skew of the document is detected.

As described above, dividing the input document into strip-shaped regions and using the marginal distribution obtained within each region makes it possible to detect the inclination and extract the character strings even for documents that contain charts or in which the position of the character strings differs from column to column. Steps 1 to 3 need not all be performed; part of the processing can be omitted depending on the document. Although the description here took vertically running character strings as the example, exactly the same processing is of course possible for horizontally running character strings.

FIG. 13 shows a flowchart of the main parts of the configuration shown in FIG. 1.

The processing proceeds in order: circumscribed rectangles are detected; a histogram is calculated; maximum values are calculated; the local maxima (MT1, ...) are detected as shown in FIG. 3(b); inclinations are detected; the inclinations are evaluated; and finally the inclination of the document image is detected.
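
The histogram and local-maximum steps of this flowchart might look like the sketch below. It is speculative on one point that the passage does not state: the histogram is assumed to count circumscribed rectangles by height, in line with the claim that character/figure areas are selected by size and frequency of appearance. The rectangle format (x, y, width, height) and the function names are likewise assumptions made for the example.

```python
from collections import Counter


def size_histogram(rects):
    """Count circumscribed rectangles (x, y, width, height) by height."""
    return Counter(h for _, _, _, h in rects)


def local_maxima(hist):
    """Return the heights whose count exceeds the count of the next
    smaller height and is not below the count of the next larger one."""
    peaks = []
    for h in sorted(hist):
        count = hist[h]
        if count > hist.get(h - 1, 0) and count >= hist.get(h + 1, 0):
            peaks.append(h)
    return peaks


rects = [
    (0, 0, 8, 10), (12, 0, 8, 10), (24, 0, 8, 10),  # body-text sized
    (36, 0, 8, 11), (48, 0, 8, 9),                  # slight size variation
    (0, 30, 18, 20), (24, 30, 18, 20),              # heading sized
]
hist = size_histogram(rects)
peaks = local_maxima(hist)
```

Rectangles whose height lies near one of the peaks would then be the natural candidates to pass on to inclination detection, so that characters of one size class are compared with each other.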

FIG. 14 shows a flowchart of the main parts corresponding to the embodiment shown in FIGS. 11 and 12.

In the first group of processes, the image is divided into strip-shaped regions as shown in FIG. 11(a). In the next group of processes, the projections are obtained as shown in FIG. 11(b). The processing shown in FIG. 12(a) is then performed, followed by labeling as shown in FIG. 12(b). A provisional inclination of the document is then obtained by averaging.

Labeling is then performed as shown in FIG. 12(c), the inclination of the document is obtained, and the result is output.

(5) As described in detail above, according to the present invention, the character/figure strings of a document image, their alignment direction (vertical or horizontal writing), the sections belonging to the same character/figure string, and the inclination of the character/figure strings can be detected quickly and accurately. The invention is therefore effective for correcting the inclination of document images, for recognizing the layout of character/figure strings in document recognition, and for identifying character types in character recognition using character/figure string sections.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the processing method of the present invention.
FIG. 2 is a conceptual diagram of character/figure areas showing the differences between area extraction methods.
FIG. 3(a) shows a tilted document image, and FIG. 3(b) shows its histogram and rectangle occupancy function.
FIG. 4 is a conceptual diagram of character/figure string detection.
FIG. 5 is a conceptual diagram of inclination detection.
FIG. 6 shows the result of correcting the inclination of the document image shown in FIG. 3(a).
FIG. 7 is a block diagram of a simplified configuration of the present invention.
FIG. 8 is a conceptual diagram of character/figure string area detection.
FIG. 9 is a conceptual diagram of identical character/figure string detection.
FIG. 10 shows the inclination detection results for each direction.
FIG. 11 shows the method of obtaining projected components using the marginal distribution.
FIG. 12 shows the principle of detecting character/figure strings and their inclination by labeling the projected components.
FIG. 13 is a flowchart of the main parts of the configuration shown in FIG. 1.
FIG. 14 is a flowchart of the main parts corresponding to the embodiment shown in FIGS. 11 and 12.

In the figures, 1 is an input terminal, 2 is a character/figure area detection mechanism, 3 is a character/figure area selection mechanism, 4 is a character/figure string detection mechanism, and 5 is a character/figure string processing mechanism; an output terminal and a control section are also shown.

Claims

[Claims]

(1) A document image processing method for a document image processing device that takes in document image data input from an image input unit and detects and corrects the inclination of the document, the device comprising: a character/figure area detection mechanism that determines character/figure areas from the position and size of the circumscribed rectangle of a block of connected black pixels or of a partial block obtained by dividing such a block, or from the marginal distribution obtained by projecting black pixels; a character/figure area selection mechanism that selects the character/figure areas to be used for inclination detection based on the size and frequency of appearance of the character/figure areas, or on the features or pattern recognition results within the character/figure areas; a character/figure string detection mechanism that detects character/figure strings from the degree of overlap and the positional relationship between the selected character/figure areas, or all character/figure areas, and the character/figure areas in their vicinity; and a character/figure string processing mechanism that detects the inclination of each character/figure string from the positional relationship of the character/figure areas that make it up; the method being characterized in that the inclination of the character/figure strings is detected by examining the positional relationship between character/figure areas.

(2) The document image processing method according to claim (1), characterized in that the character/figure area selection mechanism is omitted and the character/figure string areas to be processed are restricted in the character/figure string detection mechanism, thereby simplifying the inclination detection processing.

(3) The document image processing method according to claim (1), characterized in that, in the character/figure string processing mechanism, the inclinations between character/figure areas are determined in multiple directions using a specific character/figure area as a starting point, the inclinations between the character/figure area of interest and the other character/figure areas are determined for each direction, and an identical character/figure string section is detected by detecting a change in the inclination of the character/figure string and a change in its deviation from a reference line.

(4) The document image processing method according to claim (1), characterized in that the character/figure area detection mechanism divides the input document image into small regions and determines the marginal distribution for each region; the character/figure area selection mechanism selects, from the marginal distributions obtained by the character/figure area detection mechanism, the portions corresponding to character strings; the character/figure string detection mechanism extracts the character/figure strings based on the overlapping positions of the marginal distributions selected by the character/figure area selection mechanism; and the inclination of the entire document or the direction of the document is determined based on the inclination of the character/figure strings obtained by the character/figure string detection mechanism.