JPH0433075B2

Info

Publication number: JPH0433075B2
Application number: JP60110286A
Authority: JP
Inventors: Yoshitake Tsuji
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1985-05-24
Filing date: 1985-05-24
Publication date: 1992-06-02
Also published as: JPS61269778A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character line extraction device, and more particularly to a device that extracts a desired character line from a character line containing characters annotated with ruby.

(Prior Art and its Problems) In ordinary Japanese text printed in books such as paperbacks, furigana or emphasis dots (hereinafter referred to as ruby) are sometimes attached to some of the characters. When characters annotated with ruby in this way are read using character recognition technology, the ruby causes misreading or rejection. It is therefore necessary to separate the characters or the character line from the ruby. As a device for separating ruby from characters, Japanese Examined Patent Publication No. Sho 58-8024 (Japanese Patent Application No. Sho 53-127855), for example, discloses a method that separates them on the basis of a blank portion of the projection information within a range of predetermined width measured from the edge of a one-line text on the side where no ruby appears. With this method, however, the predetermined width must be set in advance according to the size of the printed characters. When characters of different sizes are mixed, as with a chapter title set in relatively large characters and body text lines set in characters smaller than the chapter title, the character size cannot be fixed in advance for the ruby. Furthermore, when the character line and the ruby touch each other because of, for example, the resolution of the image input device, it becomes difficult to separate the ruby.

(Objective of the Invention) The present invention has been made to solve the above drawbacks of the prior art. It focuses on the following property: when a character line is projected along the direction in which its characters are arranged and the distribution of black pixel counts is calculated, the average position of that distribution is hardly affected by the presence of ruby, whereas the width of the character line changes with the ruby. The object of the invention is to provide a character line extraction device having a ruby separation device that, by exploiting this property, solves the above drawbacks of the prior art.

(Structure of the Invention) According to the present invention, there is provided a character line extraction device characterized by comprising: means for dividing a character line, extracted from an optically scanned and quantized text image, into a plurality of partial regions having overlapping areas, and detecting a projection distribution by scanning in the character arrangement direction; means for detecting both end positions and the average position of each of the plurality of partial regions according to the projection distribution; means for detecting a plurality of candidate sections in which the number of black pixels on the projection distribution is equal to or less than a certain value; means for calculating, from the plurality of candidate sections, a boundary section between the partial character line and the ruby region on the basis of the average value and both end positions; and means for calculating a boundary region between the character line and the ruby line from the plurality of boundary sections, and detecting the ruby separation position as the position within the boundary region at which the number of black pixels on the projection distribution obtained in the character arrangement direction is smallest; the device extracting the desired character line according to the ruby separation position.
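As an illustration only, the first of these means (dividing a line into overlapping partial regions and taking a projection for each) could be sketched in Python as follows. The names `split_regions` and `projection`, the `step` parameter, and the list-of-lists binary image are assumptions made for this sketch; the patent describes hardware units and does not give code. The sketch assumes a vertically written line, so regions are bands of rows and the projection is counted per column.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = black pixel, 0 = white (assumed encoding)


def split_regions(length: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges along the line; end is exclusive.

    Choosing step < size makes consecutive regions overlap,
    which is the case the patent describes.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    regions = []
    start = 0
    while True:
        end = min(start + size, length)
        regions.append((start, end))
        if end >= length:
            break
        start += step
    return regions


def projection(image: Image, rows: Tuple[int, int]) -> List[int]:
    """Count the black pixels in each column over the given row range."""
    start, end = rows
    width = len(image[0])
    return [sum(image[y][x] for y in range(start, end)) for x in range(width)]
```

For a line 10 rows long, `split_regions(10, 4, 2)` yields four regions of 4 rows each, with each consecutive pair sharing 2 rows.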

(Example) The present invention will be described below with reference to the drawings. FIGS. 1a to 1g are diagrams for explaining, as an example, the method of the present invention for extracting a desired character line from a character line containing characters annotated with ruby. FIGS. 1a, 1b, 1c, 1f and 1g show part of a character line that contains ruby, and FIGS. 1d and 1e show part of a character line that does not contain ruby. FIG. 1a also depicts a state in which part of the ruby touches a desired character. Character lines such as those shown in FIGS. 1a and 1d can be extracted using conventional, known techniques. FIGS. 1b, 1e, 1f and 1g show the distribution of black pixel counts obtained by scanning along the direction in which the characters in the line are arranged, that is, the projection distribution. The symbols R₁ and R₂ in FIGS. 1a and 1d denote the partial regions obtained when the character line is divided into a plurality of regions with overlapping areas, using a predetermined size set, for example, on the basis of the width information of the character line. A projection distribution can then be extracted as shown in FIG. 1b for the partial region R₁ of FIG. 1a, as shown in FIG. 1c for R₂, and as shown in FIG. 1e for R₁ of FIG. 1d.

The symbols B₁(l), B₂(l) and Bi(l) marked on the projection distributions of FIGS. 1b, 1c, 1e and 1f denote the left end position obtained for each projection distribution; likewise, the symbols B₁(U), B₂(U) and Bi(U) denote the right end positions. The symbols μ₁, μ₂ and μi denote the average position of each projection distribution. For the partial region R₁ shown in FIGS. 1a and 1d, let the distances D_l and D_U between the average value μ₁ and the left and right end positions B₁(l) and B₁(U) be calculated as D_l = μ₁ − B₁(l) and D_U = B₁(U) − μ₁. In the case of FIG. 1d, which contains no ruby, the distances D_l and D_U can be regarded as nearly equal, whereas in the case of FIG. 1a, which contains ruby, the distance D_U is larger than the distance D_l. This is because the average value μ₁ in FIG. 1a is hardly affected by whether or not ruby is present. The vicinity of the boundary between the ruby line and the desired character line can therefore be obtained from, for example, the position S₁ = μ₁ + D_l shown in FIG. 1b. If, for FIG. 1d, the vicinity of the boundary between a ruby line and the desired character line is calculated in the same way as for FIG. 1a, using the average value μ₁ shown in FIG. 1d and the distance D_l described above, the resulting boundary contains the right end of the character line, which readily reveals that no ruby is present.
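The quantities defined for a partial region can be worked through in a short Python sketch. It takes the "average position" to be the mean of the positions weighted by black pixel count and the end positions to be the first and last positions with any black pixel; both readings are assumptions, as the patent does not give explicit formulas. The names `ProfileStats` and `profile_stats` are hypothetical.

```python
from typing import List, NamedTuple


class ProfileStats(NamedTuple):
    left: int       # B(l): first position with a black pixel
    right: int      # B(U): last position with a black pixel
    mean: float     # mu: average position of the projection distribution
    d_left: float   # D_l = mu - B(l)
    d_right: float  # D_U = B(U) - mu
    s: float        # S = mu + D_l, estimate of the body/ruby boundary


def profile_stats(profile: List[int]) -> ProfileStats:
    """Compute end positions, average position, D_l, D_U and S for one profile."""
    occupied = [x for x, count in enumerate(profile) if count > 0]
    if not occupied:
        raise ValueError("profile contains no black pixels")
    left, right = occupied[0], occupied[-1]
    total = sum(profile)
    # Assumption: average position = position weighted by black pixel count.
    mean = sum(x * count for x, count in enumerate(profile)) / total
    d_left = mean - left
    d_right = right - mean
    return ProfileStats(left, right, mean, d_left, d_right, mean + d_left)
```

For a ruby-free profile such as `[0, 5, 5, 5, 5, 5, 0, 0, 0]`, `D_l` and `D_U` are equal and `S` coincides with the right end. Adding a thin ruby column, as in `[0, 10, 10, 10, 10, 10, 0, 1, 1]`, moves the right end much more than the average, so `D_U` exceeds `D_l` and `S` falls in the gap between body and ruby.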

Next, a method of detecting the boundary region between the ruby line and the desired character line, using the detected projection distribution, will be described for the i-th (i = 1, 2, 3, ...) partial region shown in FIG. 1f. First, as indicated by the dotted line in the figure, the sections a, b and c in which the number of black pixels on the projection distribution is equal to or less than a predetermined slice level Ts are detected. Next, the position S₁ (= μ₁ + D_l) is calculated using the average value μ₁ and the distance D_l described above, and a boundary candidate section having a predetermined tolerance width is set around the position S₁. The region given by the logical AND of the sections a, b, c and the boundary candidate section can then be obtained as the boundary section in the i-th region Ri. In the case of FIG. 1f, for example, it is obtained from section b, and the section indicated by Li and Ui in the figure is found as the boundary section. In the figure, Li represents the left end position of the boundary section and Ui represents its right end position. The method of calculating the projection distribution on the basis of the average value μi described above is not limited to the method using the predetermined slice level Ts.
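A minimal Python sketch of this step follows, under stated assumptions: sections are inclusive integer ranges, the boundary candidate section is taken as S plus or minus a `tolerance`, and when several low sections overlap the candidate window only the first overlap is returned. The patent specifies none of these details, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def low_sections(profile: List[int], ts: int) -> List[Tuple[int, int]]:
    """Return maximal inclusive runs (start, end) where profile[x] <= ts."""
    sections = []
    start = None
    for x, count in enumerate(profile):
        if count <= ts:
            if start is None:
                start = x
        elif start is not None:
            sections.append((start, x - 1))
            start = None
    if start is not None:
        sections.append((start, len(profile) - 1))
    return sections


def boundary_section(profile: List[int], ts: int, s: float,
                     tolerance: float) -> Optional[Tuple[int, int]]:
    """Intersect the low sections with the window [s - tolerance, s + tolerance].

    Returns (Li, Ui) for the first overlapping section, or None when no
    low section falls inside the window (no boundary detected).
    """
    lo = math.ceil(s - tolerance)
    hi = math.floor(s + tolerance)
    for start, end in low_sections(profile, ts):
        left, right = max(start, lo), min(end, hi)
        if left <= right:
            return (left, right)
    return None
```

Returning `None` corresponds to the case, noted later in the description, in which no boundary section is detected for a partial region.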

From the left end positions Li and right end positions Ui (i = 1, 2, ...) of the boundary sections detected in this way for the plurality of partial regions, the left end position L and right end position U of the boundary section for the entire character line, as shown in FIG. 1g, can easily be calculated. Finally, in the projection distribution for the entire character line, the position with the fewest black pixels between the left end position L and the right end position U of the boundary section, indicated by the dotted arrow in FIG. 1g, can be obtained as the separation position between the ruby line and the character line. The projection distribution used to calculate this separation position may be detected again for the entire character line or, to improve processing speed, may be replaced by the accumulation of the projection distributions of the partial regions.

FIG. 2 is a logic block diagram showing a specific embodiment of the present invention. In the figure, 1 is an image memory, in which a document image such as a book or a form is optically scanned, quantized and stored as image information. 2 is a character line detection unit, which sequentially detects character lines from the image information stored in the image memory 1 and stores region information, such as the position and size of each character line, in the line information storage unit 3. The image input device that inputs the document image and stores it in the image memory 1, and the character line detection unit 2 that sequentially extracts character lines from the image information stored in the image memory 1, can be realized using known techniques.

The partial region calculation unit 4 divides a character line stored in the line information storage unit 3 into n (n ≥ 1) partial regions Ri (i = 1, 2, ..., n) as shown in FIG. 1a, sequentially transfers the character arrangement direction (for example, the vertical direction in FIG. 1a) and the position and size of each partial region Ri to the projection distribution extraction unit 5, and then sequentially stores the projection distribution obtained for each partial region Ri by the projection distribution extraction unit 5 in the partial projection storage unit 6. The projection distribution extraction unit 5 extracts the projection distribution of the specified region in the specified direction by scanning and reading out the image memory 1. The average position calculation unit 7 calculates the average position μi described with reference to FIG. 1 from the projection distributions sequentially transferred from the partial projection storage unit 6. The candidate section detection unit 8 finds, in the projection distributions sequentially transferred from the partial projection storage unit 6, the plurality of sections in which the number of black pixels on the projection distribution is equal to or less than the predetermined slice level Ts, as described with reference to FIG. 1. The boundary section calculation unit 9 calculates the boundary section Li, Ui between the ruby line and the desired character line in the partial region, as described with reference to FIG. 1, on the basis of the average position μi output by the average position calculation unit 7, the plurality of sections a, b, c output by the candidate section detection unit 8, and the left end position Bi(l) and right end position Bi(U) of the partial region. The left and right end positions of the partial region are assumed to be detected in the partial region calculation unit 4 when the projection distribution is transferred to it from the projection distribution extraction unit 5, and then transferred to the boundary section calculation unit 9.

Next, the boundary sections of the partial regions Ri (i = 1, 2, ..., n) sequentially output from the boundary section calculation unit 9 (as shown in FIG. 1f, the left end position of a boundary section is denoted Li and the right end position Ui) are transferred to the minimum value calculation unit 10 and the maximum value calculation unit 12, respectively. That is, the left end position Li of the boundary section is transferred to the minimum value calculation unit 10, and the right end position Ui is transferred to the maximum value calculation unit 12. 11 is a left end storage unit, which stores the left end position L of the boundary section between the ruby and the desired character line; it is set to a very large value as its initial value. 13 is a right end storage unit, which stores the right end position U of the boundary section between the ruby and the desired character line; it is set to 0 as its initial value. The minimum value calculation unit 10 compares the left end position Li of the boundary section of each partial region Ri, sequentially output from the boundary section calculation unit 9, with the content of the left end storage unit 11; if Li is smaller than the content of the left end storage unit 11, Li is transferred to the left end storage unit 11 and its content is updated. The maximum value calculation unit 12 compares the right end position Ui of the boundary section of each partial region Ri, sequentially output from the boundary section calculation unit 9, with the content of the right end storage unit 13; if Ui is larger than the content of the right end storage unit 13, Ui is transferred to the right end storage unit 13 and its content is updated. By performing the above operation for the n partial regions Ri (i = 1, 2, ..., n), the left end position L and the right end position U of the boundary section between the ruby and the desired character line come to be stored in the left end storage unit 11 and the right end storage unit 13, respectively.

The ruby separation position determination unit 14 takes in the left end position L and the right end position U held in the left end storage unit 11 and the right end storage unit 13, and takes in from the line information storage unit 3 the region information of the character line for which L and U were calculated. It transfers this region information and the character arrangement direction of the character line to the projection distribution extraction unit 5 and, using the projection distribution in the character arrangement direction obtained from the projection distribution extraction unit 5, calculates the position at which the ruby and the desired character line are to be separated (hereinafter, the ruby separation position). That is, within the boundary section from the left end position L to the right end position U, the position at which the number of black pixels on the projection distribution of the entire character line is smallest is calculated as the ruby separation position. Although the above description extracts the projection distribution over the entire character line, the processing time for calculating the projection distribution can be shortened by having the projection distribution extraction unit 5 extract the projection distribution only for the boundary section from L to U. Alternatively, a projection distribution obtained by accumulating the projection distributions of the partial regions Ri (i = 1, 2, ..., n) stored in the partial projection storage unit 6 can be calculated and used in place of the projection distribution of the entire character line.

On the basis of the ruby separation position obtained by the ruby separation position determination unit 14 and the region information of the character line stored in the line information storage unit 3, the desired character line, with the ruby region removed, is extracted from the image memory 1 and transferred to a character segmentation device (omitted from the figure), realizable with conventional techniques, where it is separated into individual characters. The above operation is applied to all character lines stored in the line information storage unit 3. For a character line stored in the line information storage unit 3 that contains no ruby, either no boundary section between ruby and the desired character line is detected in any of the partial regions of that line, or the right end position U of the detected boundary section comes to include the right end position of the entire character line. The desired character line can therefore be extracted stably even when the method is applied to character lines that contain no ruby.
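The roles of units 10 to 14 can be summarized in a short Python sketch. It is a software analogy under assumptions, not the hardware of FIG. 2: `merge_boundaries` mirrors the running minimum and maximum with the stated initial values (a very large number and 0), `accumulate` forms the accumulated projection suggested as a substitute for the whole-line projection, and ties for the minimum are resolved by taking the first position, which the patent does not specify. All function names and the `huge` parameter are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def merge_boundaries(sections: Iterable[Optional[Tuple[int, int]]],
                     huge: int = 10**9) -> Optional[Tuple[int, int]]:
    """Combine per-region (Li, Ui) into whole-line (L, U).

    Regions without a detected boundary section are passed as None.
    Returns None when no region has a boundary section.
    """
    left, right = huge, 0  # initial contents of storage units 11 and 13
    found = False
    for section in sections:
        if section is None:
            continue
        li, ui = section
        if li < left:   # minimum value calculation unit 10
            left = li
        if ui > right:  # maximum value calculation unit 12
            right = ui
        found = True
    return (left, right) if found else None


def accumulate(profiles: List[List[int]]) -> List[int]:
    """Sum partial-region projections position by position."""
    return [sum(column) for column in zip(*profiles)]


def ruby_separation_position(line_profile: List[int],
                             left: int, right: int) -> int:
    """Position in [left, right] with the fewest black pixels (first on ties)."""
    return min(range(left, right + 1), key=lambda x: line_profile[x])
```

Because the partial regions overlap, `accumulate` counts pixels in the overlapping areas more than once; the sketch keeps this behavior, since the description offers the accumulated distribution only as a faster substitute for the whole-line projection.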

(Effects of the Invention) As explained above, the character line extraction device of the present invention, which handles lines containing characters annotated with ruby, can stably separate the ruby line from the character line even when it is difficult to fix the character size in advance or when the desired character line and the ruby touch each other.

[Brief explanation of drawings]

FIGS. 1a to 1g are diagrams for explaining, as an example, the method of the present invention for extracting a desired character line from a character line containing characters annotated with ruby. FIG. 2 is a logic block diagram showing a specific embodiment of the present invention. In the figure, 1 is an image memory, 2 is a character line detection unit, 3 is a line information storage unit, 4 is a partial region calculation unit, 5 is a projection distribution extraction unit, 6 is a partial projection storage unit, 7 is an average position calculation unit, 8 is a candidate section detection unit, 9 is a boundary section calculation unit, 10 is a minimum value calculation unit, 11 is a left end storage unit, 12 is a maximum value calculation unit, 13 is a right end storage unit, and 14 is a ruby separation position determination unit.

Claims

[Claims]

1. A character line extraction device that separates ruby from a character line to which ruby has been added and extracts a desired character line, comprising: means for extracting a character line from optically scanned and quantized text image information; means for extracting a projection distribution in the direction of arrangement of characters in the character line; means for detecting both end positions and an average position of a plurality of partial regions according to the projection distribution; means for detecting a plurality of candidate sections in which the number of black pixels on the projection distribution is equal to or less than a certain value; means for calculating a boundary section between a partial character line and a ruby region from the plurality of candidate sections on the basis of the end positions and the average position; and means for calculating a boundary region between the character line and the ruby line from the boundary sections of the plurality of partial regions, and detecting, within the boundary region, a ruby separation position at which the number of black pixels on the projection distribution obtained in the direction of arrangement of characters in the character line is smallest; the device extracting the desired character line according to the ruby separation position.