JP2569132B2

JP2569132B2 - Method for determining special character lines

Info

Publication number: JP2569132B2
Application number: JP63180284A
Authority: JP
Inventors: 正年岡田; 收志吉田
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1988-07-21
Filing date: 1988-07-21
Publication date: 1997-01-08
Anticipated expiration: 2012-01-08
Also published as: JPH0231286A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to a determination method that uses image processing to detect, in a document containing ruby, underlines, emphasis dots, and side lines, these special character lines or columns (also simply called special lines) as character lines of narrow line width, and to determine whether each one is attached to the preceding or the following character line, or is a noise line that belongs to neither.

[Conventional technology]

As a detection and determination method of this kind, the applicant has previously proposed the following (see Japanese Patent Application No. 63-19595). The pitch between character lines is examined, and each pitch is compared with a standard pitch derived from them to find pairs of character lines with a narrow pitch. The line widths of the two lines in each pair are then examined, and if one width is at or below a certain ratio of the standard line width in the document, the narrower line is detected and determined to be a special line (ruby, underline, emphasis dots, or side line) attached to the wider line.

[Problems to be solved by the invention]

With this method, however, as the proportion of special lines in the document rises, the narrow pitches caused by the special lines have a growing influence on the calculation of the standard pitch. The difference between the resulting standard pitch and the narrow pitch between a special line and a normal line shrinks, and the detection accuracy for special lines falls.

As an extreme example, if only a single line with ruby is to be read, only one pitch exists, so the ruby attached to that line cannot be detected.
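The dilution effect described above can be illustrated with a small sketch. The earlier application is not reproduced here, so the sketch assumes that the standard pitch is the mean of all inter-line pitches and that pitch is measured between line start coordinates; both choices, the function names, and all numbers are assumptions made purely for illustration.

```python
from statistics import mean

def pitches(starts):
    """Pitch between consecutive line start coordinates."""
    return [b - a for a, b in zip(starts, starts[1:])]

def narrowest_ratio(starts):
    """Narrowest pitch divided by the standard pitch.

    The standard pitch is taken to be the mean pitch (an assumption).
    A ratio near 1.0 means the narrow pitch no longer stands out.
    """
    p = pitches(starts)
    return min(p) / mean(p)

# Ten body lines at pitch 40 with a single ruby line inserted (hypothetical).
few_special = [0, 40, 80, 108, 120, 160, 200, 240, 280, 320, 360]

# Every body line carries ruby, so pitches alternate 12 and 28 (hypothetical).
many_special = [0, 12, 40, 52, 80, 92, 120, 132, 160, 172]

# A single body line with ruby: only one pitch exists.
single_line = [0, 12]

for name, starts in [("few", few_special), ("many", many_special), ("single", single_line)]:
    print(name, round(narrowest_ratio(starts), 2))
```

Under these assumptions the ratio moves toward 1.0 as special lines become more frequent, and it equals 1.0 in the single-line case, where the only pitch is itself the standard pitch.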

Accordingly, an object of the present invention is to detect special lines stably and accurately regardless of the proportion of special lines in the document, and to determine whether each one is attached to the preceding or the following character line, or is a noise line that belongs to neither.

[Means for solving the problem]

To achieve the above object, the present invention extracts character lines from a document by image processing and obtains the width of each in the line direction (line width). The line widths are compared with a first threshold determined from the standard character size extracted during line extraction, in order to find character lines of narrow line width. Each such narrow character line is then merged with the preceding character line and with the following character line, and the line widths of the two merged character lines are each compared with a second threshold that is also determined from the standard size but differs from the first threshold. Based on the comparison results, it is determined whether the narrow character line is a special character line attached to the preceding or the following character line, or is a noise line.

[Action]

Special lines can be detected entirely independently of the proportion of special lines in the document, and it can be determined whether each one is attached to the preceding or the following character line, or is a noise line that belongs to neither.

[Example]

FIG. 1 is a flowchart showing an embodiment of the present invention. As shown in the figure, the flowchart consists of the following steps (1) to (4).

(1) Character lines are extracted from the document using well-known image processing techniques. At this point, the maximum or the most frequent value of their widths in the line direction (line widths) is taken as the standard character size within the extraction area.

(2) The line width W_i of each extracted character line is obtained and compared with a threshold Th_1 determined from the standard character size of the document. Character lines whose width is at or below this threshold are treated as special line candidates.

(3) For each special line candidate, the line width of the merged line (merged character line) is obtained twice: once with the candidate joined to the preceding line and once with it joined to the following line, in each case treating the result as a single line.

(4) Based on the magnitude relation between the resulting line widths (W_B for the width merged with the preceding line, W_A for the width merged with the following line) and a threshold Th_2 derived from the standard character size of the document, the kind of line that the special line candidate represents is determined as follows.

Case (41): W_B ≥ Th_2 and W_A ≥ Th_2. The candidate is a noise line attached to neither the preceding nor the following line.

Case (42): only one of W_B and W_A is less than Th_2. The candidate is a special line attached to the normal line that forms the merged line satisfying the condition.

Case (43): W_B < Th_2 and W_A < Th_2. The candidate is a special line attached to the normal line that forms the merged line of smaller width. That is, W_B and W_A are compared and, for example, if W_B > W_A the candidate is judged to be a special line attached to the following line.
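Steps (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: lines are given as (start, end) coordinate pairs that have already been extracted, the standard character size is taken as the maximum line width (the text also allows the most frequent value), the values of alpha and beta are arbitrary, a missing neighbour at the document edge is treated as an infinitely wide merge, and a tie between the two merged widths goes to the following line. None of these details are specified in the source, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Line = Tuple[int, int]  # (start, end) along the axis in which lines are stacked

def classify_special_lines(lines: List[Line],
                           alpha: float = 0.5,
                           beta: float = 1.5) -> Dict[int, str]:
    """Return {1-based line number: verdict} for every narrow line."""
    widths = [end - start for start, end in lines]
    c_s = max(widths)                      # step (1): standard size (assumed: maximum)
    th1, th2 = alpha * c_s, beta * c_s     # assumed constants alpha and beta
    verdicts: Dict[int, str] = {}
    for i, (start, end) in enumerate(lines):
        if end - start > th1:              # step (2): keep only narrow candidates
            continue
        # step (3): widths after merging with the previous / following line
        w_b = end - lines[i - 1][0] if i > 0 else float("inf")
        w_a = lines[i + 1][1] - start if i + 1 < len(lines) else float("inf")
        # step (4): cases (41), (42) and (43)
        if w_b >= th2 and w_a >= th2:
            verdicts[i + 1] = "noise"
        elif w_b < w_a:                    # the only, or the smaller, width below th2
            verdicts[i + 1] = "attached to previous line"
        else:
            verdicts[i + 1] = "attached to next line"
    return verdicts

# Hypothetical layout: body line, underline, ruby, body line, stray mark, body line.
demo = [(0, 30), (32, 35), (80, 90), (92, 122), (150, 153), (180, 210)]
print(classify_special_lines(demo))
```

Cases (42) and (43) collapse into a single comparison here: once at least one merged width is below Th_2, the smaller of the two is always the one that satisfies the condition.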

Consider the case where the document shown in FIG. 2 is given as the target for special line detection. In the figure, Y_i and y_i (i = 1 to 11) are the start and end coordinates of each extracted line. From these coordinates, the line width W_i is obtained as

W_i = y_i − Y_i (i = 1 to 11).

At the same time, the detection threshold Th_1 for special lines such as ruby and underlines is obtained from the standard character size C_S of the document, for example as

Th_1 = α C_S (α: constant).

Th_1 is compared with the line width W_i of each line to detect the lines that are special line candidates. In the case of FIG. 2, with an appropriate value of α, lines 3 and 9 are easily separated from the other normal lines as special line candidates.
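As a concrete illustration of this candidate-detection step, the sketch below computes the line widths from start and end coordinates and compares them with a threshold proportional to the standard character size. The patent gives no numeric values, so the eleven coordinate pairs, the standard size of 30, and the constant 0.5 are made-up values chosen only to resemble the layout described for FIG. 2.

```python
# Hypothetical (start, end) coordinates for lines 1 to 11.
coords = [
    (0, 30), (50, 80), (88, 98), (100, 130), (150, 180), (200, 230),
    (250, 280), (300, 330), (338, 348), (350, 380), (400, 430),
]

C_S = 30       # standard character size (assumed value)
ALPHA = 0.5    # constant for the first threshold (assumed value)

th1 = ALPHA * C_S                                   # first threshold
widths = [end - start for start, end in coords]     # width = end - start

# Lines whose width is at or below the threshold, numbered from 1.
candidates = [i for i, w in enumerate(widths, start=1) if w <= th1]
print(candidates)  # [3, 9]
```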

Each special line candidate obtained in this way is then examined to determine whether it is a ruby or underline line or a noise line. That is, for lines 3 and 9, which were separated as special line candidates, the line widths obtained by merging each with its preceding and following lines are computed. For line 3, for example, W_3B and W_3A are obtained as

W_3B = y_3 − Y_2, W_3A = y_4 − Y_3.

FIG. 3 shows this in enlarged form. W_3B and W_3A are then compared with a threshold Th_2 obtained, for example, as

Th_2 = β C_S (β: constant),

to judge whether candidate line 3 is attached to the preceding or the following line, or is a noise line that belongs to neither. Here too, with an appropriate β, it is easy to determine that line 3 is a special line attached to line 4. By the same procedure, line 9 can be determined to be a special line attached to line 10.
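The merged widths for a candidate can be worked through numerically. The sketch below uses made-up coordinates, a standard size of 30, and a constant of 1.5 for the second threshold; none of these values come from the patent, and the helper name `merged_widths` is hypothetical. It only handles interior lines, which have both a preceding and a following line.

```python
# Hypothetical (start, end) coordinates for lines 1 to 11.
coords = [
    (0, 30), (50, 80), (88, 98), (100, 130), (150, 180), (200, 230),
    (250, 280), (300, 330), (338, 348), (350, 380), (400, 430),
]

def merged_widths(coords, i):
    """Return (W_iB, W_iA) for the 1-based interior line i."""
    start, end = coords[i - 1]
    w_b = end - coords[i - 2][0]   # end of line i minus start of line i-1
    w_a = coords[i][1] - start     # end of line i+1 minus start of line i
    return w_b, w_a

C_S = 30          # standard character size (assumed value)
BETA = 1.5        # constant for the second threshold (assumed value)
th2 = BETA * C_S  # second threshold

for line_no in (3, 9):
    w_b, w_a = merged_widths(coords, line_no)
    print(line_no, w_b, w_a, w_b < th2, w_a < th2)
```

With these numbers only the width merged with the following line falls below the threshold for both candidates, which corresponds to case (42) and matches the conclusion in the text that line 3 belongs to line 4 and line 9 to line 10.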

The above is an example for horizontal writing, but the same method can be used for detection in vertical writing. In vertical writing, however, there is no special line attached to the preceding line (the left-hand line in vertical writing), as an underline is in horizontal writing, so any special line judged to be attached to the preceding line is treated as a noise line.
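The vertical-writing rule amounts to a small post-processing step on the verdict for each candidate. A minimal sketch, assuming the verdicts are represented by the hypothetical strings "previous", "next", and "noise":

```python
def apply_writing_direction(verdict: str, vertical: bool) -> str:
    """Downgrade 'previous' to 'noise' for vertically written documents.

    The verdict strings are illustrative labels, not terms from the patent.
    """
    if vertical and verdict == "previous":
        return "noise"
    return verdict

print(apply_writing_direction("previous", vertical=True))
print(apply_writing_direction("previous", vertical=False))
print(apply_writing_direction("next", vertical=True))
```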

[Effect of the invention]

According to the present invention, special lines such as ruby and underlines are detected without using the line pitch, so stable detection is possible regardless of the proportion of special lines in the document.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart showing the processing procedure of the present invention, FIG. 2 is an explanatory diagram for describing an embodiment of the invention in concrete terms, and FIG. 3 is an enlarged view of part of FIG. 2.

Description of symbols

W_A, W_B, W_3A, W_3B: line widths; Th_1, Th_2: thresholds; Y_1 to Y_11: line start coordinates; y_1 to y_11: line end coordinates.

Claims

(57) [Claims]

A method for determining special character lines, characterized by: extracting character lines from a document by image processing and obtaining the width of each in the line direction (line width); comparing the line widths with a first threshold determined from a standard character size extracted in the line extraction process, to find character lines of narrow line width; for each of the two merged character lines obtained by merging such a narrow character line with the preceding character line and with the following character line, comparing its line width with a second threshold that is determined from the standard size and differs from the first threshold; and determining, based on the comparison results, whether the narrow character line is a special character line attached to the preceding or the following character line, or is a noise line.