JPH0231286A

JPH0231286A - Detecting method for special character row

Info

Publication number: JPH0231286A
Application number: JP63180284A
Authority: JP
Inventors: Masatoshi Okada; 岡田　正年; Kazuyuki Yoshida; 收志吉田
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1988-07-21
Filing date: 1988-07-21
Publication date: 1990-02-01
Anticipated expiration: 2012-01-08
Also published as: JP2569132B2

Abstract

PURPOSE: To detect special rows stably and accurately by examining row widths rather than row pitch: a narrow character row that may be a special row is merged with the preceding row and with the following row, and the widths of the two merged rows are used for detection. CONSTITUTION: The row width Wi of each segmented character row is obtained and compared with a threshold Th1 determined from the character size taken as standard in the document. A character row whose width is at or below this threshold is treated as a special-row candidate. For each candidate, the row widths of the merged rows obtained by combining it with the preceding row and with the following row, each regarded as a single row, are obtained. From the size relation between these widths (WB for the row merged with the preceding row, WA for the row merged with the following row) and a threshold Th2 obtained from the standard character size of the document, it is decided what kind of row the candidate is. For example, when WB>=Th2 and WA>=Th2, the candidate is judged to be a noise row belonging to neither the preceding nor the following row.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a detection method that uses image processing technology to detect special character lines or columns (also simply called special lines) in documents that contain ruby, underlines, side dots, and side lines.

[Conventional technology]

As a detection method of this kind, the applicant has previously proposed the following method (see Japanese Patent Application No. Sho 63-19595). The pitch between character lines is examined first, and the standard pitch derived from these values is compared with each inter-line pitch to find pairs of character lines with a narrow pitch. The line widths of the two character lines in each pair are then examined. When one width is at or below a certain ratio of the line width that is standard in the document, the narrower character line is detected as a special line, consisting of ruby, an underline, side dots, or a side line, attached to the wider character line.

[Problem to be solved by the invention]

However, with this method, as the proportion of special lines in a document rises, the narrow pitches they produce have a greater influence on the calculation of the standard pitch. The difference between the resulting standard pitch and the narrow pitch between a special line and a normal line then shrinks, and the detection accuracy for special lines drops. This problem remains unsolved.

For example, in the extreme case of reading a single line annotated with ruby, only one pitch exists, so the ruby attached to that line cannot be detected.

Accordingly, an object of the present invention is to detect special lines stably and accurately regardless of the proportion of special lines contained in the document.

[Means for solving the problem]

A document is image-processed to segment character lines and to obtain the width of each in the line direction (line width). Each line width is compared with a first threshold determined from the standard character size extracted during line segmentation, in order to find character lines with a narrow line width. For the two kinds of merged character lines obtained by merging such a narrow character line with the preceding and with the following character line, each merged line width is compared with a second threshold, which is also determined from the standard character size but differs from the first threshold. Special character lines consisting of ruby, underlines, side dots, or side lines are detected in this way.

[Operation]

In detecting special lines such as ruby and underlines, the method relies not on line pitch but on line width. A narrow character line that may be a special line is merged with the preceding character line and with the following character line, and the widths of the two merged character lines are examined. Special lines can therefore be detected without being affected at all by the proportion of special lines contained in the document.

[Example]

FIG. 1 is a flowchart showing an embodiment of the present invention. As shown in the figure, the flowchart consists of the following steps (1) to (4).

(1) Character lines are segmented from the document using well-known image processing techniques. At this time, the maximum or the most frequent value of their widths in the line direction (line widths) is taken as the standard character size within the segmented region.

(2) The line width Wi of each segmented character line is obtained and compared with a threshold Th1 determined from the character size taken as standard in the document. Character lines whose width is at or below this threshold are treated as special-line candidates.

(3) For each line treated as a special-line candidate, the candidate is combined with the preceding line and, separately, with the following line. Each combination is regarded as a single line (a merged character line), and its line width is obtained.

(4) The obtained line widths (WB for the width when merged with the preceding line, WA for the width when merged with the following line) are compared with a threshold Th2 obtained from the character size taken as standard in the document. From the magnitude relation, the kind of line the special-line candidate represents is determined as follows.

■ WB >= Th2 and WA >= Th2 (case (41)): a noise line attached to neither the preceding nor the following line.

■ Only one of WB and WA is less than Th2 (case (42)): a special line attached to the normal line that forms the merged line satisfying the condition.

■ WB < Th2 and WA < Th2 (case (43)): a special line attached to the normal line that forms the merged line with the smaller width. That is, WB and WA are compared and, for example, if WB > WA, the candidate is judged to be a special line attached to the following line.
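The decision in step (4) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `Verdict` and `classify_candidate` are hypothetical, and the handling of the tie WB = WA in case (43), which the text does not specify, is an assumption (the sketch attaches the candidate to the preceding line).

```python
from enum import Enum


class Verdict(Enum):
    NOISE = "noise line"
    ATTACHED_TO_PRECEDING = "special line attached to the preceding line"
    ATTACHED_TO_FOLLOWING = "special line attached to the following line"


def classify_candidate(wb: float, wa: float, th2: float) -> Verdict:
    """Decide what a special-line candidate is from WB, WA and Th2.

    wb: width of the candidate merged with the preceding line (WB)
    wa: width of the candidate merged with the following line (WA)
    """
    b_below = wb < th2
    a_below = wa < th2

    # Case (41): neither merged width is below Th2.
    if not b_below and not a_below:
        return Verdict.NOISE

    # Case (42): exactly one merged width is below Th2; the candidate
    # belongs to the neighbour whose merged line satisfies the condition.
    if b_below != a_below:
        if b_below:
            return Verdict.ATTACHED_TO_PRECEDING
        return Verdict.ATTACHED_TO_FOLLOWING

    # Case (43): both are below Th2; the smaller merged width wins.
    # Assumption: on a tie (WB == WA) the preceding line is chosen.
    if wb > wa:
        return Verdict.ATTACHED_TO_FOLLOWING
    return Verdict.ATTACHED_TO_PRECEDING
```

Keeping the three cases as separate branches mirrors the list above, so each branch can be checked against the corresponding case label.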

The case where the document of FIG. 2 is given as the target document for special-line detection is now described. In the figure, Yi and yi (i = 1 to 11) are the start and end coordinates, respectively, of each segmented line. From these coordinates, the line width Wi is obtained as Wi = yi - Yi (i = 1 to 11). At the same time, the detection threshold Th1 for special lines such as ruby and underlines is obtained from the character size C8 taken as standard in the document, for example as Th1 = αC8 (α: constant). This threshold Th1 is compared with the line width Wi of each line to detect the lines that are special-line candidates. In the case of FIG. 2, using an appropriate value of α makes it easy to separate line 3 and line 9, as special-line candidates, from the other normal lines.
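The width and first-threshold computation described above can be sketched as follows. The coordinates are invented for illustration (the text gives no numeric values for FIG. 2), α = 0.5 is an assumed constant, and the function names are hypothetical. The standard character size is taken here as the largest line width, one of the two options named in step (1); the most frequent value is available through `use_mode=True`.

```python
from statistics import mode
from typing import List, Sequence

# Hypothetical start (Yi) and end (yi) coordinates of 11 segmented lines.
# Lines 3 and 9 are made narrow on purpose to stand in for ruby lines.
Y_START = [10, 70, 130, 150, 210, 270, 330, 390, 450, 470, 530]
Y_END = [50, 110, 146, 190, 250, 310, 370, 430, 466, 510, 570]


def line_widths(starts: Sequence[int], ends: Sequence[int]) -> List[int]:
    """Wi = yi - Yi for every segmented line."""
    return [end - start for start, end in zip(starts, ends)]


def standard_character_size(widths: Sequence[int], use_mode: bool = False) -> int:
    """Largest line width, or the most frequent one when use_mode is True."""
    return mode(widths) if use_mode else max(widths)


def special_line_candidates(
    widths: Sequence[int], standard_size: float, alpha: float
) -> List[int]:
    """1-based numbers of the lines whose width is at or below Th1."""
    th1 = alpha * standard_size  # Th1 = alpha * standard character size
    return [i for i, w in enumerate(widths, start=1) if w <= th1]


if __name__ == "__main__":
    widths = line_widths(Y_START, Y_END)
    size = standard_character_size(widths)
    print(special_line_candidates(widths, size, alpha=0.5))  # [3, 9]
```

With these made-up coordinates the normal lines are 40 units wide and the two narrow lines 16 units wide, so any α between 0.4 and 1.0 would separate them.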

For line 3 and line 9, separated in this way as special-line candidates, the line widths when merged with the preceding and following lines are obtained. For line 3, for example, W3B and W3A are obtained as W3B = y3 - Y2 and W3A = y4 - Y3. This is shown enlarged in FIG. 3. W3B and W3A are then compared with the threshold Th2, obtained for example as Th2 = βC8 (β: constant), to decide whether candidate line 3 is attached to the preceding line or to the following line, or is a noise line belonging to neither. Here too, using an appropriate β makes it easy to determine that line 3 is a special line attached to line 4. By the same procedure, line 9 can be determined to be a special line attached to line 10.
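The merged widths can be computed directly from the start and end coordinates, as in the sketch below. The coordinates, the standard size of 40, and β = 1.7 are invented for illustration, and `merged_widths` is a hypothetical name. Returning `None` when a candidate has no preceding or following line is an assumption, since the text does not address candidates at the edge of the document.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical start (Yi) and end (yi) coordinates of 11 segmented lines;
# lines 3 and 9 are narrow and sit close to lines 4 and 10 respectively.
Y_START = [10, 70, 130, 150, 210, 270, 330, 390, 450, 470, 530]
Y_END = [50, 110, 146, 190, 250, 310, 370, 430, 466, 510, 570]


def merged_widths(
    starts: Sequence[int], ends: Sequence[int], line_no: int
) -> Tuple[Optional[int], Optional[int]]:
    """Return (WB, WA) for the 1-based line number line_no.

    WB = y[i] - Y[i-1]  (candidate merged with the preceding line)
    WA = y[i+1] - Y[i]  (candidate merged with the following line)
    """
    k = line_no - 1  # convert to a 0-based index
    wb = ends[k] - starts[k - 1] if k > 0 else None
    wa = ends[k + 1] - starts[k] if k + 1 < len(starts) else None
    return wb, wa


if __name__ == "__main__":
    th2 = 1.7 * 40  # Th2 = beta * standard size, both values assumed
    w3b, w3a = merged_widths(Y_START, Y_END, 3)
    # W3B is not below Th2 but W3A is, so line 3 goes with line 4.
    print(w3b, w3a, w3b >= th2, w3a < th2)  # 76 60 True True
```

Because the narrow line sits close to the line it annotates, merging it with that line adds little height, which is what keeps the corresponding merged width below Th2.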

The above is an example for horizontal writing, but special lines can be detected by the same method in vertical writing as well.

In vertical writing, however, there is no special line that attaches to the preceding line (the line on the left in vertical writing) in the way an underline does in horizontal writing. A special line judged to be attached to the preceding line is therefore treated as a noise line.
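The vertical-writing exception amounts to a small post-processing step on the verdict, sketched below. The string labels and the function name `apply_writing_direction` are hypothetical and chosen only for this illustration.

```python
# Hypothetical verdict labels for a special-line candidate.
NOISE = "noise"
PRECEDING = "attached to preceding line"
FOLLOWING = "attached to following line"


def apply_writing_direction(verdict: str, vertical: bool) -> str:
    """Downgrade 'attached to preceding line' to noise for vertical text.

    Horizontal text keeps the verdict unchanged, because an underline
    legitimately attaches to the preceding line there.
    """
    if vertical and verdict == PRECEDING:
        return NOISE
    return verdict
```

For example, `apply_writing_direction(PRECEDING, vertical=True)` yields the noise label, while the same verdict with `vertical=False` is returned unchanged.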

[Effect of the invention]

According to the present invention, special lines such as ruby and underlines are detected without using line pitch, so stable detection is possible regardless of the proportion of special lines contained in the document.

[Brief explanation of the drawing]

FIG. 1 is a flowchart showing the processing procedure of the present invention, FIG. 2 is an explanatory diagram for concretely explaining an embodiment of the present invention, and FIG. 3 is an enlarged view showing a part of it.

Explanation of reference symbols: WA, WB, W3A, W3B: line widths; Th1, Th2: thresholds; Y1 to Y11: line start coordinates; y1 to y11: line end coordinates.

Agents: patent attorney 並木 昭夫 (Akio Namiki); patent attorney 松崎 清

Claims

[Claims]

A method for detecting a special character line, comprising: image-processing a document to segment character lines and to obtain the width of each in the line direction (line width); comparing each line width with a first threshold determined from the standard character size extracted during line segmentation, to find character lines with a narrow line width; and, for the two kinds of merged character lines obtained by merging such a narrow character line with the preceding and with the following character line respectively, comparing each merged line width with a second threshold that is determined from the standard character size and differs from the first threshold, thereby detecting special character lines consisting of ruby, underlines, side dots, or side lines.