JPH0231286A - Detecting method for special character row - Google Patents

Detecting method for special character row

Info

Publication number
JPH0231286A
JPH0231286A JP63180284A JP18028488A JPH0231286A JP H0231286 A JPH0231286 A JP H0231286A JP 63180284 A JP63180284 A JP 63180284A JP 18028488 A JP18028488 A JP 18028488A JP H0231286 A JPH0231286 A JP H0231286A
Authority
JP
Japan
Prior art keywords
row
line
character
width
special
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63180284A
Other languages
Japanese (ja)
Other versions
JP2569132B2 (en)
Inventor
Masatoshi Okada
岡田 正年
Kazuyuki Yoshida
收志 吉田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuji Electric Co Ltd
Original Assignee
Fuji Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuji Electric Co Ltd
Priority to JP63180284A
Publication of JPH0231286A
Application granted
Publication of JP2569132B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To detect special rows stably and accurately by examining row widths rather than row pitch. A narrow character row that may be a special row is united with the preceding row and with the following row, and the widths of the two united rows are examined. CONSTITUTION: The row width Wi of each segmented character row is obtained and compared with a threshold Th1 determined from the character size regarded as standard in the document. A character row whose width is at or below this threshold is treated as a special-row candidate. For each candidate, the row widths of the united rows formed by joining it to the preceding row (WB) and to the following row (WA) are obtained. The kind of row the candidate represents is decided from the magnitude relation between these widths and a threshold Th2, which is also derived from the standard character size. For example, when WB >= Th2 and WA >= Th2, the candidate is judged to be a noise row belonging to neither the preceding nor the following row.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a method that uses image processing technology to detect special character lines or columns (also referred to simply as special lines), namely ruby, underlines, emphasis dots, and side lines, in documents that contain them.

[Prior Art]

As a detection method of this kind, the applicant has previously proposed the following (see Japanese Patent Application No. Sho 63-19595). The pitch between adjacent character lines is examined, and a standard pitch derived from these measurements is compared with each inter-line pitch to find pairs of character lines with a narrow pitch. The line widths of the two lines in each pair are then examined. When one width is at or below a certain ratio of the line width regarded as standard in the document, the narrower line is detected as a special line, consisting of ruby, an underline, emphasis dots, or a side line, attached to the wider line.

[Problem to Be Solved by the Invention]

With this method, however, as the proportion of special lines in a document rises, the narrow pitches they produce have a greater influence on the standard pitch calculation. The difference between the resulting standard pitch and the narrow pitch between a special line and a normal line shrinks, and the accuracy of special-line detection falls.

As an extreme example, if only a single line annotated with ruby is read, only one pitch exists, so the ruby attached to that line cannot be detected.
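The dilution effect can be illustrated with a small numeric sketch. The source does not state how the standard pitch is derived, so the arithmetic mean used here is an assumption, and all pitch values are hypothetical.

```python
from statistics import mean

def standard_pitch(pitches):
    """Assumed estimator (not from the source): mean of all inter-line pitches."""
    return mean(pitches)

def pitch_gap(pitches, narrow_pitch):
    """Margin separating the standard pitch from the narrow (special-line) pitch."""
    return standard_pitch(pitches) - narrow_pitch

normal, narrow = 40.0, 22.0  # hypothetical pitches in pixels

few = [normal] * 9 + [narrow]       # one special line among ten pitches
many = [normal] * 2 + [narrow] * 8  # special lines dominate the document
single = [narrow]                   # a single ruby-annotated line: one pitch only

for name, pitches in (("few", few), ("many", many), ("single", single)):
    print(name, round(pitch_gap(pitches, narrow), 1))
```

As the share of narrow pitches grows, the margin between the standard pitch and the narrow pitch shrinks, and in the single-line case it vanishes, which is the failure described above.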

The object of the present invention is therefore to detect special lines stably and accurately, regardless of the proportion of special lines in the document.

[Means for Solving the Problem]

A document is image-processed to cut out character lines, and the width of each in the line direction (line width) is obtained. Each line width is compared with a first threshold determined from the standard character size extracted during line segmentation, to find character lines with a narrow line width. Each such narrow line is merged with the preceding line and with the following line, giving two merged character lines. The width of each merged line is compared with a second threshold, different from the first and also determined from the standard character size, to detect special character lines consisting of ruby, underlines, emphasis dots, or side lines.

[Operation]

In detecting special lines such as ruby and underlines, the method does not use line pitch. Instead, a narrow character line that may be a special line is merged with the preceding line and with the following line, and the widths of the two merged lines are examined. This allows special lines to be detected without being affected at all by the proportion of special lines in the document.

[Embodiment]

FIG. 1 is a flowchart showing an embodiment of the present invention. As shown in the figure, the flowchart consists of steps (1) to (4) below.

(1) Character lines are cut out of the document using well-known image processing techniques. At this time, the maximum or the mode of the widths in the line direction (line widths) is taken as the standard character size within the cut-out region.

(2) The line width Wi of each cut-out character line is obtained and compared with a threshold Th1 determined from the character size regarded as standard in the document. Character lines whose width is at or below this threshold are taken as special-line candidates.

(3) For each line taken as a special-line candidate, the line width is obtained for the line formed by merging it with the preceding line, and for the line formed by merging it with the following line, each regarded as a single line (a merged character line).

(4) The kind of line the candidate represents is determined as follows from the magnitude relation between the obtained line widths (WB for the line merged with the preceding line, WA for the line merged with the following line) and a threshold Th2 derived from the character size regarded as standard in the document.

Case (41): WB ≥ Th2 and WA ≥ Th2. The candidate is a noise line attached to neither the preceding nor the following line.

Case (42): only one of WB and WA is less than Th2. The candidate is a special line attached to the normal line that makes up the merged line satisfying the condition.

Case (43): WB < Th2 and WA < Th2. The candidate is a special line attached to the normal line that makes up the merged line with the smaller width. That is, WB and WA are compared, and if, for example, WB > WA, the candidate is judged to be a special line attached to the following line.
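The decision rule of step (4) can be sketched as a small function. This is a minimal illustration rather than the patented implementation: the names `Verdict` and `classify_candidate` are hypothetical, and the handling of the tie WB = WA in case (43), which the text does not specify, is an assumption.

```python
from enum import Enum

class Verdict(Enum):
    NOISE = "noise"
    ATTACHED_TO_PREVIOUS = "previous"
    ATTACHED_TO_NEXT = "next"

def classify_candidate(wb: float, wa: float, th2: float) -> Verdict:
    """Classify a special-line candidate from its merged widths.

    wb: width of the candidate merged with the preceding line (WB)
    wa: width of the candidate merged with the following line (WA)
    th2: second threshold (Th2)
    """
    # Case (41): both merged lines are too wide, so the candidate is noise.
    if wb >= th2 and wa >= th2:
        return Verdict.NOISE
    # Case (42): exactly one merged line is below Th2; attach to that side.
    if wb < th2 <= wa:
        return Verdict.ATTACHED_TO_PREVIOUS
    if wa < th2 <= wb:
        return Verdict.ATTACHED_TO_NEXT
    # Case (43): both are below Th2; the narrower merged line wins.
    # Assumption: a tie (wb == wa) is resolved toward the following line.
    return Verdict.ATTACHED_TO_PREVIOUS if wb < wa else Verdict.ATTACHED_TO_NEXT

print(classify_candidate(57, 45, 51))  # Verdict.ATTACHED_TO_NEXT
```

Cases (42) and (43) could be folded into a single comparison of WB and WA, since in case (42) the width below Th2 is necessarily the smaller one; they are kept separate here to mirror the text.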

Next, consider the case where the document shown in FIG. 2 is given as the target of special-line detection. In the figure, Yi and yi (i = 1 to 11) are the start and end coordinates, respectively, of each cut-out line. From these coordinates, the line width Wi is obtained as

    Wi = yi - Yi   (i = 1 to 11)

At the same time, a detection threshold Th1 for special lines such as ruby and underlines is obtained from the character size CS regarded as standard in the document, for example by

    Th1 = α·CS   (α: constant)

Th1 is compared with each line width Wi to detect the lines that are special-line candidates. In the case of FIG. 2, with an appropriate value of α, lines 3 and 9 can easily be separated from the other, normal lines as special-line candidates.
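The width computation and candidate selection can be sketched as follows. The coordinates are hypothetical stand-ins for FIG. 2 (which is not reproduced here), chosen so that lines 3 and 9 are narrow; the value of α and the function names are likewise assumptions.

```python
from statistics import mode

# Hypothetical start (Y_i) and end (y_i) coordinates for 11 lines, in pixels.
# Lines 3 and 9 are narrow, mimicking ruby lines.
STARTS = [10, 60, 105, 120, 170, 220, 270, 320, 365, 380, 430]
ENDS = [40, 90, 117, 150, 200, 250, 300, 350, 377, 410, 460]

def line_widths(starts, ends):
    """W_i = y_i - Y_i for every cut-out line."""
    return [end - start for start, end in zip(starts, ends)]

def standard_char_size(widths, use_mode=False):
    """Step (1): the maximum line width, or alternatively the mode."""
    return mode(widths) if use_mode else max(widths)

def special_candidates(widths, alpha):
    """Step (2): 1-based numbers of lines with W_i <= Th1 = alpha * CS."""
    th1 = alpha * standard_char_size(widths)
    return [i for i, w in enumerate(widths, start=1) if w <= th1]

widths = line_widths(STARTS, ENDS)
print(special_candidates(widths, alpha=0.5))  # [3, 9]
```

With these numbers the standard character size is 30 and Th1 is 15, so only the two 12-pixel lines are selected.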

For lines 3 and 9, separated in this way as special-line candidates, the line widths after merging with the preceding and following lines are obtained. For line 3, for example, W3B and W3A are given by

    W3B = y3 - Y2,   W3A = y4 - Y3

FIG. 3 shows this part enlarged. W3B and W3A are then compared with a threshold Th2 obtained, for example, by

    Th2 = β·CS   (β: constant)

to judge whether candidate line 3 is attached to the preceding line or to the following line, or is a noise line belonging to neither. Here too, with an appropriate value of β, it is easy to determine that line 3 is a special line attached to line 4. By the same procedure, line 9 can be determined to be a special line attached to line 10.
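The merged-width calculation for lines 3 and 9 can be traced with a short sketch. The coordinates, CS = 30, and β = 1.7 are hypothetical values, not taken from the figures. The function also assumes the candidate has both a preceding and a following line, since the text does not cover candidates at the first or last line.

```python
# Hypothetical start (Y_i) and end (y_i) coordinates for 11 lines, in pixels.
STARTS = [10, 60, 105, 120, 170, 220, 270, 320, 365, 380, 430]
ENDS = [40, 90, 117, 150, 200, 250, 300, 350, 377, 410, 460]

def merged_widths(starts, ends, i):
    """Return (WB, WA) for 1-based line number i.

    WB = y_i - Y_(i-1): width when merged with the preceding line.
    WA = y_(i+1) - Y_i: width when merged with the following line.
    """
    wb = ends[i - 1] - starts[i - 2]
    wa = ends[i] - starts[i - 1]
    return wb, wa

def attachment(wb, wa, th2):
    """Compact form of the step (4) rule; a tie goes to 'next' by assumption."""
    if wb >= th2 and wa >= th2:
        return "noise"
    return "previous" if wb < wa else "next"

CS = 30      # assumed standard character size
BETA = 1.7   # assumed constant, giving Th2 of about 51

for i in (3, 9):
    wb, wa = merged_widths(STARTS, ENDS, i)
    print(i, wb, wa, attachment(wb, wa, BETA * CS))
```

For both candidates WB = 57 and WA = 45, so only the merge with the following line stays below Th2, and lines 3 and 9 are attached to lines 4 and 10 respectively.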

The above is an example for horizontal writing, but the same method can also be used for detection in vertical writing.

In vertical writing, however, there is no special line that attaches to the preceding line (the line on the left in vertical writing), as an underline does in horizontal writing. A special line judged to be attached to the preceding line is therefore treated as a noise line.
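The vertical-writing exception amounts to a post-processing step on the verdict. A minimal sketch, assuming the verdict is represented by the hypothetical strings "previous", "next", and "noise":

```python
def resolve_for_direction(verdict: str, vertical: bool) -> str:
    """Apply the vertical-writing exception to a step (4) verdict.

    verdict: "previous", "next", or "noise" (hypothetical labels)
    vertical: True when the document is written vertically
    """
    # In vertical writing nothing attaches to the preceding line,
    # so such a verdict is downgraded to noise.
    if vertical and verdict == "previous":
        return "noise"
    return verdict

print(resolve_for_direction("previous", vertical=True))   # noise
print(resolve_for_direction("previous", vertical=False))  # previous
```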

[Effect of the Invention]

According to the present invention, special lines such as ruby and underlines are detected without using line pitch, so stable detection is possible regardless of the proportion of special lines in the document.

[Brief Description of the Drawings]

FIG. 1 is a flowchart showing the processing procedure of the present invention, FIG. 2 is an explanatory diagram for describing an embodiment of the invention concretely, and FIG. 3 is an enlarged view of part of FIG. 2.

Explanation of symbols: WA, WB, W3A, W3B: line widths; Th1, Th2: thresholds; Y1 to Y11: line start coordinates; y1 to y11: line end coordinates.

Agents: patent attorney Akio Namiki (並木 昭夫); patent attorney Kiyoshi Matsuzaki (松崎 清)

Claims (1)

[Claims]

A method for detecting special character lines, characterized in that: a document is image-processed to cut out character lines, and the width of each in the line direction (line width) is obtained; each line width is compared with a first threshold determined from the standard character size extracted in the course of line segmentation, to find character lines with a narrow line width; and, for the two merged character lines obtained by merging such a narrow line with the preceding character line and with the following character line respectively, each merged line width is compared with a second threshold, different from the first threshold and determined from the standard character size, to detect special character lines consisting of ruby, underlines, emphasis dots, and side lines.
JP63180284A 1988-07-21 1988-07-21 How to determine special character lines Expired - Lifetime JP2569132B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63180284A JP2569132B2 (en) 1988-07-21 1988-07-21 How to determine special character lines

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63180284A JP2569132B2 (en) 1988-07-21 1988-07-21 How to determine special character lines

Publications (2)

Publication Number Publication Date
JPH0231286A true JPH0231286A (en) 1990-02-01
JP2569132B2 JP2569132B2 (en) 1997-01-08

Family

ID=16080523

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63180284A Expired - Lifetime JP2569132B2 (en) 1988-07-21 1988-07-21 How to determine special character lines

Country Status (1)

Country Link
JP (1) JP2569132B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004094292A (en) * 2002-08-29 2004-03-25 Ricoh Co Ltd Character recognizing device, character recognizing method, and program used for executing the method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6048582A (en) * 1983-08-25 1985-03-16 Fujitsu Ltd Character cutting-out method of character recognizer
JPS61269778A (en) * 1985-05-24 1986-11-29 Agency Of Ind Science & Technol Character line extracting device

Also Published As

Publication number Publication date
JP2569132B2 (en) 1997-01-08
