JPH0231286A - Detecting method for special character row - Google Patents
Detecting method for special character row
- Publication number
- JPH0231286A (application JP63180284A / JP18028488A)
- Authority
- JP
- Japan
- Prior art keywords
- row
- line
- character
- width
- special
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Input (AREA)
Abstract
Description
[Detailed Description of the Invention]
[Industrial Field of Application]
This invention relates to a detection method that uses image processing to detect special character rows or columns (hereinafter simply "special lines") in a document containing ruby, underlines, emphasis dots, and sidelines.
As a detection method of this kind, the applicant previously proposed a method that first examines the pitch between adjacent character lines and compares each pitch with the standard pitch derived from them, in order to find pairs of character lines separated by a narrow pitch. The widths of the two lines in each pair are then examined. When one width is no more than a certain ratio of the line width regarded as standard in the document, the narrower line is detected as a special line, consisting of ruby, an underline, emphasis dots, or a sideline, attached to the wider line (see Japanese Patent Application No. Sho 63-19595).
With this method, however, as the proportion of special lines in a document rises, the narrow pitches those special lines produce have a growing influence on the calculation of the standard pitch. The difference between the resulting standard pitch and the narrow pitch between a special line and its normal line shrinks, so the problem remains that the accuracy of special-line detection falls.
As an extreme example, if only a single line annotated with ruby is read, there is only one pitch, so the ruby attached to that line cannot be detected.
The object of this invention is therefore to detect special lines stably and accurately, regardless of how large or small a proportion of the document they make up.
A document is image-processed to cut out character lines, and the width of each in the line direction (line width) is obtained. Each line width is compared with a first threshold determined from the standard character size extracted during line cutting, in order to find character lines with a narrow width. For the two merged character lines obtained by merging each narrow line with its preceding line and with its following line, each merged width is compared with a second threshold, different from the first and likewise determined from the standard character size, to detect special character lines consisting of ruby, underlines, emphasis dots, or sidelines.
In detecting special lines such as ruby and underlines, the method relies not on line pitch but on the widths of two merged character lines, formed by merging a narrow character line that may be a special line with its preceding line and with its following line. Because special lines are detected from these merged widths, detection is entirely unaffected by the proportion of special lines in the document.
Fig. 1 is a flowchart showing an embodiment of this invention. As the figure shows, the flowchart consists of steps (1) to (4) below.
(1) Character lines are cut out of the document using well-known image processing techniques. The largest line width (width in the line direction), or the mode of the line widths, is taken as the standard character size within the cut-out region.
(2) The line width $W_i$ of each cut-out character line is obtained and compared with a threshold $Th_1$ determined from the character size regarded as standard in the document. Character lines whose width is at or below this threshold become special-line candidates.
(3) For each special-line candidate, the width of the merged character line is obtained for the case where the candidate is joined with the preceding line, and again for the case where it is joined with the following line, each pair being regarded as a single line.
(4) The type of each special-line candidate is determined as follows, from how the merged widths obtained ($W_B$ for the merge with the preceding line, $W_A$ for the merge with the following line) compare with a threshold $Th_2$ derived from the character size regarded as standard in the document.
① $W_B \ge Th_2$ and $W_A \ge Th_2$ [case (4-1)]
A noise line attached to neither the preceding nor the following line.
② Exactly one of $W_B$ and $W_A$ is below $Th_2$ [case (4-2)]
A special line attached to the normal line that forms the merged line satisfying the condition.
③ $W_B < Th_2$ and $W_A < Th_2$ [case (4-3)]
A special line attached to the normal line that forms the merged line with the smaller width. That is, $W_B$ and $W_A$ are compared and, for example, if $W_B > W_A$ the candidate is judged to be a special line attached to the following line.
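The decision rules of step (4) can be sketched as a small function. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Verdict` and `classify_candidate` are hypothetical, and the text does not say what happens when both merged widths are below $Th_2$ and exactly equal, so the sketch arbitrarily assigns that tie to the preceding line.

```python
from enum import Enum


class Verdict(Enum):
    NOISE = "noise line"
    PREVIOUS = "special line attached to the preceding line"
    FOLLOWING = "special line attached to the following line"


def classify_candidate(wb: float, wa: float, th2: float) -> Verdict:
    """Classify one special-line candidate.

    wb: width of the candidate merged with the preceding line (W_B)
    wa: width of the candidate merged with the following line (W_A)
    th2: second threshold (Th2)
    """
    fits_previous = wb < th2
    fits_following = wa < th2

    # Case (4-1): neither merged line is narrow enough.
    if not fits_previous and not fits_following:
        return Verdict.NOISE

    # Case (4-2): exactly one merged line satisfies the condition.
    if fits_previous != fits_following:
        return Verdict.PREVIOUS if fits_previous else Verdict.FOLLOWING

    # Case (4-3): both satisfy it, so the smaller merged width wins.
    # W_B > W_A means the merge with the following line is tighter.
    # Assumption: an exact tie goes to the preceding line.
    return Verdict.FOLLOWING if wb > wa else Verdict.PREVIOUS
```

Keeping the three cases as separate branches mirrors the wording of the description, which makes it easy to check each branch against the text.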
The case where the document of Fig. 2 is given as the target of special-line detection is now described. In the figure, $Y_i$ and $y_i$ ($i = 1$ to $11$) are the start and end coordinates of each cut-out line. From these coordinates, the line width $W_i$ is obtained as

$W_i = y_i - Y_i \quad (i = 1 \text{ to } 11)$

At the same time, the detection threshold $Th_1$ for special lines such as ruby and underlines is obtained from the character size $C_S$ regarded as standard in the document, for example as

$Th_1 = \alpha C_S$ ($\alpha$: a constant)

This threshold $Th_1$ is compared with each line width $W_i$ to detect the lines that are special-line candidates. In the case of Fig. 2, an appropriate value of $\alpha$ readily separates lines 3 and 9 from the other, normal lines as special-line candidates.
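The candidate search described here can be sketched as follows. The coordinates of Fig. 2 are not given in the text, so the eleven `(Y_i, y_i)` pairs below are invented purely for illustration, as is the value `alpha=0.5`; the function names are hypothetical as well.

```python
from statistics import mode


def line_widths(lines):
    """W_i = y_i - Y_i for each (start Y_i, end y_i) pair."""
    return [end - start for start, end in lines]


def standard_char_size(widths, use_mode=False):
    """Standard character size C_S: the largest line width, or the mode."""
    return mode(widths) if use_mode else max(widths)


def find_candidates(lines, alpha):
    """Return 1-based numbers of lines whose width is at or below Th1."""
    widths = line_widths(lines)
    th1 = alpha * standard_char_size(widths)  # Th1 = alpha * C_S
    return [i for i, w in enumerate(widths, start=1) if w <= th1]


# Hypothetical coordinates: eleven lines, with lines 3 and 9 being thin
# ruby-like lines sitting just above lines 4 and 10.
LINES = [
    (0, 20), (30, 50), (58, 66), (68, 88), (98, 118), (128, 148),
    (158, 178), (188, 208), (216, 224), (226, 246), (256, 276),
]

candidates = find_candidates(LINES, alpha=0.5)
print(candidates)  # [3, 9]
```

With these made-up numbers the normal lines are 20 units wide and the thin lines 8 units wide, so any `alpha` between 0.4 and 1.0 (exclusive of 1.0) isolates the same two candidates.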
For lines 3 and 9, separated in this way as special-line candidates, the widths of the lines merged with their neighbors are then obtained. For line 3, for example, $W_{3B}$ and $W_{3A}$ are given by

$W_{3B} = y_3 - Y_2, \quad W_{3A} = y_4 - Y_3$

Fig. 3 shows this in enlarged form. $W_{3B}$ and $W_{3A}$ are then compared with the threshold $Th_2$ obtained, for example, as

$Th_2 = \beta C_S$ ($\beta$: a constant)

to decide whether candidate line 3 is attached to the preceding line or the following line, or is a noise line belonging to neither. Here too, an appropriate $\beta$ makes it easy to determine that line 3 is a special line attached to line 4. By the same procedure, line 9 is determined to be a special line attached to line 10.
The above is an example for horizontal writing, but the same method can be used for detection in vertical writing.
In vertical writing, however, there is no special line that attaches to the preceding line (the line on the left in vertical writing) in the way an underline does in horizontal writing, so a special line judged to be attached to the preceding line is treated as a noise line.
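The vertical-writing exception can be expressed as a small post-processing step applied to whatever verdict the horizontal rules produce. The function name and the string labels below are hypothetical; this is only a sketch of the rule as stated.

```python
def apply_vertical_rule(verdict: str, vertical: bool) -> str:
    """Downgrade 'previous' to 'noise' when the text is written vertically.

    verdict is one of 'noise', 'previous', 'following' (hypothetical
    labels for the three outcomes of the decision rules).
    """
    if vertical and verdict == "previous":
        return "noise"
    return verdict


print(apply_vertical_rule("previous", vertical=True))   # noise
print(apply_vertical_rule("previous", vertical=False))  # previous
print(apply_vertical_rule("following", vertical=True))  # following
```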
According to this invention, special lines such as ruby and underlines are detected without using line pitch, so stable detection is possible regardless of the proportion of special lines in the document.
Fig. 1 is a flowchart showing the processing procedure of this invention, Fig. 2 is an explanatory diagram for describing an embodiment of the invention concretely, and Fig. 3 is an enlarged view of part of Fig. 2.
Explanation of reference signs
$W_A$, $W_B$, $W_{3A}$, $W_{3B}$: line widths; $Th_1$, $Th_2$: thresholds; $Y_1$ to $Y_{11}$: line start coordinates; $y_1$ to $y_{11}$: line end coordinates.
Agent: Patent Attorney Akio Namiki
Agent: Patent Attorney Kiyoshi Matsuzaki
[Figures: Fig. 1, Fig. 2, Fig. 3]
Claims (1)
A method for detecting special character lines, characterized by: image-processing a document to cut out character lines and obtaining the width of each in the line direction (line width); comparing each line width with a first threshold determined from the standard character size extracted in the course of line cutting, to find character lines with a narrow width; and, for the two merged character lines obtained by merging each narrow character line with its preceding character line and with its following character line, comparing each merged width with a second threshold, different from the first threshold and determined from the standard character size, to detect special character lines consisting of ruby, underlines, emphasis dots, or sidelines.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63180284A JP2569132B2 (en) | 1988-07-21 | 1988-07-21 | How to determine special character lines |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH0231286A true JPH0231286A (en) | 1990-02-01 |
JP2569132B2 JP2569132B2 (en) | 1997-01-08 |
Family
ID=16080523
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP63180284A Expired - Lifetime JP2569132B2 (en) | 1988-07-21 | 1988-07-21 | How to determine special character lines |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2569132B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004094292A (en) * | 2002-08-29 | 2004-03-25 | Ricoh Co Ltd | Character recognizing device, character recognizing method, and program used for executing the method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6048582A (en) * | 1983-08-25 | 1985-03-16 | Fujitsu Ltd | Character cutting-out method of character recognizer |
JPS61269778A (en) * | 1985-05-24 | 1986-11-29 | Agency Of Ind Science & Technol | Character line extracting device |
Also Published As
Publication number | Publication date |
---|---|
JP2569132B2 (en) | 1997-01-08 |