JP2009053826A - Document processor and document processing program

Info

Publication number
JP2009053826A
JP2009053826A JP2007218456A JP2007218456A JP2009053826A JP 2009053826 A JP2009053826 A JP 2009053826A JP 2007218456 A JP2007218456 A JP 2007218456A JP 2007218456 A JP2007218456 A JP 2007218456A JP 2009053826 A JP2009053826 A JP 2009053826A
Authority
JP
Japan
Prior art keywords
character
line segment
curvature
document processing
character image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2007218456A
Other languages
Japanese (ja)
Inventor
Hironari Konno
裕也 今野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujifilm Business Innovation Corp
Original Assignee
Fuji Xerox Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-08-24
Filing date
2007-08-24
Publication date
2009-03-12
Application filed by Fuji Xerox Co Ltd
Priority to JP2007218456A
Publication of JP2009053826A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a document processor and a document processing program that discriminate between handwritten characters and printed characters without using a high-resolution character image.

SOLUTION: When a character image acquisition unit 22 acquires a character image to be judged, a line segment extraction unit 24 applies thinning processing to the acquired character image and extracts line segments from it. In doing so, the line segment extraction unit 24 may extract end points and intersections from the character image and decompose the character into individual line segments at those end points and intersections. A curvature change amount calculation unit 26 calculates the curvature change amount for every line segment extracted by the line segment extraction unit 24. A character discrimination unit 28 totals the curvature change amounts calculated by the curvature change amount calculation unit 26, generates a histogram, and determines whether the count in the low range of the histogram exceeds a preset threshold. When the count exceeds the threshold, the character discrimination unit 28 judges the character to be handwritten; when the count does not exceed the threshold, it judges the character to be printed.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a document processing apparatus and a document processing program.

Techniques for discriminating between handwritten and printed characters in the character strings of a document have been proposed. For example, Patent Document 1 below focuses on character outline information and discriminates handwritten characters from printed characters by checking whether the outline of a character is linear. This technique requires high resolution in order to extract the outline information clearly.
Patent Document 1: JP 58-37776 A

An object of the present invention is to provide a document processing apparatus and a document processing program that discriminate between handwritten characters and printed characters without using a high-resolution character image.

To achieve the above object, the document processing apparatus according to claim 1 comprises line segment extracting means for extracting line segments from a character image, curvature change amount calculating means for calculating the amount of change in curvature of the line segments, and character discriminating means for discriminating between handwritten characters and printed characters based on the amount of change in curvature.

The invention according to claim 2 is the invention according to claim 1, wherein the line segment extracting means extracts the line segments after thinning the line segments constituting a character.

The invention according to claim 3 is the invention according to claim 2, wherein the line segment extracting means extracts the end points and intersections of the line segments constituting a character and extracts the line segments by decomposing the character into individual line segments at these points.

The invention according to claim 4 is the invention according to any one of claims 1 to 3, wherein the character discriminating means discriminates between handwritten characters and printed characters from a histogram of the curvature change amount.

The invention according to claim 5 is the invention according to claim 4, wherein the character discriminating means discriminates between handwritten characters and printed characters from the peak shape of the histogram.

The document processing program according to claim 6 causes a computer to function as line segment extracting means for extracting line segments from a character image, curvature change amount calculating means for calculating the amount of change in curvature of the line segments, and character discriminating means for discriminating between handwritten characters and printed characters based on the amount of change in curvature.

According to the invention of claim 1, handwritten characters and printed characters can be discriminated without the high-resolution character image that is needed when character outline information is used.

According to the inventions of claims 2 and 3, line segments can be extracted from a character image more easily than when this configuration is not provided.

According to the inventions of claims 4 and 5, handwritten characters and printed characters can be discriminated more easily than when this configuration is not provided.

According to the invention of claim 6, it is possible to provide a document processing program that, unlike a program without this configuration, discriminates between handwritten characters and printed characters without using a high-resolution character image.

The best mode for carrying out the present invention (hereinafter, the embodiment) is described below with reference to the drawings.

FIG. 1 shows the hardware configuration of an embodiment of the document processing apparatus according to the present invention. In FIG. 1, the document processing apparatus includes a central processing unit 10 (for example, a CPU), a random access memory (RAM) 12, an image reading device 14, a display device 16, an input device 18, and a hard disk device (HDD) 20. These components are connected to one another by a bus 21.

The CPU 10 controls the operation of each unit described below in accordance with a control program stored in the RAM 12 or the hard disk device 20. The RAM 12 serves mainly as a work area for the CPU 10.

The image reading device 14 is a scanner or similar device that reads character images.

The display device 16 is a liquid crystal display or similar device that displays the character image to be judged and other information.

The input device 18 consists of a keyboard, a pointing device, and the like, and is used by the user to enter operation instructions.

The hard disk device 20 is a large-capacity, computer-readable storage device that can store the various data needed for the processing described below.

FIG. 2 shows a functional block diagram of an embodiment of the document processing apparatus according to the present invention. In FIG. 2, the document processing apparatus includes a character image acquisition unit 22, a line segment extraction unit 24, a curvature change amount calculation unit 26, and a character discrimination unit 28.

The character image acquisition unit 22 consists of, for example, the image reading device 14 shown in FIG. 1 and a program with which the CPU 10 controls it, and acquires the image data of the character to be discriminated. Instead of using the image reading device 14, the character image acquisition unit 22 may be configured to acquire electronic data of a character image through a suitable communication means.

The line segment extraction unit 24 includes, for example, the CPU 10 and a program that controls its processing, and extracts line segments from the character image acquired by the character image acquisition unit 22. Line segments can be extracted by first applying a thinning process, which narrows the width of the line segments constituting the character, and then tracing those line segments. Alternatively, the end points and intersections of the line segments constituting the character can be extracted, and the character decomposed into individual line segments at these points. The thinning process and the line segment extraction method are described later.

The curvature change amount calculation unit 26 includes, for example, the CPU 10 and a program that controls its processing, and calculates the amount of change in curvature (the curvature change amount) of each line segment extracted by the line segment extraction unit 24. The curvature change amount can be obtained, for example, from the slopes of straight lines drawn between the pixels that make up the line segment. The calculation method is described later.

The character discrimination unit 28 includes, for example, the CPU 10 and a program that controls its processing, and discriminates between handwritten characters and printed characters based on the curvature change amount. The discrimination can be made, for example, by building a histogram of the curvature change amount. The discrimination method is described later.

FIGS. 3(a) and 3(b) illustrate the thinning process performed by the line segment extraction unit 24. FIG. 3(a) shows a character image (the characters 大切) before thinning, and FIG. 3(b) shows the same image after thinning. Thinning reduces the number of pixels across the width of each line segment constituting a character, converting the character into a line image one pixel wide.
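The source does not name a particular thinning algorithm. As a minimal sketch only, the well-known Zhang-Suen algorithm is swapped in here to show what reducing strokes to roughly one pixel of width can look like. The function name `zhang_suen_thin` and the list-of-rows image format (1 = ink, 0 = background) are assumptions made for illustration, not part of the patent.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) with the Zhang-Suen algorithm.

    Assumption: this algorithm is an illustrative choice; the patent only
    requires that strokes end up about one pixel wide.
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])
    # Neighbours clockwise from north: P2..P9 in Zhang-Suen notation.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]

    def ring(r, c):
        # Pixels outside the image are treated as background.
        return [img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(rows):
                for c in range(cols):
                    if not img[r][c]:
                        continue
                    p = ring(r, c)
                    b = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the ring.
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Delete only after the whole pass, so decisions use one snapshot.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Usage: a horizontal bar three pixels thick, padded with background.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
thin = zhang_suen_thin(bar)
ink = [(r, c) for r in range(5) for c in range(9) if thin[r][c]]
```

For this bar the surviving pixels all lie on the middle row, which is the kind of one-pixel-wide line image the later steps operate on.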

FIG. 4 illustrates an example of the line segment extraction method performed by the line segment extraction unit 24, using the hiragana character あ. On the line segments that make up あ there are end points, marked with white circles (○), and intersections, marked with triangles (△), where line segments cross. The line segment extraction unit 24 splits each line segment at these end points and intersections and extracts the pieces lying between them. Short pieces, such as the one between end point a and intersection α, the one between intersections α and β, and the one between intersections α and γ, may make the discrimination by the character discrimination unit 28 inaccurate, so they may be excluded from the discrimination process. Alternatively, only the end points may be used, with the line segments lying between end points used for discrimination. The line segment extraction method is not limited to the example in FIG. 4; any method that can trace the line segments constituting a character can be applied to the present invention.
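The patent does not say how end points and intersections are detected on the thinned image. One common approach, sketched here under that assumption, is the crossing number: count the background-to-ink transitions around a pixel's eight neighbours, and treat a count of 1 as an end point and a count of 3 or more as an intersection. The function name `classify_nodes` and the list-of-rows skeleton format are hypothetical.

```python
def classify_nodes(skeleton):
    """Return (end_points, intersections) of a one-pixel-wide skeleton.

    skeleton: list of rows of 0/1. The crossing-number rule used here is an
    assumed, illustrative detection method, not one stated in the patent.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    # Neighbours in clockwise order starting from north.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    end_points, intersections = [], []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            ring = [skeleton[r + dr][c + dc]
                    if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                    for dr, dc in offsets]
            # Count 0 -> 1 transitions while walking once around the pixel.
            crossings = sum(1 for i in range(8)
                            if ring[i] == 0 and ring[(i + 1) % 8] == 1)
            if crossings == 1:
                end_points.append((r, c))      # stroke stops here
            elif crossings >= 3:
                intersections.append((r, c))   # three or more branches meet
    return end_points, intersections


# Usage: a plus sign has four end points and one intersection at its centre.
plus = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, crosses = classify_nodes(plus)
```

Counting transitions rather than raw neighbours avoids mislabeling the pixels next to a crossing, which touch several ink pixels diagonally but still lie on a single stroke.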

FIG. 5 shows an example of how the curvature change amount calculation unit 26 calculates the curvature change amount, using a line segment made up of nine pixels. The curvature change amount calculation unit 26 sets lines connecting pixels, either between adjacent pixels or between pixels several positions apart, and obtains the curvature change amount from the change in the slope of those lines.

For example, when lines are set between adjacent pixels, a line is drawn from the first pixel to the second pixel in FIG. 5, from the second pixel to the third, and so on, and the change in slope between these lines is taken as the curvature change amount of the line segment. When lines are set by skipping pixels, a line is drawn from the first pixel to the seventh pixel in FIG. 5, from the second pixel to the eighth, and so on (five pixels are skipped in the example of FIG. 5), and the change in slope between these lines is taken as the curvature change amount. The number of pixels skipped is not limited.
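The calculation described above can be sketched as follows. This is an illustration under stated assumptions: the chord direction is measured as an angle with `math.atan2` rather than as a slope ratio (to avoid dividing by zero on vertical strokes), the change is the absolute angular difference between consecutive chords, and the names `curvature_changes` and `skip` are hypothetical.

```python
import math


def curvature_changes(pixels, skip=0):
    """Change in chord direction along an ordered pixel path.

    pixels: ordered list of (row, col) along one line segment.
    skip:   number of pixels skipped between chord ends (0 = adjacent pixels,
            5 = first-to-seventh pixel as in the FIG. 5 example).
    Returns a list of angle differences in radians, each in [0, pi].
    """
    step = skip + 1
    # Direction of the chord from pixel i to pixel i + step.
    angles = [
        math.atan2(pixels[i + step][0] - pixels[i][0],
                   pixels[i + step][1] - pixels[i][1])
        for i in range(len(pixels) - step)
    ]
    changes = []
    for a0, a1 in zip(angles, angles[1:]):
        d = abs(a1 - a0)
        changes.append(min(d, 2 * math.pi - d))  # wrap around +/- pi
    return changes


# Usage: a straight nine-pixel stroke and an L-shaped stroke.
straight = [(0, c) for c in range(9)]
corner = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
flat = curvature_changes(straight)           # adjacent pixels
coarse = curvature_changes(straight, skip=5)  # chords 1-7, 2-8, 3-9
bend = curvature_changes(corner)
```

With nine pixels and five skipped, only three chords fit, so two change values result; a larger skip smooths pixel-level staircase noise at the cost of fewer samples per segment.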

FIGS. 6(a) and 6(b) show examples of the histograms that the character discrimination unit 28 uses to discriminate between handwritten and printed characters. In both figures the horizontal axis is the calculated curvature change amount and the vertical axis is the count (frequency) for each value of the curvature change amount. FIG. 6(a) is an example histogram for handwritten characters, and FIG. 6(b) is an example histogram for printed characters.

In general, when the curvature change amounts of the line segments constituting a character are collected into a histogram, handwritten characters show larger counts at small values of the curvature change amount (the low range) than printed characters do. This is because the wobble in curvature caused by handwriting appears in the low range of the curvature change amount. Comparing this difference between histograms makes it possible to discriminate handwritten characters from printed characters. For example, in the handwritten histogram of FIG. 6(a), the count in the low range of the curvature change amount is larger than in the printed histogram of FIG. 6(b). A suitable threshold can therefore be set, and the character judged to be handwritten when the count in the low range exceeds that threshold.
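The threshold test on the low range can be sketched as below. Every number here is a placeholder: the patent gives no bin width, no definition of where the low range starts and ends, and no threshold value. In particular, this sketch assumes the low range excludes the first bin (values at or near zero, which perfectly straight printed strokes would also produce); that exclusion is an assumption, not something the source states. The names `histogram` and `is_handwritten` are hypothetical.

```python
def histogram(values, bin_width, n_bins):
    """Count values into n_bins equal bins; the last bin collects overflow."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / bin_width), n_bins - 1)] += 1
    return counts


def is_handwritten(changes, bin_width=0.05, n_bins=20,
                   low_bins=(1, 4), threshold=10):
    """Judge handwriting from the count in the low range of the histogram.

    low_bins is a half-open range of bin indices treated as the low range.
    All defaults are illustrative placeholders, not values from the patent.
    """
    counts = histogram(changes, bin_width, n_bins)
    low_count = sum(counts[low_bins[0]:low_bins[1]])
    return low_count > threshold


# Usage with made-up data: small wobbles versus straight runs plus corners.
wobbly = [0.07, 0.12, 0.18] * 5      # 15 small, non-zero changes
crisp = [0.0] * 40 + [1.5] * 4       # straight runs and a few sharp corners
handwritten = is_handwritten(wobbly)
printed = is_handwritten(crisp)
```

In practice the bin layout and threshold would need to be tuned on sample documents at the scan resolution in use.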

In addition, because it is difficult to draw a perfectly straight line or a curve of constant curvature by hand, the line segments of handwritten characters contain more wobble, that is, more small meanders, than those of printed characters. As a result, the histogram of handwritten characters forms a peak of some width in the low range, as shown by the circled portion of FIG. 6(a), whereas the histogram of printed characters shows a narrow peak. Handwritten and printed characters can therefore also be discriminated by examining the peak shape.
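The patent does not define how the peak shape is measured. One simple possibility, offered only as an assumption, is the width of the peak at a fraction of its height, counted in histogram bins. The function name `peak_width` and the default fraction of 0.5 are hypothetical.

```python
def peak_width(counts, fraction=0.5):
    """Number of contiguous bins around the tallest bin that stay at or above
    `fraction` of the peak height. Returns 0 for an empty histogram.

    This width measure is an illustrative stand-in for the "peak shape"
    mentioned in the text.
    """
    if not counts or max(counts) == 0:
        return 0
    peak = max(range(len(counts)), key=counts.__getitem__)
    level = counts[peak] * fraction
    left = peak
    while left > 0 and counts[left - 1] >= level:
        left -= 1
    right = peak
    while right < len(counts) - 1 and counts[right + 1] >= level:
        right += 1
    return right - left + 1


# Usage with made-up histograms: a broad peak versus a single-bin spike.
broad = peak_width([0, 2, 9, 10, 8, 3, 0])
narrow = peak_width([0, 0, 10, 0, 0])
```

A wider result would point toward handwriting and a narrower one toward print, subject to a cut-off chosen on sample data.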

Accordingly, the character discrimination unit 28 receives the curvature change amounts calculated by the curvature change amount calculation unit 26, counts the occurrences of each value, and generates a histogram like those shown in FIGS. 6(a) and 6(b) in order to discriminate between handwritten and printed characters.

FIG. 7 shows the flow of an operation example of the document processing apparatus according to the present invention. In FIG. 7, the character image acquisition unit 22 acquires the character image to be discriminated (S1). The character image is acquired, for example, when the user places the document on the image reading device 14 and enters a reading instruction from the input device 18.

The line segment extraction unit 24 applies the thinning process to the character image acquired by the character image acquisition unit 22 (S2) and extracts line segments from the character image (S3). Here, the line segment extraction unit 24 may extract end points and intersections from the thinned character image and decompose the character into individual line segments at those end points and intersections.

The curvature change amount calculation unit 26 calculates the curvature change amount for each line segment extracted by the line segment extraction unit 24 (S4).

The character discrimination unit 28 totals the curvature change amounts calculated by the curvature change amount calculation unit 26 and generates a histogram (S5). It then determines whether the count in the low range of the histogram, as described for FIGS. 6(a) and 6(b), exceeds a preset threshold (S6).

If the count exceeds the threshold, the character discrimination unit 28 judges the character to be handwritten (S7); if it does not, the unit judges the character to be printed (S8).

Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments.

FIG. 1 is a diagram showing the hardware configuration of an embodiment of the document processing apparatus according to the present invention.
FIG. 2 is a functional block diagram of an embodiment of the document processing apparatus according to the present invention.
FIG. 3 is an explanatory diagram of the thinning process performed by the line segment extraction unit.
FIG. 4 is an explanatory diagram of an example of the line segment extraction method performed by the line segment extraction unit.
FIG. 5 is a diagram showing an example of the method by which the curvature change amount calculation unit calculates the curvature change amount.
FIG. 6 is a diagram showing examples of the histograms that the character discrimination unit uses to discriminate between handwritten and printed characters.
FIG. 7 is a flowchart of an operation example of the document processing apparatus according to the present invention.

Explanation of symbols

10 CPU, 12 RAM, 14 image reading device, 16 display device, 18 input device, 20 hard disk device, 21 bus, 22 character image acquisition unit, 24 line segment extraction unit, 26 curvature change amount calculation unit, 28 character discrimination unit.

Claims (6)

1. A document processing apparatus comprising:
line segment extracting means for extracting line segments from a character image;
curvature change amount calculating means for calculating an amount of change in curvature of the line segments; and
character discriminating means for discriminating between handwritten characters and printed characters based on the amount of change in curvature.

2. The document processing apparatus according to claim 1, wherein the line segment extracting means extracts the line segments after thinning the line segments constituting a character.

3. The document processing apparatus according to claim 2, wherein the line segment extracting means extracts end points and intersections of the line segments constituting a character, and extracts the line segments by decomposing the character into individual line segments at these points.

4. The document processing apparatus according to any one of claims 1 to 3, wherein the character discriminating means discriminates between handwritten characters and printed characters from a histogram of the amount of change in curvature.

5. The document processing apparatus according to claim 4, wherein the character discriminating means discriminates between handwritten characters and printed characters from a peak shape of the histogram.

6. A document processing program causing a computer to function as:
line segment extracting means for extracting line segments from a character image;
curvature change amount calculating means for calculating an amount of change in curvature of the line segments; and
character discriminating means for discriminating between handwritten characters and printed characters based on the amount of change in curvature.
JP2007218456A 2007-08-24 2007-08-24 Document processor and document processing program Pending JP2009053826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007218456A JP2009053826A (en) 2007-08-24 2007-08-24 Document processor and document processing program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007218456A JP2009053826A (en) 2007-08-24 2007-08-24 Document processor and document processing program

Publications (1)

Publication Number Publication Date
JP2009053826A true JP2009053826A (en) 2009-03-12

Family

ID=40504875

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007218456A Pending JP2009053826A (en) 2007-08-24 2007-08-24 Document processor and document processing program

Country Status (1)

Country Link
JP (1) JP2009053826A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011074067A1 (en) * 2009-12-15 2011-06-23 富士通フロンテック株式会社 Character recognition method, character recognition device, and character recognition program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011074067A1 (en) * 2009-12-15 2011-06-23 富士通フロンテック株式会社 Character recognition method, character recognition device, and character recognition program
US8588520B2 (en) 2009-12-15 2013-11-19 Fujitsu Frontech Limited Character recognition method, character recognition apparatus, and character recognition program
JP5363591B2 (en) * 2009-12-15 2013-12-11 富士通フロンテック株式会社 Character recognition method, character recognition device, and character recognition program
