JPH0242589A - Character pattern segmenting device - Google Patents

Character pattern segmenting device

Info

Publication number
JPH0242589A
Authority
JP
Japan
Prior art keywords
character
pattern
evaluation value
recognition
shape evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63193975A
Other languages
Japanese (ja)
Inventor
Hiroyasu Miyahara
景泰 宮原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to JP63193975A
Publication of JPH0242589A
Legal status: Pending

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve character recognizing accuracy by segmenting a character pattern on the basis of the character shape evaluating value of the basic pattern or coupled pattern of an objective character when that value is larger than a prescribed value, and on the basis of that value plus a character recognition evaluating value when it is smaller than the prescribed value.

CONSTITUTION: The character string pattern obtained by a scanning means 1 is stored in a pattern memory means 2, a pattern area detecting means 3 detects the basic patterns and the coupled patterns, and a character shape evaluating value calculating means 4 calculates their character shape evaluating values. A character recognizing means 6 recognizes a basic pattern or coupled pattern as a character pattern using a recognizing dictionary 5 and calculates the character recognition evaluating value. An evaluating value deciding/segmenting means 7 segments the character pattern on the basis of the character shape evaluating value when that value is larger than a prescribed threshold. When it is smaller than the threshold, the character recognition evaluating value obtained by the means 6 is added to the character shape evaluating value, and the character pattern is segmented based on the sum.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a character pattern cutting device that cuts out character patterns from character strings written in documents, and in particular to a character pattern cutting device that cuts out character patterns from free-pitch character strings for which no character frame is specified.

[Conventional technology]

FIG. 8 is a block diagram showing the configuration of a conventional character pattern cutting device disclosed, for example, in Japanese Patent Laid-Open No. 61-175878 (Japanese Patent Application No. 60-17265), filed earlier by the present applicant. In the figure, 1 is scanning means for optically scanning and photoelectrically converting a character string written on a document; 2 is character string pattern storage means for storing the pattern of the photoelectrically converted character string; 3 is basic pattern area detection means, which divides the character string pattern into basic patterns based on the marginal distribution values of black points obtained by scanning the character string pattern in the direction perpendicular to the character string direction, and detects the basic pattern areas from the coordinates of the left and right ends and the top and bottom ends of those basic patterns; 4 is character shape evaluation value calculation means for calculating, from the coordinates of the left and right ends and the top and bottom ends, the character shape evaluation value of each single basic pattern area and of each combined pattern area formed by combining a plurality of consecutive basic pattern areas; 9 is character cutting means for cutting out individual character patterns from the character string pattern based on the character shape evaluation values of the basic pattern areas and of the combined pattern areas obtained by character shape evaluation value calculation means 4; and 8 is output means for outputting the character patterns cut out by character cutting means 9.

Next, the operation will be explained. Scanning means 1 scans a character string in the document, and the resulting character string pattern is stored in character string pattern storage means 2. The stored character string pattern is passed to basic pattern area detection means 3, which detects the basic patterns. Specifically, the character string pattern is scanned perpendicularly to the character string direction and the black points are counted to obtain the marginal distribution values. The regions where the marginal distribution value exceeds a predetermined threshold are found, and these give the basic patterns. FIG. 5 shows an example of a character string pattern and its marginal distribution values, and FIG. 6 shows the basic patterns obtained from it. Next, taking as input the coordinates of the top and bottom ends and the left and right ends of each basic pattern obtained by basic pattern area detection means 3, character shape evaluation value calculation means 4 calculates a character shape evaluation value, which indicates quantitatively how much each basic pattern, or each pattern formed by combining a plurality of consecutive basic patterns (hereinafter called a combined pattern), looks like a single character. This value is obtained, for example, by a function that takes a larger value the closer the rectangle circumscribing the pattern is to a square and the larger the blank areas on both sides of the pattern.
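The detection of basic patterns from the marginal distribution can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in the patent: the image is assumed to be a list of rows of 0/1 pixels with the text running left to right, and the names `column_profile`, `basic_pattern_columns` and `bounding_box` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed layout: rows of 0 (white) / 1 (black) pixels


def column_profile(image: Image) -> List[int]:
    """Count the black points in each column (the marginal distribution values)."""
    width = len(image[0]) if image else 0
    return [sum(row[x] for row in image) for x in range(width)]


def basic_pattern_columns(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return inclusive (left, right) column ranges where the profile exceeds the threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > threshold and start is None:
            start = x                      # a basic pattern begins here
        elif count <= threshold and start is not None:
            runs.append((start, x - 1))    # the basic pattern ended at the previous column
            start = None
    if start is not None:                  # pattern touching the right edge of the image
        runs.append((start, len(profile) - 1))
    return runs


def bounding_box(image: Image, left: int, right: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the black points inside columns left..right."""
    rows = [y for y, row in enumerate(image) if any(row[left:right + 1])]
    return (left, rows[0], right, rows[-1])


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
runs = basic_pattern_columns(column_profile(image))
boxes = [bounding_box(image, left, right) for left, right in runs]
```

The left/right and top/bottom coordinates in `boxes` are the kind of input the text says the character shape evaluation value is computed from. The evaluation function itself is not specified beyond the "closer to a square, larger blanks" example, so it is left out here.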

Next, character cutting means 9 cuts out individual character patterns from the character string pattern based on the character shape evaluation values of the basic patterns and combined patterns obtained by character shape evaluation value calculation means 4. More specifically, the character shape evaluation value of each pattern is multiplied by the number of basic patterns contained in that pattern (the result is hereinafter called the weighted shape evaluation value). Among the possible combinations in which each pattern is regarded as one character, the combination with the largest sum of weighted shape evaluation values is selected as the cutting result. FIG. 7 shows examples of the possible combinations for the basic patterns shown in FIG. 6. Each pattern cut out by character cutting means 9 is output from output means 8 as one character pattern.
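One way to select the combination with the largest sum of weighted shape evaluation values is dynamic programming over the sequence of basic patterns. The sketch below is an assumption about how this could be done; the text only says the best combination is selected, not how it is searched for. `best_segmentation`, the `score` callback and the `max_group` limit are hypothetical names, and the scores in the usage lines are made-up numbers, not values from the figures.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (index of first basic pattern, index of last), inclusive


def best_segmentation(n: int,
                      score: Callable[[int, int], float],
                      max_group: int = 3) -> Tuple[float, List[Span]]:
    """Split basic patterns 0..n-1 into consecutive groups with the largest weighted sum."""
    # best[i] holds (largest total, chosen spans) for the first i basic patterns.
    best: List[Tuple[float, List[Span]]] = [(0.0, [])]
    for end in range(1, n + 1):
        options = []
        for size in range(1, min(max_group, end) + 1):
            start = end - size
            total, spans = best[start]
            # Weighted evaluation value = evaluation value x number of basic patterns.
            weighted = score(start, end - 1) * size
            options.append((total + weighted, spans + [(start, end - 1)]))
        best.append(max(options, key=lambda option: option[0]))
    return best[n]


# Hypothetical shape evaluation values for three basic patterns.
table = {(0, 0): 0.9, (1, 1): 0.4, (2, 2): 0.5,
         (0, 1): 0.3, (1, 2): 0.8, (0, 2): 0.2}
total, spans = best_segmentation(3, lambda s, e: table[(s, e)])
```

With these made-up scores, joining the second and third basic patterns wins because its weighted value (0.8 x 2) outweighs the two single patterns (0.4 + 0.5). Multiplying by the number of basic patterns keeps sums comparable between combinations that contain different numbers of patterns.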

[Problem to be solved by the invention]

As described above, the conventional character pattern cutting device decides on characters using only the position and shape of the rectangle circumscribing each pattern. Consequently, it may cut out character patterns incorrectly when adjacent characters are close together or, in the case of kanji, when the left-hand radical (hen) and the right-hand radical (tsukuri) are far apart, and this lowers character recognition accuracy.

This invention was made to solve the above problem. Its object is to obtain a character pattern cutting device that can cut out character patterns correctly, and thereby improve character recognition accuracy, even from character strings written at free pitch in which the position and shape of each character are unstable because characters are close together or the left-hand and right-hand radicals of a kanji are far apart.

[Means to solve the problem]

The character pattern cutting device according to the present invention is characterized by comprising: a recognition dictionary 5 storing standard patterns of the characters to be recognized; character recognition means 6 for recognizing a basic pattern or combined pattern as a character pattern using recognition dictionary 5 and outputting a character recognition evaluation value; and evaluation value determination/cutting means 7 which, when the character shape evaluation value is larger than a predetermined threshold, cuts out the character pattern based on the character shape evaluation value and, when it is equal to or less than the threshold, passes the character pattern to character recognition means 6 for recognition and cuts out the character pattern based on the value obtained by adding the output character recognition evaluation value to the character shape evaluation value.

[Operation]

Character recognition means 6 uses recognition dictionary 5 to recognize a basic pattern or combined pattern as a character pattern and outputs a character recognition evaluation value. When the character shape evaluation value is larger than a predetermined threshold, evaluation value determination/cutting means 7 cuts out the character pattern based on the character shape evaluation value. When the character shape evaluation value is equal to or less than the threshold, evaluation value determination/cutting means 7 passes the character pattern to character recognition means 6 for recognition and cuts out the character pattern based on the value obtained by adding the resulting character recognition evaluation value to the character shape evaluation value.

[Embodiments of the invention]

FIG. 1 is a block diagram showing the configuration of a character pattern cutting device according to an embodiment of the present invention.

In FIG. 1, components corresponding to those shown in FIG. 8 are given the same reference numerals, and their explanation is omitted. In FIG. 1, 5 is a recognition dictionary storing standard patterns of the characters to be recognized; 6 is character recognition means for recognizing a basic pattern or combined pattern as a character pattern using recognition dictionary 5 and outputting a character recognition evaluation value; and 7 is evaluation value determination/cutting means which, when the character shape evaluation value is larger than a predetermined threshold, cuts out the character pattern based on the character shape evaluation value and, when it is equal to or less than the threshold, passes the character pattern to character recognition means 6 for recognition and cuts out the character pattern based on the value obtained by adding the output character recognition evaluation value to the character shape evaluation value.

FIG. 2 shows a character string pattern and the basic patterns detected from it; in the figure, 10 is an example of a character string pattern and 11 is an example of the detected basic patterns. FIG. 3 shows an example of the character shape evaluation values and character recognition evaluation values of the basic patterns 11 in FIG. 2 and of the combined patterns formed by combining those basic patterns 11. In the figure, 12 is the character shape evaluation value and 13 is the character recognition evaluation value. In this example, a combined pattern is a combination of up to three consecutive basic patterns.

FIGS. 4(a) and 4(b) show examples of calculating the total evaluation value in this embodiment. In the figure, 14 is an example in which character string pattern 10 of FIG. 2 is divided into the patterns “文” + “字” + “言” + “忍言” + “哉”; 15 is the weighted shape evaluation value of each divided pattern; 16 is the sum of the weighted shape evaluation values 15; 17 is the character recognition evaluation value multiplied by the number of basic patterns (hereinafter called the weighted recognition evaluation value); 18 is the sum of the weighted recognition evaluation values 17; and 19 is the grand total of the weighted shape evaluation values 15 and the weighted recognition evaluation values 17.

Likewise, 20 is the case where the string is divided into the patterns “文” + “字” + “認” + “識”; 21 is the weighted shape evaluation value of each of those patterns; 22 is the sum of the weighted shape evaluation values 21; 23 is the weighted recognition evaluation value; 24 is the sum of the weighted recognition evaluation values 23; and 25 is the grand total of the weighted shape evaluation values 21 and the weighted recognition evaluation values 23.

Next, the operation will be explained. Scanning means 1 scans a character string such as “文字認識” ("character recognition") shown in FIG. 2 and obtains character string pattern 10. Character string pattern 10 is stored in character string pattern storage means 2, and basic pattern area detection means 3 detects the basic patterns 11. From the coordinates of the top and bottom ends and the left and right ends of each basic pattern 11 obtained by basic pattern area detection means 3, character shape evaluation value calculation means 4 calculates the character shape evaluation value 12 of each basic pattern 11 and each combined pattern. Evaluation value determination/cutting means 7 calculates the weighted shape evaluation values 15 and 21 of the patterns from the character shape evaluation values 12, finds the combination for which they are largest, and obtains the combination 14 shown in FIG. 4: “文” + “字” + “言” + “忍言” + “哉”.

Next, the character shape evaluation value 12 of each of these patterns is compared with a threshold TH. Suppose, for example, that TH = 0.65.

If the character shape evaluation values 12 of all the patterns are larger than 0.65, the patterns in combination 14 are cut out as character patterns. In the example of FIG. 4, however, the character shape evaluation value 12 of the pattern “言” is smaller than 0.65, so evaluation value determination/cutting means 7 passes the basic patterns and combined patterns to character recognition means 6. Character recognition means 6 calculates the character recognition evaluation value 13 from each pattern passed to it and the standard patterns in recognition dictionary 5. Here, the character recognition evaluation value 13 is a value that indicates quantitatively how much a pattern looks like a character. For example, let the feature vector extracted from the input pattern be $P = (p_1, p_2, \ldots, p_n)$ and let the standard feature vector of character $c_i$ stored in recognition dictionary 5 be $F(c_i) = (f_{c_i 1}, f_{c_i 2}, \ldots, f_{c_i n})$. The value is then given by the maximum of the following $S$:

$$S = \frac{(P, F(c_i))}{\|P\| \cdot \|F(c_i)\|}$$

Here $(P, F(c_i))$ is the inner product of the vectors $P$ and $F(c_i)$, and $\|P\|$ denotes $(P, P)^{1/2}$.
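The similarity S is the normalized inner product (cosine similarity) of the input feature vector and a dictionary vector. A minimal sketch, assuming the dictionary is a mapping from character labels to feature vectors and that the maximum is taken over all dictionary entries; `similarity`, `recognition_score` and the two-dimensional vectors are hypothetical, and how the feature vectors are extracted is not covered here.

```python
import math
from typing import Dict, Sequence, Tuple


def similarity(p: Sequence[float], f: Sequence[float]) -> float:
    """S = (P, F) / (||P|| * ||F||), where ||X|| = (X, X) ** 0.5."""
    inner = sum(a * b for a, b in zip(p, f))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in f))
    return inner / norms if norms else 0.0   # treat a zero vector as "no similarity"


def recognition_score(p: Sequence[float],
                      dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the best-matching label and the maximum S over a non-empty dictionary."""
    return max(((label, similarity(p, f)) for label, f in dictionary.items()),
               key=lambda pair: pair[1])


# Hypothetical two-dimensional feature vectors for two dictionary characters.
dictionary = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
label, score = recognition_score((2.0, 0.1), dictionary)
```

Because S is normalized by both vector lengths, it depends only on the direction of the feature vector, so an input that is a scaled copy of a standard pattern still scores 1.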

Evaluation value determination/cutting means 7 obtains the weighted recognition evaluation values from the character recognition evaluation values 13 calculated by character recognition means 6 and adds them to the weighted shape evaluation values.

It then finds the combination for which this sum is largest and obtains combination 20: “文” + “字” + “認” + “識”.

Output means 8 takes this combination 20 as the cutting result and outputs each character pattern.
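The two-stage decision described above (shape evaluation first, recognition only when some pattern does not exceed the threshold) can be sketched as follows. This is an illustrative reading of the procedure, not the device's actual implementation: `best_split`, `cut_out`, the callback signatures and the default `max_group=3` are assumptions, and the search is done by dynamic programming, which the source does not specify.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]                     # (first, last) basic pattern index, inclusive
Evaluate = Callable[[int, int], float]     # evaluation value of the pattern covering a span


def best_split(n: int, value: Evaluate, max_group: int = 3) -> List[Span]:
    """Split basic patterns 0..n-1 into consecutive groups with the largest weighted sum."""
    best: List[Tuple[float, List[Span]]] = [(0.0, [])]
    for end in range(1, n + 1):
        options = []
        for size in range(1, min(max_group, end) + 1):
            start = end - size
            total, spans = best[start]
            # Each evaluation value is weighted by the number of basic patterns it covers.
            options.append((total + value(start, end - 1) * size,
                            spans + [(start, end - 1)]))
        best.append(max(options, key=lambda option: option[0]))
    return best[n][1]


def cut_out(n: int, shape: Evaluate, recognition: Evaluate,
            th: float = 0.65, max_group: int = 3) -> List[Span]:
    """Use shape values alone if every chosen pattern exceeds th, else add recognition."""
    spans = best_split(n, shape, max_group)
    if all(shape(s, e) > th for s, e in spans):
        return spans                       # recognition is skipped entirely

    def combined(s: int, e: int) -> float:
        return shape(s, e) + recognition(s, e)

    return best_split(n, combined, max_group)
```

Adding the two evaluation values before weighting gives the same total as adding the weighted shape and weighted recognition values, since both are multiplied by the same number of basic patterns. The early return is what keeps the recognition cost out of the easy cases.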

Indeed, in FIG. 4 the sums 16 and 22 of the weighted shape evaluation values are 4.23 for combination 14 (“文” + “字” + “言” + “忍言” + “哉”) and 3.98 for combination 20 (“文” + “字” + “認” + “識”), so sum 16 is larger than sum 22. Once the respective weighted recognition evaluation values are added, however, the order is reversed: combination 14 totals 7.32, whereas combination 20 totals the larger value of 8.70.
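Because each grand total (19, 25) is the sum of a weighted shape sum (16, 22) and a weighted recognition sum (18, 24), the recognition sums implied by the quoted figures can be recovered by subtraction. The two recognition sums below are derived here from the quoted totals and are not read from FIG. 4:

```latex
\begin{aligned}
\text{combination 14:}\quad & 7.32 - 4.23 = 3.09 \\
\text{combination 20:}\quad & 8.70 - 3.98 = 4.72 \\
\text{shape margin for 14:}\quad & 4.23 - 3.98 = 0.25 \\
\text{recognition margin for 20:}\quad & 4.72 - 3.09 = 1.63 \\
\text{net margin for 20:}\quad & 1.63 - 0.25 = 1.38 = 8.70 - 7.32
\end{aligned}
```

The small shape advantage of combination 14 (0.25) is outweighed by the recognition advantage of combination 20 (1.63), which is why the ranking reverses.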

As described above, even in a case where the character shape evaluation values 12 alone would select combination 14, “文” + “字” + “言” + “忍言” + “哉”, using the character recognition evaluation values 13 as well causes the correct combination 20, “文” + “字” + “認” + “識”, to be selected.

According to the above embodiment, the evaluation used to cut out character patterns combines the character shape evaluation value, which reflects the position and shape of a pattern, with the character recognition evaluation value, which is recognition information about the pattern. Character patterns can therefore be cut out correctly even from character strings containing many characters of irregular position or shape. Moreover, when the character shape evaluation values are reasonably large, that is, when the position and shape of each pattern are stable and the patterns cut out using the character shape evaluation values are all likely to be single characters, no recognition is performed and the patterns are cut out as character patterns as they are. This keeps the increase in processing time caused by adding recognition processing to a minimum.

In the above embodiment, a combined pattern was described as a combination of up to three consecutive basic patterns, as shown in FIG. 3, but it may also be a combination of four or more basic patterns.

[Effect of the invention]

As described above, the present invention comprises a recognition dictionary storing standard patterns of the characters to be recognized, character recognition means for recognizing basic patterns and combined patterns as character patterns, and evaluation value determination/cutting means which, when the character shape evaluation value exceeds a predetermined threshold, cuts out the character pattern based on that value and, when it is equal to or less than the threshold, has the character recognition means recognize the character pattern and cuts out the character pattern based on the value obtained by adding the resulting character recognition evaluation value to the character shape evaluation value. In cases where the character shape evaluation value alone would lead to incorrect cutting of character patterns, the character recognition evaluation value is used as well. As a result, character patterns can be cut out correctly even from character strings written at free pitch in which the position and shape of each character are unstable because characters are close together or the left-hand and right-hand radicals of a kanji are far apart, and character recognition accuracy is therefore improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a character pattern cutting device according to an embodiment of the present invention. FIG. 2 shows an example of a character string pattern and basic patterns. FIG. 3 shows an example of the character shape evaluation values and character recognition evaluation values of basic patterns and combined patterns. FIGS. 4(a) and 4(b) show examples of the final evaluation values obtained for specific combinations of patterns. FIG. 5 shows an example of a character string pattern and its marginal distribution values. FIG. 6 shows an example of basic patterns. FIG. 7 shows examples of combinations of the combined patterns formed from the basic patterns of FIG. 6. FIG. 8 is a block diagram showing the configuration of a conventional character pattern cutting device.

1: scanning means; 2: character string pattern storage means; 3: basic pattern area detection means; 4: character shape evaluation value calculation means; 5: recognition dictionary; 6: character recognition means; 7: evaluation value determination/cutting means.

Agent: Masuo Oiwa (and 2 others)

Written amendment (voluntary)

1. Indication of the case: Japanese Patent Application No. Sho 63-193975
2. Title of the invention
3. Person making the amendment: Representative
4. Agent
5. Subject of the amendment: the "Detailed Description of the Invention" and "Drawings" sections.
6. Contents of the amendment
(1) On page 8, line 5 of the specification, 「第1図ににおいて」 is corrected to 「第1図において」 ("in FIG. 1", removing a duplicated character).
(2) On page 10, lines 15 to 16 of the specification, 「各パターン14.20の」 ("of each pattern 14, 20") is corrected to 「各パターンの」 ("of each pattern").
(3) FIG. 3 of the drawings is corrected as shown in the attached sheet.

End

Claims (1)

[Claims]

1. A character pattern cutting device comprising: scanning means for optically scanning and photoelectrically converting a character string written on a document; character string pattern storage means for storing the pattern of the photoelectrically converted character string; basic pattern area detection means for detecting basic pattern areas from the coordinates of the left and right ends and the top and bottom ends of basic patterns obtained by dividing the character string pattern based on the marginal distribution values of black points obtained by scanning the character string pattern in the direction perpendicular to the direction of the character string; and character shape evaluation value calculation means for calculating, from the coordinates of the left and right ends and the top and bottom ends of the basic pattern areas detected by the basic pattern area detection means, the character shape evaluation value of each single basic pattern area and the character shape evaluation value of each combined pattern area formed by combining a plurality of consecutive basic pattern areas, the device cutting out individual character patterns from the character string pattern based on the character shape evaluation values of the basic pattern areas and of the combined pattern areas, characterized in that the device further comprises: a recognition dictionary storing standard patterns of the characters to be recognized; character recognition means for recognizing a basic pattern or combined pattern as a character pattern using the recognition dictionary and outputting a character recognition evaluation value; and evaluation value determination/cutting means which, when the character shape evaluation value is larger than a predetermined threshold, cuts out the character pattern based on the character shape evaluation value and, when it is equal to or less than the threshold, passes the character pattern to the character recognition means for recognition and cuts out the character pattern based on the value obtained by adding the output character recognition evaluation value to the character shape evaluation value.
JP63193975A 1988-08-03 1988-08-03 Character pattern segmenting device Pending JPH0242589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63193975A JPH0242589A (en) 1988-08-03 1988-08-03 Character pattern segmenting device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63193975A JPH0242589A (en) 1988-08-03 1988-08-03 Character pattern segmenting device

Publications (1)

Publication Number Publication Date
JPH0242589A true JPH0242589A (en) 1990-02-13

Family

ID=16316889

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63193975A Pending JPH0242589A (en) 1988-08-03 1988-08-03 Character pattern segmenting device

Country Status (1)

Country Link
JP (1) JPH0242589A (en)
