JP3086264B2

JP3086264B2 - Character space recognition method

Info

Publication number: JP3086264B2
Application number: JP03018476A
Authority: JP
Inventors: 隆邦嶺脇
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-01-18
Filing date: 1991-01-18
Publication date: 2000-09-11
Anticipated expiration: 2015-09-11
Also published as: JPH04236685A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition apparatus that processes Japanese text, and more particularly to an inter-character space recognition method by which such an apparatus recognizes spaces between characters (space characters) while distinguishing full-width from half-width spaces.

[0002]

[Prior Art] Conventionally, inter-character spaces (space characters) in English OCR and similar systems have generally been recognized by calculating the character print pitch and comparing it with the width of each inter-character blank, thereby determining whether the blank contains a space and, if so, how many.

[0003]

[Problem to Be Solved by the Invention] In Japanese text, half-width digits and Latin letters are often mixed with full-width kanji and hiragana, and full-width and half-width spaces are likewise often mixed. When recognizing such text, full-width and half-width spaces must be distinguished in both recognition and output.

[0004] However, applying the conventional inter-character space recognition method to Japanese text causes various problems. Some full-width kanji, hiragana, and symbols are printed off-center, toward one side of the character cell. The conventional method, which simply compares the physical blank width with the print pitch, therefore produces superfluous spaces or confuses full-width and half-width spaces.

[0005] For example, in the character string 「・・・です。次に・・・」, a superfluous half-width or full-width space is inserted after the full stop (。). The same can happen before or after other brackets, punctuation marks, and half-width alphanumeric characters.

[0006] When this happens, superfluous spaces or confusion between full-width and half-width spaces make the character string output by the character recognition apparatus differ from the character string in the original document. This is a serious problem for applications that require the document content to be input faithfully through the character recognition apparatus.

[0007] An object of the present invention is to provide a method by which a character recognition apparatus that recognizes such Japanese text can recognize spaces between characters with high accuracy while distinguishing full-width from half-width spaces.

[0008]

[Means for Solving the Problem] The invention of claim 1 is characterized in that, in a character recognition apparatus that processes Japanese text, a space is recognized for an inter-character blank, distinguishing full-width from half-width spaces, by comparing the size of the blank with the standard character size of the same line, and in that, when the character recognition result for the character before or after the blank is a predetermined specific character, the size of the blank is corrected and space recognition is performed by comparing the corrected size with the standard character size of the same line.

[0009] The invention of claim 2 is characterized in that, in a character recognition apparatus that processes Japanese text, when the character recognition result for the character before or after an inter-character blank is a predetermined specific character, the size of the blank is corrected; the size of the blank, or its corrected size, is divided by the standard character size of the same line and the quotient is taken as the number of full-width spaces; the remainder is then subjected to recognition of a full-width or half-width space on the basis of the standard character size; and the number of full-width spaces given by the quotient is added to that recognition result to obtain the final space recognition result.

[0010] The invention of claim 3 is characterized in that, in the inter-character space recognition method of claim 1 or 2, characters that are printed toward the line-start side or the line-end side are set as the specific characters.

[0011] The invention of claim 4 is characterized in that, in the inter-character space recognition method of claim 1 or 2, the correction of the inter-character blank size subtracts half of the standard character size.

[0012] The invention of claim 5 is characterized in that, in the inter-character space recognition method of claim 1 or 2, the correction of the inter-character blank size subtracts the product of the standard character size and a coefficient that depends on the specific character adjacent before or after the blank.

[0013]

[Operation] For the line containing the inter-character blank being processed, the width of a standard full-width character in that line (the height, for vertical writing) is calculated using, for example, the line height (the line width, for vertical writing) or the average width (height, for vertical writing) of the characters in the line excluding those that are clearly half-width. This value is taken as the standard character size of the line. Methods for detecting such a standard character size are known.
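The text treats standard character size detection as known and gives no formula. The following is only a minimal sketch of one plausible heuristic for horizontal writing: average the widths of characters that are not clearly half-width, and fall back to the line height when none remain. The function name `standard_char_width` and the cutoff `half_ratio` are assumptions made for illustration, not values from the patent.

```python
from statistics import mean


def standard_char_width(char_widths, line_height, half_ratio=0.6):
    """Estimate the standard full-width character width of one line.

    Assumption (not from the source): a character narrower than
    half_ratio * line_height is treated as "clearly half-width".
    """
    wide = [w for w in char_widths if w >= half_ratio * line_height]
    if not wide:
        # No full-width candidates: fall back to the line height.
        return float(line_height)
    return float(mean(wide))
```

For a line of height 60 with character widths 60, 58, 62, 28, and 30, the two narrow characters are excluded and the estimate is the mean of the remaining three.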

[0014] This standard character size can clearly serve as a reference for the size of a full-width space character (width in horizontal writing, height in vertical writing), and equally as a reference for the size of a half-width space. Therefore, on the premise that each inter-character blank contains either exactly one full-width or half-width space or no space at all, spaces can be recognized by comparing the size of the blank (width in horizontal writing, height in vertical writing) with the standard character size of the line: if the blank is at least the standard character size, it is a full-width space; if it is smaller than the standard character size but at least half of it, a half-width space; otherwise, no space. Because the standard character size reflects the size of a full-width space more accurately than the print pitch does, recognition accuracy is higher than with the pitch-based method.
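The single-space comparison described in paragraph [0014] can be sketched as follows. This is an illustrative reading of the rule, not the patent's implementation; the function name `classify_blank` and the string labels are hypothetical.

```python
def classify_blank(blank_width, std_width):
    """Classify a blank assumed to hold at most one space character.

    Returns "full", "half", or "none" using the plain comparison with
    the standard character size (no correction for adjacent characters).
    """
    if blank_width >= std_width:
        return "full"
    if blank_width >= std_width / 2:
        return "half"
    return "none"
```

With a standard character width of 60, a blank of width 63 is classified as a full-width space and a blank of width 35 as a half-width space, which is exactly the kind of over-detection next to punctuation that the correction described afterwards is meant to remove.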

[0015] With this simple comparison alone, however, a blank between a half-width alphanumeric character and a full-width character, or a blank between brackets or punctuation marks that are printed toward the line-start or line-end side, may be misrecognized as a space, or a half-width space may be misrecognized as a full-width space. These errors arise because the blank grows or shrinks by the amount by which the print position is offset toward the line start or line end relative to the standard print position of a full-width character. Correct results are therefore obtained by correcting the blank size by that amount before performing space recognition by comparison with the standard character size.

[0016] According to the invention of claim 1 and its dependent claims 3, 4, or 5, whether a blank lies between specific characters that cause such an increase or decrease in blank size is determined from the character recognition results, and the necessary correction is applied to the blank size before it is compared with the standard character size. Spaces between half-width alphanumeric characters, brackets, and punctuation marks can therefore be recognized with high accuracy.

[0017] Moreover, the brackets and punctuation marks that alter the blank size are of many kinds, and the amount of change differs for each. According to the invention of claim 5 as dependent on claim 1, the correction value for the blank size can be optimized according to the characters before and after the blank, enabling still more accurate space recognition.

[0018] According to the invention of claim 2 and its dependent claims 3, 4, or 5, the blank size is corrected according to the adjacent characters and then divided by the standard character size; the quotient is taken as the number of full-width spaces, full-width/half-width space recognition is performed on the remainder on the basis of the standard character size, and the number of full-width spaces given by the quotient is added to that result to obtain the final recognition result. Highly accurate space recognition is therefore possible even when an inter-character blank contains one or more full-width or half-width spaces.

[0019]

[Embodiments] FIG. 1 is a schematic block diagram of the character recognition apparatus according to the embodiments of the present invention. In this apparatus, an image input unit 10 reads a document image with a scanner or the like and stores the resulting binary image data in an image memory 11. A line/character segmentation unit 12 segments text lines and character images from the input image in the image memory 11, stores the character image data in a character image memory 13, and stores segmentation information in a segmentation information memory 14: the character segmentation positions, character widths (horizontal writing is assumed here; character heights for vertical writing), inter-character blank widths (heights for vertical writing), line heights (line widths for vertical writing), and so on. It also calculates a standard character width (height for vertical writing) for each line from the line height, the average character width, and the like, and stores the result in the segmentation information memory 14.

[0020] A character recognition unit 15 reads character image data from the character image memory 13, normalizes it, extracts features, matches the extracted features against the dictionary in a character dictionary memory 16, selects the recognition candidates with small distances, and stores their character codes, distance data, and so on in a recognition result memory 17.

[0021] A space recognition unit 18 refers to the contents of the segmentation information memory 14 and the recognition result memory 17, performs inter-character space recognition processing, determines the number and type (full-width or half-width) of the space characters between characters, and stores this space information in the recognition result memory 17 in character order. The details of the space recognition processing are described below for each embodiment.

[0022] A result output unit 19 outputs the contents of the recognition result memory 17 to a display, a printer, a magnetic disk device, or the like.

[0023] Next, the space recognition processing in each embodiment is described, taking as an example the following horizontally written character string, which contains half-width Latin letters: 「これは、『新型 BICOH WP』です」. For notational convenience, in the description below a half-width space in a character string is written as 「；」 and a full-width space as 「：」. In this notation, the correct form of the string is 「これは、『新型；BICOH；WP』です」, where the Latin letters are half-width. Assume that the character widths, inter-character blank widths, standard character width, and character recognition results shown in Table 1 were obtained for this string.

[0024]

[Table 1]

[0025] FIG. 2 is a schematic flowchart of the space recognition processing, which is common to all embodiments. The flowchart assumes a horizontally written document. Vertical writing can be processed in the same way by replacing "width" with "height".

[0026] Embodiment 1: The inter-character blanks of a line are processed in order, starting from the first. It is checked whether the character recognition result for the character preceding the blank of interest is a specific character subject to blank-width correction. Specifically, such specific characters are the punctuation marks (、。．， etc.) and closing brackets (」』｝）＞ etc.) that are printed toward the line-start (left) side.

[0027] If it is such a specific character, half of the standard character width (30 here) is subtracted from the width of the blank of interest as a correction value. If it is not a specific character, this correction is not made.

[0028] Next, it is checked whether the character following the blank of interest is a specific character subject to correction. Specifically, these specific characters are the opening brackets (「『｛（＜ etc.) that are printed toward the line-end (right) side. If it is such a specific character, half of the standard character width is subtracted from the width of the blank of interest. If it is not a specific character, this correction is not made.
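The two correction steps of Embodiment 1 can be sketched as below. The character sets contain only the characters named in the text, which closes each list with "etc.", so they should be read as incomplete; the names `LEFT_BIASED`, `RIGHT_BIASED`, and `corrected_blank_width` are hypothetical.

```python
# Characters named in the text as printed toward the line-start (left) side.
# They leave extra blank on their right, so they matter BEFORE a blank.
LEFT_BIASED = set("、。．，」』｝）＞")

# Characters named in the text as printed toward the line-end (right) side.
# They leave extra blank on their left, so they matter AFTER a blank.
RIGHT_BIASED = set("「『｛（＜")


def corrected_blank_width(blank_width, prev_char, next_char, std_width):
    """Subtract half the standard width for each offset neighbour."""
    correction = 0.0
    if prev_char in LEFT_BIASED:
        correction += std_width / 2
    if next_char in RIGHT_BIASED:
        correction += std_width / 2
    return blank_width - correction
```

With a standard character width of 60, the blank of width 63 between 、 and 『 is corrected to 3, and the blank of width 35 after 』 is corrected to 5. A blank with no specific character on either side is left unchanged.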

[0029] Next, because two or more spaces may exist between characters, the corrected blank width is divided by the standard character width, and the quotient is stored as the number of full-width spaces. If it can be assumed that two or more spaces never exist between characters, this division is unnecessary.

[0030] Next, space recognition is performed on the remainder of the division (the width that remains after removing the portion of the blank equivalent to full-width spaces), again by comparison based on the standard character width, to decide whether this remaining portion can be regarded as a full-width space or a half-width space. That is, if the remainder is close to the standard character width, the remaining portion is also recognized as a full-width space; if it is smaller than that but at least about half the standard character width, it is recognized as a half-width space. Concretely, a predetermined value is first subtracted from the standard character width to give a full-width space threshold, which is compared with the remainder; if the remainder is at or above the full-width space threshold, the remaining portion is recognized as one full-width space. If it is not a full-width space, a value of approximately half the standard character width is used as a half-width space threshold; if the remainder is at or above this threshold, the remaining portion is recognized as one half-width space. If the remainder is below the half-width space threshold, it is recognized as no space.

[0031] That is, the standard character width here is 60, so that, for example, a remainder of 50 or more is recognized as one full-width space, a remainder of at least 30 but less than 50 as one half-width space, and a remainder below 30 as no space.

[0032] The number of full-width spaces obtained earlier is then added to the recognition result for the remainder to give the final space recognition result for the blank of interest.
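The division and remainder steps of paragraphs [0029] to [0032] can be sketched as follows. The text only says that "a predetermined value" is subtracted from the standard character width; the default `full_margin=10` is inferred from the example thresholds (60 and 50) and is an assumption, as are the function name and the guard for non-positive corrected widths.

```python
def recognize_spaces(blank_width, std_width, full_margin=10):
    """Return (full_width_count, half_width_count) for one blank.

    blank_width is the blank width after any correction.
    full_margin is an assumed value: std_width - full_margin is used
    as the full-width space threshold (50 when std_width is 60).
    """
    if blank_width <= 0:
        # A correction can push the width to zero or below: no space.
        return (0, 0)
    full_count, remainder = divmod(blank_width, std_width)
    full_threshold = std_width - full_margin
    half_threshold = std_width / 2
    half_count = 0
    if remainder >= full_threshold:
        full_count += 1  # remainder is close enough to one full-width space
    elif remainder >= half_threshold:
        half_count = 1   # remainder is about half a character wide
    return (int(full_count), half_count)
```

With a standard character width of 60, a corrected width of 3 gives no space, 35 gives one half-width space, 115 gives two full-width spaces, and 95 gives one full-width space followed by the half-width recognition of the remainder.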

[0033] If it can be assumed that two or more spaces never exist between characters, the blank width itself, after any correction required by the adjacent characters, is compared with the full-width space threshold to recognize a full-width space; if none is recognized, the blank width is compared with the half-width space threshold to recognize a half-width space, and this is taken as the final result.

[0034] In the example string, the blank between the comma (、), the fourth character from the beginning, and the following 『 is 63 wide. Because the character on its left is a specific character, 30 (half the standard character width of 60) is subtracted, and because the character on its right is also a specific character, a further 30 is subtracted. The corrected blank width is thus 3 (= 63 − 30 − 30), and the space recognition result is "no space". Likewise, the blank between the 』 later in the string and the next character is 35 wide, but because the preceding character is a specific character, 30 is subtracted, and the result is again "no space". The corrected blank widths and the space recognition results are as shown in Table 2, and a character string identical to the example, spaces included, is obtained in the recognition result memory 17.

[0035]

[Table 2]

[0036] If space recognition were performed with the full-width and half-width space thresholds but without the correction based on the adjacent characters, the result would be as shown in Table 3: the processed string would be 「これは、：『新型；BICOH；WP』；です」, with one superfluous full-width space and one superfluous half-width space.

[0037]

[Table 3]

[0038] Embodiment 2: A correction value is set individually for each specific character subject to blank-width correction, as shown in Table 4.

[0039]

[Table 4]

[0040] When the character recognition result for the character before or after an inter-character blank matches one of the characters in Table 4, the blank width is corrected by subtracting the correction value set for that character. The corrected width is then divided by the standard character width, full-width/half-width space recognition is performed on the remainder, and a number of full-width spaces equal to the quotient is added to that result to obtain the final space recognition result.

[0041] In the example string, the blank between the comma (、), the fourth character from the beginning, and 『 is 63 wide. From Table 4, the correction value for the preceding character is 36 (= 60 × 0.6) and that for the following character is 18 (= 60 × 0.3), for a total correction of 54. The corrected blank width is therefore 9, and it is judged that there is no space. The remaining blanks are processed in the same way, giving the results shown in Table 5; the spaces in the example string are recognized correctly.
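The per-character correction of Embodiment 2 can be sketched as a coefficient lookup. Table 4 is not reproduced in this text, so only the two coefficients that the worked example states (0.6 for 、 before a blank and 0.3 for 『 after a blank) are included; keeping separate tables for the preceding and following character is a design assumption that mirrors Embodiment 1, and all names here are hypothetical.

```python
# Only the coefficients stated in the worked example are known;
# every other entry of Table 4 is unavailable and therefore omitted.
PREV_COEFF = {"、": 0.6}  # character before the blank (assumed table split)
NEXT_COEFF = {"『": 0.3}  # character after the blank (assumed table split)


def corrected_blank_width_by_char(blank_width, prev_char, next_char, std_width):
    """Subtract coefficient * std_width for each matching neighbour."""
    correction = PREV_COEFF.get(prev_char, 0.0) * std_width
    correction += NEXT_COEFF.get(next_char, 0.0) * std_width
    return blank_width - correction
```

For the blank of width 63 between 、 and 『 with a standard character width of 60, the corrections are 36 and 18, leaving a corrected width of 9, which is below the half-width space threshold of 30.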

[0042]

[Table 5]

[0043] In this embodiment too, if it can be assumed that two or more spaces never exist between characters, the division of the corrected blank width may be omitted and space recognition performed directly by comparison with the full-width and half-width space thresholds.

[0044]

[Effects of the Invention] As described above, according to the inventions of claims 1 to 5, for Japanese text in which half-width and full-width spaces are mixed and in which the size of the inter-character blanks varies with the offset print positions of the adjacent characters, the spaces between characters can be recognized with high accuracy while distinguishing full-width from half-width spaces. A character recognition apparatus can therefore input a character string that is highly faithful to the original, spaces included.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram of the character recognition apparatus according to the embodiments of the present invention.

FIG. 2 is a schematic flowchart of the space recognition processing.

[Explanation of Reference Signs]

10 Image input unit
11 Image memory
12 Line/character segmentation unit
13 Character image memory
14 Segmentation information memory
15 Character recognition unit
16 Character dictionary memory
17 Recognition result memory
18 Space recognition unit
19 Result output unit

Claims

(57) [Claims]

1. An inter-character space recognition method for a character recognition apparatus that processes Japanese text, characterized in that a space is recognized for an inter-character blank, distinguishing full-width from half-width spaces, by comparing the size of the blank with the standard character size of the same line, and in that, when the character recognition result for the character before or after the blank is a predetermined specific character, the size of the blank is corrected and space recognition is performed by comparing the corrected size with the standard character size of the same line.

2. An inter-character space recognition method for a character recognition apparatus that processes Japanese text, characterized in that, when the character recognition result for the character before or after an inter-character blank is a predetermined specific character, the size of the blank is corrected; the size of the blank, or its corrected size, is divided by the standard character size of the same line and the quotient is taken as the number of full-width spaces; the remainder is subjected to recognition of a full-width or half-width space on the basis of the standard character size; and the number of full-width spaces given by the quotient is added to that recognition result to obtain the final space recognition result.

3. The inter-character space recognition method according to claim 1 or 2, characterized in that the specific character is a character that is printed toward the line-start side or the line-end side.

4. The inter-character space recognition method according to claim 1 or 2, characterized in that, in correcting the size of the inter-character blank, half of the standard character size is subtracted.

5. The inter-character space recognition method according to claim 1 or 2, characterized in that, in correcting the size of the inter-character blank, the product of the standard character size and a coefficient that depends on the specific character adjacent before or after the blank is subtracted.