JPH04239989A

JPH04239989A - Electronic dictionary

Info

Publication number: JPH04239989A
Application number: JP3006966A
Authority: JP
Inventors: Hideo Tanimoto; 谷本　英雄; Yoshimi Yamada; 義美山田; Kazuo Ito; 伊藤　和郎
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-01-24
Filing date: 1991-01-24
Publication date: 1992-08-27

Abstract

PURPOSE: To extract words correctly by comparing the width of each non-printing area, in order from the head of a character string, with the widths of the other printing areas and non-printing areas, judging its relative size, and thereby detecting the positions of the spaces between words. CONSTITUTION: A word extraction section 4, comprising a word extraction control section 5, a binary image memory 6 and a vertical projection section 7, takes the non-printing areas detected from the binary image data of an optically read character string and compares each of them, in order from the head of the string, with the other printing areas and non-printing areas. Whether a given non-printing area is a gap permitted as character pitch or a space between words is therefore judged not simply from the absolute width of that area but relatively, using the widths of the other printing areas and non-printing areas as a reference.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to an electronic dictionary that, based on a word composed of characters read by an optical reader, searches for and outputs other information corresponding to that word (for example, its translation), and in particular to the function of extracting the word to be searched.

[0002]

[Description of the Related Art] FIG. 2 is a block diagram showing a conventional electronic dictionary.

[0003] In the conventional electronic dictionary, the characters on a document P are first read by an optical reading section 21.

[0004] The reading is performed by the operator holding the reading section 21 in close contact with the document and scanning it over the word to be searched. The reading section 21 is usually provided with a drive section (not shown) so that it scans at a constant speed. Because an electronic dictionary whose reading section 21 has a drive section also reads words other than the word to be searched, the unwanted words formerly had to be covered before scanning. In recent years it has become common to provide the electronic dictionary with the word extraction section described below so that the word to be searched is read.

[0005] The analog signal output from the reading section 21 is converted into a digital signal by an analog/digital conversion section 22 and output to a preprocessing section 23. The preprocessing section 23 applies preprocessing such as noise removal and binarization to the digital signal and sends the result to a character recognition section 24. The character recognition section 24 receives the binarized image data, cuts individual characters out of it to obtain character patterns, and recognizes each as a character by matching it against the standard character patterns stored in advance in a recognition dictionary 25. The character patterns include, as one character, the blank area provided between words (hereinafter sometimes called a space); in the conventional electronic dictionary, a non-printing area having a predetermined width is recognized as a space.

[0006] The character string recognized by repeating the above recognition operation is sent to a word extraction section 26. The word extraction section 26 detects each non-printing area having the predetermined width in the character string as a space and divides the character string at the spaces. The stretch from the beginning of the sentence to a space, between two spaces, or from a space to the end of the sentence is treated as one word and extracted as the original word to be searched.

[0007] The original word is then sent to a word search section 27. The word search section 27 searches a word dictionary 28 to obtain the translation information corresponding to the original word it received, and outputs the original word and the translation information to a display section 29.

[0008]

[Problems to be Solved by the Invention] However, the word extraction means of the conventional electronic dictionary judges a non-printing area to be a space whenever it detects a non-printing area having a predetermined width in the character string obtained by the reading and conversion means. Spaces can therefore be judged accurately in a document printed with a constant character pitch, for example by a printer or typewriter, but they are difficult to judge accurately in a document, such as typeset print, whose character pitch is varied from line to line to align the line ends.
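The problem can be illustrated with a minimal Python sketch of a fixed-width space test. The function name `split_fixed_threshold`, the list-of-widths representation, and the reading of "a predetermined width" as "at least `space_width`" are assumptions made for this example, not details given in the specification.

```python
from typing import List, Sequence


def split_fixed_threshold(M: Sequence[float], S: Sequence[float],
                          space_width: float) -> List[int]:
    """Split a line into words using one absolute gap threshold.

    M[k] is the width of the k-th printing area and S[k] the width of the
    non-printing area that follows it (assumed layout: M1 S1 M2 S2 ...).
    Returns the number of printing areas in each word.
    """
    words: List[int] = []
    count = 0
    for k in range(len(M)):
        count += 1
        # Assumption: any gap at least `space_width` wide counts as a space.
        if k < len(S) and S[k] >= space_width:
            words.append(count)
            count = 0
    if count:
        words.append(count)
    return words


# A tightly set line: letter gaps of 2, one word gap of 8.
tight = split_fixed_threshold([10] * 6, [2, 2, 8, 2, 2], space_width=6)
# A loosely set (justified) line: letter gaps of 6, one word gap of 20.
loose = split_fixed_threshold([10] * 6, [6, 6, 20, 6, 6], space_width=6)
print(tight, loose)
```

With the same threshold, the tightly set line splits into two three-character words, while every letter gap on the loosely set line is mistaken for a space.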

[0009] The present invention has been made to solve the above problem of the prior art, and its object is to provide an electronic dictionary with excellent word extraction performance.

[0010]

[Means for Solving the Problems] An electronic dictionary according to the present invention comprises: reading and conversion means for optically reading a character string arranged on a document and converting it into binary image data representing the character string; detection means for receiving the binary image data and detecting printing areas and non-printing areas in the character string; word extraction means for extracting a word by detecting, among the plurality of non-printing areas detected by the detection means, a non-printing area that satisfies a predetermined criterion and judging that non-printing area to be a blank area provided between words in the character string; character recognition means for cutting out and recognizing the characters that make up the word obtained by the word extraction means; word search means for receiving the word recognized by the character recognition means and searching for and outputting the corresponding information; and display means for displaying the information received from the word search means.

[0011]

[Operation] The electronic dictionary of the present invention is configured to first extract a word from the optically read character string and then recognize the individual characters that make up the extracted word.

[0012] In this electronic dictionary, word extraction is performed by comparing each non-printing area detected from the binary image data of the optically read character string, in order from the head of the string, with the widths of the other printing areas and non-printing areas. With this method, whether a given non-printing area is a gap permitted as character pitch or a space provided between words is judged not simply from the absolute width of that non-printing area but relatively, using the widths of the other printing areas and non-printing areas as a reference, so the positions of the spaces in the character string are determined accurately.

[0013] Word extraction is therefore performed well even on a document whose character pitch is varied from line to line for typographic reasons such as aligning the line ends.

[0014]

[Embodiments] The present invention is described below on the basis of the illustrated embodiment.

[0015] FIG. 1 is a block diagram showing an embodiment of the electronic dictionary according to the present invention.

[0016] In the electronic dictionary of the present invention, a character string on a document P is first read by bringing an optical reading section 1 into close contact with the document P and scanning it. The reading section 1 is moved at a constant speed by a drive section (not shown). Scanning at a constant speed avoids the variations in character pitch and the distortion of the image data that occur with manual scanning.

[0017] The analog signal output from the reading section 1 is converted into a digital signal by an analog/digital conversion section 2 and output to a preprocessing section 3. The preprocessing section 3 applies preprocessing such as noise removal and binarization to the digital signal, and the resulting binary image data is sent to a word extraction section 4.

[0018] In this embodiment, the word extraction section 4 consists of a word extraction control section 5, a binary image memory 6, and a vertical projection section 7. The binary image data input from the preprocessing section 3 is first stored in the binary image memory 6. The vertical projection section 7 then detects printing areas and non-printing areas from the binary image data by a projection method.

[0019] FIG. 3 is a diagram showing how printing areas and non-printing areas are detected by the projection method. In the method of FIG. 3, the black pixels of the binary image data stored in the binary image memory 6 are counted for each column of a predetermined width in the vertical direction (the direction perpendicular to the direction in which the characters of the string are arranged). In this specification this is called "projecting in the vertical direction". The graph in the lower part of FIG. 3 shows the result of the vertical projection; the horizontal axis represents the position of the black pixels and the vertical axis represents the number of black pixels.

[0020] Using this projection method, the vertical projection section 7 detects the regions in which the number of black pixels exceeds a predetermined value (a value close to zero, set slightly above zero to allow for noise) as printing areas and the remaining regions as non-printing areas. The printing areas are numbered M1, M2, M3, ..., Mi in order from the left, and the non-printing areas are numbered S1, S2, S3, ..., Si in order from the left.
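The vertical projection and region detection described above can be sketched in Python as follows. The function names, the one-pixel column width, the row-major list-of-lists image format (1 for a black pixel), and the `noise_floor` parameter are assumptions made for this sketch; the specification states only that columns have a predetermined width and that the threshold is slightly above zero.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]  # (start column, width in columns)


def vertical_projection(image: Sequence[Sequence[int]]) -> List[int]:
    """Count the black pixels (value 1) in each one-pixel-wide column."""
    return [sum(column) for column in zip(*image)]


def split_regions(profile: Sequence[int],
                  noise_floor: int = 0) -> Tuple[List[Run], List[Run]]:
    """Split a projection profile into printing and non-printing runs.

    A column belongs to a printing area when its black-pixel count exceeds
    `noise_floor`; every other column belongs to a non-printing area.
    """
    printing: List[Run] = []
    non_printing: List[Run] = []
    start = 0
    for x in range(1, len(profile) + 1):
        is_print = profile[start] > noise_floor
        # Close the current run at the end of the profile or when the
        # printing/non-printing state changes.
        if x == len(profile) or (profile[x] > noise_floor) != is_print:
            (printing if is_print else non_printing).append((start, x - start))
            start = x
    return printing, non_printing


image = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
]
profile = vertical_projection(image)
printing, non_printing = split_regions(profile)
print(profile, printing, non_printing)
```

The widths of the `printing` runs play the role of M1, M2, ... and the widths of the `non_printing` runs play the role of S1, S2, ... in the conditions that follow.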

[0021] The electronic dictionary of the present invention is characterized by having word extraction means that determines the positions of the spaces provided between words by successively comparing the widths of the non-printing areas, from the head of the character string, according to a fixed criterion.

[0022] In this embodiment, each non-printing area width Si (i = 1, 2, 3, ...) is tested against Condition 1 and Condition 2 below, and if either is satisfied a space is judged to be present. In the conditional expressions, Si is the width of the i-th non-printing area, Smax is the maximum of S1 to Si-1, Mmin is the minimum of M1 to Mi-1, and T1 to T4 are thresholds, set for example to the following values.

[0023] T1: 0.5 to 1.5; 1.0 in this embodiment.

[0024] T2: 1 to 5; 2.5 in this embodiment.

[0025] T3: 0.5 to 1.5; 1.0 in this embodiment.

[0026] T4: 1 to 5; 2.5 in this embodiment.

[0027] Condition 1: when Si / Mmin > T1 and Si / Smax > T2, Si is recognized as a space; that is, M1 to Mi are recognized as a word.

[0028] Condition 2: when S1 / Mmin > T3 and S1 / Si > T4, S1 is recognized as a space; that is, M1 alone is recognized as constituting a word (a one-character word).

[0029] Of these, Condition 1 is the condition for extracting a word of two or more characters, and Condition 2 is the condition for extracting a word of one character.
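The two conditions can be written as small Python predicates, shown here as a sketch. The function and parameter names are assumptions made for illustration; the default thresholds are the values given for this embodiment (T1 = T3 = 1.0, T2 = T4 = 2.5). Because both conditions are ratios of widths, multiplying every width by the same factor leaves the outcome unchanged, which the loop below checks.

```python
def condition1(s_i: float, s_max: float, m_min: float,
               t1: float = 1.0, t2: float = 2.5) -> bool:
    """Condition 1: Si is a space closing a word of two or more characters."""
    return s_i / m_min > t1 and s_i / s_max > t2


def condition2(s_1: float, s_i: float, m_min: float,
               t3: float = 1.0, t4: float = 2.5) -> bool:
    """Condition 2: S1 is a space, so M1 alone is a one-character word."""
    return s_1 / m_min > t3 and s_1 / s_i > t4


# Illustrative widths: characters 10 wide, letter gaps 2 wide, a 12-wide gap.
for scale in (1.0, 1.7):
    # 12/10 = 1.2 > T1 and 12/2 = 6 > T2, at either scale.
    print(scale, condition1(12 * scale, 2 * scale, 10 * scale))

# A 3-wide gap is narrower than the narrowest character, so it is not a space.
print(condition1(3, 2, 10))
# S1 = 12 followed by a 2-wide gap: M1 stands alone as a word.
print(condition2(12, 2, 10))
```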

[0030] FIG. 4 is a flowchart showing the word extraction procedure of this embodiment. Word extraction in this embodiment is described next with reference to FIG. 4.

[0031] First, the first non-printing area width S1 and the first printing area width M1 are taken as Smax and Mmin, respectively (step 101). Next, the count value (parameter) i is set to 2. It is then determined whether Mi is smaller than Mmin, and if so Mi becomes the new Mmin (steps 103, 104).

[0032] Next, it is determined whether Condition 1 above is satisfied (steps 105, 106). If it is not, it is determined whether Condition 2 above is satisfied (steps 108, 109). If that is not satisfied either, it is determined whether Si is larger than Smax, and if so Si becomes the new Smax (steps 111, 112). The count value i is then incremented by 1 (step 113) and the procedure returns to step 103.

[0033] If Condition 1 is satisfied in steps 105 and 106, Si is recognized as a space and M1 to Mi are recognized as a word (step 107).

[0034] If Condition 2 is satisfied in steps 108 and 109, S1 is recognized as a space and M1 alone is recognized as constituting a word (a one-character word) (step 110).
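The flowchart procedure can be sketched in Python as below. This is an illustrative reading, not the patented implementation: the function name, the assumed layout M1 S1 M2 S2 ... (so that Si follows Mi), and the behavior when the string ends without either condition being met (the whole string is treated as one word) are assumptions. The sketch follows the flowchart order, in which Mmin has already been updated with Mi when the conditions are tested.

```python
from typing import Sequence


def first_word_length(M: Sequence[float], S: Sequence[float],
                      t1: float = 1.0, t2: float = 2.5,
                      t3: float = 1.0, t4: float = 2.5) -> int:
    """Return how many leading printing areas make up the first word.

    M[k] is the width of printing area M(k+1); S[k] is the width of the
    non-printing area that follows it. All widths are assumed positive.
    """
    if len(M) - len(S) not in (0, 1):
        raise ValueError("expected len(S) == len(M) or len(M) - 1")
    if not S:
        return len(M)                  # assumption: nothing to compare
    s_max, m_min = S[0], M[0]          # step 101
    i = 2                              # 1-based index, as in the flowchart
    while i <= len(S):
        m_i, s_i = M[i - 1], S[i - 1]
        if m_i < m_min:                # steps 103, 104
            m_min = m_i
        if s_i / m_min > t1 and s_i / s_max > t2:    # steps 105, 106
            return i                   # step 107: M1..Mi form a word
        if S[0] / m_min > t3 and S[0] / s_i > t4:    # steps 108, 109
            return 1                   # step 110: M1 alone is a word
        if s_i > s_max:                # steps 111, 112
            s_max = s_i
        i += 1                         # step 113
    return len(M)                      # assumption: no space was found


# Three characters, a wide gap, then three more characters.
print(first_word_length([10] * 6, [2, 2, 12, 2, 2]))
# One character, a wide gap, then three characters.
print(first_word_length([10] * 4, [12, 2, 2]))
```

Extracting the following words would presumably repeat the same procedure on the remaining areas; the specification does not describe that step, so it is left out here.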

[0035] The original word extracted by the above procedure is then output from the word extraction control section 5 shown in FIG. 1 to a character recognition section 8. At this stage the original word is binary image data: the printing areas of the individual characters that make up the original word have been identified, but the character patterns have not yet been recognized.

[0036] The character recognition section 8 receives the binary image data sent from the word extraction section 4, cuts out the individual characters that make up the original word to obtain character patterns, and recognizes each as a character by matching it against the standard character patterns stored in advance in a recognition dictionary 9.

[0037] The character string recognized by repeating the above recognition operation becomes the original word to be searched.

[0038] The original word to be searched is then sent to a word search section 10. The word search section 10 searches a word dictionary 11 to obtain the translation information corresponding to the original word it received, and outputs the original word and the translation information to a display section 12.

[0039] As described above, in this embodiment words can be extracted reliably even when the character pitch varies, because each non-printing area width is compared with the maximum non-printing area width and the minimum printing area width, each multiplied by a predetermined coefficient.

[0040] In the above embodiment the word extraction conditions are based on the maximum non-printing area width and the minimum printing area width, but a judgment based on average values may be adopted instead, and the thresholds may be varied according to the variation in the non-printing area widths and printing area widths.
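The specification does not say how an average-based judgment would be formulated. One possible reading, sketched below in Python, replaces Smax and Mmin in Condition 1 with running means. The function name, the choice of which widths enter each mean, and the reuse of the thresholds 1.0 and 2.5 are all assumptions; suitable thresholds for a mean-based test would likely differ, and Condition 2 is omitted for brevity.

```python
from statistics import mean
from typing import Sequence


def first_word_length_mean(M: Sequence[float], S: Sequence[float],
                           t1: float = 1.0, t2: float = 2.5) -> int:
    """Hypothetical variant of Condition 1 using means instead of Smax/Mmin.

    M[k] is the width of printing area M(k+1) and S[k] the width of the
    non-printing area that follows it; len(S) <= len(M) is assumed.
    """
    for i in range(2, len(S) + 1):     # 1-based index i, starting at 2
        m_ref = mean(M[:i])            # mean of M1..Mi
        s_ref = mean(S[:i - 1])        # mean of S1..S(i-1)
        s_i = S[i - 1]
        if s_i / m_ref > t1 and s_i / s_ref > t2:
            return i                   # M1..Mi form a word
    return len(M)                      # assumption: no space was found


print(first_word_length_mean([10] * 6, [2, 2, 12, 2, 2]))
```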

[0041]

[Effects of the Invention] According to the electronic dictionary of the present invention, words can be extracted by successively comparing the width of each non-printing area, in order from the head of the character string, with the widths of the other printing areas and non-printing areas, judging its relative size, and detecting the positions of the spaces provided between words.

[0042] Words can therefore be extracted accurately even from a document in which the character pitch varies from line to line, so that the gaps between words are extremely tight or, conversely, the non-printing areas between characters are quite wide.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing an embodiment of the electronic dictionary according to the present invention.

[FIG. 2] A block diagram showing a configuration example of a conventional electronic dictionary.

[FIG. 3] A diagram showing how printing areas and non-printing areas are detected by the projection method.

[FIG. 4] A flowchart showing an example of the word extraction procedure in the electronic dictionary according to the present invention.

[Explanation of Symbols]
4 Word extraction section
5 Word extraction control section
6 Binary image memory
7 Vertical projection section

Claims

[Claims]

1. An electronic dictionary comprising: reading and conversion means for optically reading a character string arranged on a document and converting it into binary image data representing the character string; detection means for receiving the binary image data and detecting printing areas and non-printing areas in the character string; word extraction means for extracting a word by detecting, among the plurality of non-printing areas detected by the detection means, a non-printing area that satisfies a predetermined criterion and judging that non-printing area to be a blank area provided between words in the character string; character recognition means for cutting out and recognizing the characters that make up the word obtained by the word extraction means; word search means for receiving the word recognized by the character recognition means and searching for and outputting the corresponding information; and display means for displaying the information received from the word search means.