JP2005135210A - Portable device with character recognition function

Info

Publication number: JP2005135210A
Application number: JP2003371499A
Authority: JP
Inventors: Masashi Koga; 昌史古賀; Tatsuya Kameyama; 達也亀山; Ryuji Mine; 竜治嶺; Hiroshi Shinjo; 広新庄
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-10-31
Filing date: 2003-10-31
Publication date: 2005-05-26

Abstract

PROBLEM TO BE SOLVED: To allow the input means of a portable terminal with a camera to automatically detect the reading target for character recognition in the surrounding scene and recognize its characters.

SOLUTION: The character strings to be recognized are set automatically according to a user-designated keyword. Character string extraction and/or character recognition is carried out on images input continuously from an image input means, and when recognition succeeds, the user is notified by sound or vibration. This makes it possible to automatically detect character strings in a scene and report them to the operator, which has been difficult with conventional devices.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention belongs to the technical field of input means for portable terminals equipped with a camera.

Devices that digitize images taken with a camera and store them as files, such as digital still cameras, camera-equipped mobile phones, and camera-equipped PDAs, have already been put into practical use. Attempts have also been made to use these devices as image input means for character recognition. For example, H. Fujisawa, H. Sako, Y. Okada, and S-W. Lee, "Information Capturing Camera and Developmental Issues," Proc. Int. Conf. Document Analysis and Recognition, ICDAR'99, Bangalore, India, Sep. 20-22, 1999, pp. 205-208 (Non-Patent Document 1) describes using a camera-equipped portable device to read the characters on foreign-language signs and signboards for purposes such as translation and search.

In general, character recognition requires four functions: (1) detection of character line regions (character line extraction), (2) detection of each character's region within a character line region (character extraction), (3) identification of which character each extracted character is (character identification), and (4) interpretation of the character identification results (language processing). These are not necessarily performed in this order. For example, in Hiroshi Murase, "Character extraction and recognition from handwritten strings using linguistic information," IEICE Trans. (D), vol. J69-D, no. 9, pp. 765-772 (Non-Patent Document 2), characters are first extracted under various hypotheses, and the way the characters are extracted is then fixed according to the character identification results and language processing.

Conventional character recognition technology has mainly targeted characters written on paper documents, with images input by an image scanner. In contrast, when a camera is used as the input means to read signboards and signs, character line extraction and character extraction in particular must be made more sophisticated. This is because the element functions these steps rely on, such as separating characters from the background and analyzing character layout, become difficult to realize. For example, when an image is input with a camera, unlike with a scanner, the lighting conditions cannot be known in advance, which makes separating characters from the background difficult. In addition, paper documents impose document-specific constraints on the arrangement of character lines, whereas signboards and signs in camera images have no such constraints, so it is difficult to extract character lines based on a priori knowledge of character arrangement.

Non-Patent Document 1: H. Fujisawa, H. Sako, Y. Okada, and S-W. Lee, "Information Capturing Camera and Developmental Issues," Proc. Int. Conf. Document Analysis and Recognition, ICDAR'99, Bangalore, India, Sep. 20-22, 1999, pp. 205-208

Non-Patent Document 2: Hiroshi Murase, "Character extraction and recognition from handwritten strings using linguistic information," IEICE Trans. (D), vol. J69-D, no. 9, pp. 765-772

Non-Patent Document 3: R.M.K. Sinha, B. Prasada, G.F. Houle, and M. Sabourin, "Hybrid Contextual Text Recognition with String Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, December 1993

Non-Patent Document 4: A.K. Jain and B. Yu, "Automatic Text Location in Images and Video Frames," Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998

Non-Patent Document 5: C.-L. Liu, M. Koga, and H. Fujisawa, "Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 11, Nov. 2002, pp. 1425-1437

The portable device of the present invention uses a camera as the image input means for character recognition and reads signboards and signs bearing specific characters, for example on the street. With conventional technology, the operator personally looks for the signboard or sign to be read, points the camera at it, and runs character recognition. In contrast, the device of the present invention photographs a wide area with a wide-angle camera and automatically detects the reading target. The operator no longer needs to find the reading target and point the camera at it, which greatly improves convenience.

To realize these functions, the present invention addresses the following problems.

First, detecting and recognizing characters takes processing time. Because the portable device of the present invention processes a larger image area than conventional devices, more character lines are detected, and the processing time for character extraction, character identification, and language processing increases. Longer processing time means the operator waits longer for the needed information, and the device becomes less convenient. Reducing processing time is therefore an important problem.

Second, an image contains many unneeded character strings, while the operator does not necessarily know exactly what characters are written on the signboard or sign being sought. When an image is taken in a city, many characters appear in it, but the operator is usually interested in only some of them. Unneeded characters can be excluded by limiting the recognition targets to character strings that the operator designates in advance. However, a single concept is often expressed by many different character strings. For example, English strings meaning toilet include "bathroom", "toilet", "rest room", and "lavatory". The operator cannot know in advance which of these will appear in the image, and designating all of them takes considerable effort. Excluding unneeded characters while still covering diverse character strings as recognition targets is thus an important and difficult problem.

Third, a captured image does not necessarily contain a character string of interest to the operator. As described above, the object of the present invention is to let the portable device find the characters to be read in the image on the operator's behalf. To achieve this, the operator must not need to be aware, at the time an image is captured, of whether a character string of interest appears in it. An obvious solution is to capture an image, recognize the characters in it, and then have the portable device show the operator whether a character string of interest is present. With this approach, however, the operator must repeat capturing and checking many times, which makes the device inconvenient. Handling the fact that a captured image may not contain a character string of interest, without reducing the convenience of the device, is therefore an important problem.

To solve the first problem, the present invention provides means by which the operator designates a set of character strings to be recognized, so that the recognition targets can be narrowed down in advance. It further provides means for narrowing down the character types used in character identification according to the designated set. In general, the processing time of character identification increases with the number of character types to be identified. If the recognition target character strings can be limited in advance, the character types to be identified can be narrowed accordingly, which reduces processing time. The portable device of the present invention therefore lets the operator designate a set of character strings of interest and automatically narrows the character types used in character identification according to that set. In addition, groups of recognition target character strings are stored in advance so that the operator can select among them as needed, which makes designating recognition target character strings easier. Providing means for the portable device to make this designation automatically according to the situation makes it easier still.

To solve the second problem, means are provided for automatically obtaining a set of character strings with the same meaning. For example, when the operator enters the character string "便所" (toilet) or selects it from a menu, the portable device refers to synonym information stored in advance and obtains the set of synonyms for it. General linguistic knowledge is then used to vary the synonyms further as needed, and the set of recognition target character strings is determined. In Japanese, for example, this variation handles differences in okurigana, so that "預かり所" (cloakroom) yields the set "預かり所", "預り所", and "預所". In English, it handles differences in capitalization, so that "rest room" yields the set "rest room", "Rest Room", and "REST ROOM".

To solve the third problem, the portable device automatically repeats image capture and character recognition. When a character string of interest to the operator is recognized in an image, the device notifies the operator by sound or vibration and displays the location of the recognized character string on the image. The operator therefore does not need to be aware of whether each individual image contains a character string of interest. When a character string of interest is found, the operator learns of it promptly, can easily locate the character string by eye, and can check the details directly without going through the camera.

The invention makes it possible to automatically detect character strings in a scene and notify the operator, which has been difficult with conventional devices.

FIG. 1 is a data flow diagram showing the flow of processing from image input to image file storage in the first embodiment of the present invention. The range enclosed by frame 101 in the figure is the processing performed inside the portable device.

In this embodiment, after an image is input by a camera or the like (109), character strings are recognized in the image (110) with reference to the recognition target character string dictionary 108. When recognition succeeds, that is, when a character string stored in the recognition target character string dictionary 108 is detected in the image, the operator is notified by sound or vibration (111) and the recognition result is displayed (112).

The recognition target character string dictionary 108 stores the set of character strings to be recognized. The stored character strings are determined in one of the following three ways.

(1) The operator enters a word of interest, that is, a keyword, into the portable device (102). The portable device refers to the synonym dictionary 104 stored in the device or to the external synonym dictionary 114 stored outside it, obtains the set of synonyms of the entered keyword (synonym expansion processing 103), varies these further using general linguistic knowledge, and stores the results in the recognition target character string dictionary 108. The synonym dictionary 104 and the external synonym dictionary 114 store the relationship between a word and its set of synonyms. The external synonym dictionary 114 is accessed through the communication means described later. The following variations based on general linguistic knowledge are applied:
・ Capitalize the first letter of an English word.
・ Capitalize all letters of an English word.

The synonym expansion function removes the need for the operator to enter many synonyms into the portable device one by one, which greatly improves convenience. When the recognition target is in a foreign language, the keyword may be specified in the operator's native language and expanded into the corresponding foreign-language synonyms.
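The synonym expansion of method (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `SYNONYMS` table stands in for synonym dictionary 104, and the function names `case_variants` and `expand_keyword` are hypothetical. It assumes lowercase dictionary entries and applies only the two capitalization rules listed above.

```python
# Hypothetical stand-in for synonym dictionary 104: keyword -> synonyms.
SYNONYMS = {
    "toilet": ["toilet", "rest room", "bathroom", "lavatory"],
}


def case_variants(text):
    """Return the string as given, with each word capitalized, and in all capitals."""
    capitalized = " ".join(word.capitalize() for word in text.split(" "))
    return {text, capitalized, text.upper()}


def expand_keyword(keyword, synonyms=SYNONYMS):
    """Build the set of recognition target strings for one keyword.

    A keyword with no dictionary entry is expanded on its own (assumption).
    """
    targets = set()
    for synonym in synonyms.get(keyword, [keyword]):
        targets |= case_variants(synonym)
    return targets
```

Calling `expand_keyword("toilet")` yields twelve strings, including "rest room", "Rest Room", and "REST ROOM", which would then be stored in the recognition target character string dictionary 108.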

(2) The portable device refers to the recognition menu information 106 and presents keywords to the operator as a menu. The operator selects the items of interest from the menu (105). According to the selection, the portable device stores the set of recognition target character strings held in the recognition menu information 106 in the recognition target character string dictionary 108. Providing means for selecting keywords of interest from a menu removes the need for the operator to enter many synonyms into the portable device one by one, which greatly improves the convenience of the device.

(3) The portable device sends a signal requesting recognition target character strings to an external computer through the communication means. On receiving it, the computer transmits a set of recognition target character strings to the portable device, which stores the received set in the recognition target character string dictionary 108. The request signal may be sent when the operator performs a specific operation, or it may be sent at a fixed time. The convenience of the portable device improves further if the external computer adapts the recognition target character strings it sends to the device's location, the time, the operator, and so on.

In this embodiment, character string recognition uses a method such as that of R.M.K. Sinha, B. Prasada, G.F. Houle, M. Sabourin, "Hybrid Contextual Text Recognition with String Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, December 1993 (Non-Patent Document 3). FIG. 2 shows an example of the data flow of the character string recognition processing. First, in step 201, character lines are extracted from the input image. Character line extraction uses a method such as that of A.K. Jain, B. Yu, "Automatic Text Location in Images and Video Frames," Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998 (Non-Patent Document 4). Next, in step 202, individual characters are extracted from each character line. When multiple character lines have been extracted, all of them are passed to the subsequent processing. Next, in step 203, each extracted character is identified. Here, character type restriction 204 limits the character types to be identified to those contained in the recognition target character string dictionary 108, and the character identification dictionary 206 is consulted. The character identification dictionary 206 stores information on the shape of each character. Finally, in step 205, the character identification results are interpreted as character strings with reference to the recognition target character string dictionary 108. The output of the character string recognition processing is whether a recognition target character string is present, together with the recognized character string.

The processing time of character identification is roughly proportional to the number of character types to be identified. Limiting identification to the character types that can appear in the recognition target character strings therefore reduces processing time substantially. The effect is especially pronounced for scripts with many character types, such as kanji and hangul.
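The restriction of character types can be illustrated with a small sketch. It assumes, purely for illustration, that character identification is a nearest-template comparison over feature vectors; the source does not specify the classifier, and the names `restrict_templates` and `classify` are hypothetical. Because `classify` makes one comparison per template, shrinking the template set shrinks the work in proportion.

```python
def restrict_templates(templates, target_strings):
    """Keep only templates for characters that occur in some target string."""
    allowed = {ch for s in target_strings for ch in s if not ch.isspace()}
    return {ch: vec for ch, vec in templates.items() if ch in allowed}


def classify(feature, templates):
    """Return the character whose template is nearest to `feature`.

    Uses squared Euclidean distance; one comparison is made per template,
    so the cost is proportional to the number of character types.
    """
    best_char, best_dist = None, float("inf")
    for ch, vec in templates.items():
        dist = sum((a - b) ** 2 for a, b in zip(feature, vec))
        if dist < best_dist:
            best_char, best_dist = ch, dist
    return best_char
```

With a template set covering thousands of kanji and a target set such as the three variants of "預かり所", the restricted set would contain only four characters, which is where the largest savings would be expected.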

In this embodiment, character extraction, character identification, and post-processing are executed sequentially, but they may instead be executed as an integrated process, as in C.-L. Liu, M. Koga and H. Fujisawa, "Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24, No. 11, Nov. 2002, pp. 1425-1437 (Non-Patent Document 5). In that case, the character types are restricted dynamically during recognition, so that still higher speed and accuracy can be achieved. As another embodiment, character recognition may be executed without using language information, after which an ordinary text matching algorithm is used to match the words in the recognition target character string dictionary 108 against the character recognition results.
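The last alternative, matching after recognition, can be sketched as a plain substring search. This is only one possible reading of "ordinary text matching"; the function name `find_targets` and the representation of recognition results as a list of text lines are assumptions.

```python
def find_targets(recognized_lines, target_strings):
    """Return (line_index, target) pairs for every target found in a recognized line."""
    hits = []
    for index, line in enumerate(recognized_lines):
        # Sort the targets so the result order does not depend on set ordering.
        for target in sorted(target_strings):
            if target in line:
                hits.append((index, target))
    return hits
```

For two recognized lines "Restroom" and "Exit" and the target set {"rest room", "Restroom", "toilet"}, only the first line produces a hit.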

FIG. 3 shows the hardware configuration in the embodiment of the present invention. An image is captured by an optical device 302 consisting of a lens, a diaphragm, and the like, and is then converted into an electrical signal by a photoelectric conversion element 303 such as a CCD. The electrical signal is converted into a digital signal by an analog-to-digital converter 304 and is then processed, for example by color space conversion and filtering, in a signal processing element 305 such as a DSP. The result is transferred to the RAM 309. The recognition target character string dictionary 108 is also stored in the RAM 309. The arithmetic unit 307 refers to the processing procedures and to data such as the character identification dictionary stored in the ROM 308, and executes the character string recognition processing 110 with the image stored in the RAM 309 as input. The input device 312 is used to enter keywords. The display device 306 is used to check the image at the time of shooting and to display character string recognition results. The sound generation device 313 and the vibration device 314 are used for the character string presence notification 111. The communication device 315 is used to receive recognition target character strings and to access the external synonym dictionary 114.

FIG. 4 shows the external appearance of the device in the embodiment of the present invention. The lens unit 402 of the optical device 302 is placed on the front of the housing 401. The lens has an angle of view wide enough to photograph the signboards, signs, and the like around the operator. On the top are a shutter 403, which is part of the input device 312 and is used to trigger image input 109, a power switch 404, and a speaker opening 410, which is part of the sound generation device 313. The antenna 407 of the communication device 315 is placed on the side. On the back are a cancel button 405 and a cursor key 406, which are part of the input device 312, and the display surface 409 of the display device 306. Pressing the top, bottom, left, or right edge of the cursor key 406 sends a signal indicating that direction to the input device 312, and pressing its center sends a different signal to the input device 312. In this embodiment, except while it is being operated, the portable device is preferably kept in the operator's breast pocket or the like with only the lens exposed, so that it can photograph the scene.

FIG. 5 shows the state transitions of the portable device in this embodiment. First, pressing the power switch 404 causes a transition from the start state 501 to the standby state 502. Starting the keyword input operation in the standby state 502 causes a transition to the keyword input state 503, and when input is complete the device returns to the standby state 502. In the keyword input state 503, for example, a virtual keyboard appears on the display surface 409 and is operated with the cursor key 406 to enter a character string. Starting the menu selection operation in the standby state 502 causes a transition to the menu selection state 504, and when the selection is complete the device returns to the standby state 502. Starting recognition character string acquisition in the standby state 502 causes a transition to the recognition character string acquisition state 505, and when acquisition of the recognition target character strings is complete the device returns to the standby state 502. In the recognition character string acquisition state 505, operations on the portable device are temporarily not accepted while the device communicates with an external computer to acquire the recognition target character strings. Pressing the power switch 404 in the standby state 502 returns the device to the start state 501. Starting recognition in the standby state 502 causes a transition to the recognition state 506. In the recognition state 506, as described later, the portable device repeats image input 109 and character string recognition 110 until a recognition target character string is recognized in an image. In this state, the operator carries the portable device in a pocket or the like with only the camera lens exposed. When a character string is recognized, the device transitions to the character string presence notification state 507, in which it emits sound or vibration to attract the operator's attention. When the operator presses the cancel button 405, the sound or vibration stops and the device transitions to the character string recognition result display state 508, in which the character string recognition result is displayed on the display surface 409 superimposed on the image. Pressing the cancel button 405 here returns the device to the standby state 502.
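The state transitions of FIG. 5 can be written as a lookup table. In the sketch below the state values follow the reference numerals in the text, but the event names are hypothetical labels for the described operations, and ignoring events that have no listed transition is an assumption (the text states this explicitly only for the acquisition state 505).

```python
from enum import Enum


class State(Enum):
    START = 501
    STANDBY = 502
    KEYWORD_INPUT = 503
    MENU_SELECTION = 504
    STRING_ACQUISITION = 505
    RECOGNITION = 506
    NOTIFICATION = 507
    RESULT_DISPLAY = 508


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.START, "power"): State.STANDBY,
    (State.STANDBY, "power"): State.START,
    (State.STANDBY, "keyword_input"): State.KEYWORD_INPUT,
    (State.KEYWORD_INPUT, "done"): State.STANDBY,
    (State.STANDBY, "menu_selection"): State.MENU_SELECTION,
    (State.MENU_SELECTION, "done"): State.STANDBY,
    (State.STANDBY, "acquire_strings"): State.STRING_ACQUISITION,
    (State.STRING_ACQUISITION, "done"): State.STANDBY,
    (State.STANDBY, "recognize"): State.RECOGNITION,
    (State.RECOGNITION, "string_found"): State.NOTIFICATION,
    (State.NOTIFICATION, "cancel"): State.RESULT_DISPLAY,
    (State.RESULT_DISPLAY, "cancel"): State.STANDBY,
}


def next_state(state, event):
    """Return the state after `event`; unlisted events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.START):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, powering on, starting recognition, finding a string, and pressing cancel once leaves the device in the result display state 508; a second cancel returns it to standby.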

FIG. 6 shows the display surface 409 in the standby state 502. A menu 601 is displayed on the display surface 409. The operator changes the selected menu item by pressing the top or bottom edge of the cursor key 406 and confirms it by pressing the center of the cursor key 406. To select nothing, the operator presses the cancel button 405. To enter the keyword input state 503, the operator selects "Keyword input" 603 and presses the center of the cursor key 406. To enter the menu selection state 504, the operator selects and confirms "Keyword selection" 602. To enter the recognition character string acquisition state 505, the operator selects and confirms "Keyword acquisition" 604. To enter the recognition state 506, the operator selects and confirms "Recognition" 605.

FIG. 7 shows the display surface 409 in the menu selection state 504. Keywords such as “toilet” and “information center” are displayed in a window 701, with a check box 702 to the left of each keyword. The operator selects keywords by operating the check boxes. The keywords displayed in the menu and their corresponding synonym groups are stored in advance in the recognition menu information 106 and are used as the recognition target character strings. For example, when “toilet” is selected, character strings such as “rest room”, “Restroom”, and “toilet” become recognition targets.

FIG. 8 shows an example of the input image. In this example, the image contains two character strings, “Restroom” 801 and “Exit” 802.

FIG. 9 shows the processing procedure from the recognition state 506 to the character string recognition result display state 508 in the state transition diagram of FIG. 5. First, in step 901, the character types to be distinguished by character identification are limited according to the set of recognition target character strings. Next, in loop 902, image input 903 and character string recognition 904 are repeated. When a target character string is recognized in the image, the repetition ends at step 905. Then, in step 906, the operator is notified by sound or vibration that a character string has been recognized. Finally, in step 907, the character string recognition result is displayed.
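The procedure of FIG. 9 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the device: `recognize`, `notify`, and `display` are hypothetical callables standing in for the character string recognition, the sound or vibration output, and the display, and limiting the character types to the characters that occur in the target strings is one possible reading of step 901.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def allowed_characters(targets: Iterable[str]) -> Set[str]:
    """Step 901 (assumed reading): keep only characters occurring in the targets."""
    return {ch for text in targets for ch in text}


def run_recognition(
    frames: Iterable[str],
    recognize: Callable[[str, Set[str]], List[str]],  # hypothetical recognizer
    targets: Iterable[str],
    notify: Callable[[], None],                       # hypothetical sound/vibration
    display: Callable[[str, str], None],              # hypothetical result display
) -> Optional[Tuple[str, str]]:
    """Repeat image input and recognition until a target string is found."""
    target_list = list(targets)
    charset = allowed_characters(target_list)        # step 901
    target_set = set(target_list)
    for image in frames:                             # loop 902, image input 903
        for text in recognize(image, charset):       # character string recognition 904
            if text in target_set:                   # step 905: end the repetition
                notify()                             # step 906
                display(image, text)                 # step 907
                return image, text
    return None  # the frame source ended without a match
```

Passing the recognizer and the output functions as parameters keeps the sketch independent of any particular camera or recognition library.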

FIG. 10 shows the display surface 409 in the character string recognition result display state 508. The position of the recognized character string 801 is indicated by enclosing it in a quadrilateral. By checking the recognition result on the screen promptly after the notification, before moving or turning, the operator can easily locate the recognized character string in the surroundings. The recognized character string, the keyword from which it was derived, and the time of recognition are also displayed in a window 1001. Thus, even when the operator is unfamiliar with the synonyms (for example, when the recognition target is in a foreign language), the operator can easily confirm which of the specified keywords the recognized character string corresponds to.
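The text shown in the window 1001 can be illustrated by a lookup from the recognized string back to its keyword. The sketch below assumes the target strings are held in a mapping from keyword to synonym list; the function name, the mapping, and the output format are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def describe_result(
    text: str,
    targets_by_keyword: Dict[str, List[str]],
    recognized_at: datetime,
) -> Optional[str]:
    """Build the line shown with the result: string, source keyword, and time."""
    for keyword, strings in targets_by_keyword.items():
        if text in strings:
            return f"{text} (keyword: {keyword}, recognized at {recognized_at:%H:%M:%S})"
    return None  # the string does not belong to any selected keyword
```

For example, with “Restroom” registered under the keyword “toilet”, the returned line names “toilet” as the source keyword, which is the confirmation described above for operators unfamiliar with the synonyms.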

FIG. 11 shows the storage format of the synonym dictionary 104 and the external synonym dictionary 113. Synonym information is stored in a table in which each record consists of a keyword 1101 and a synonym group 1102.
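A record of this table, together with a lookup that expands a keyword into its synonym group, might look like the sketch below. The class and function names and the sample entries are illustrative, and returning the keyword itself when no record exists is an assumption the description does not state.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SynonymRecord:
    keyword: str               # field 1101
    synonyms: Tuple[str, ...]  # field 1102


# Sample data for illustration only.
SYNONYM_DICTIONARY = [
    SynonymRecord("toilet", ("rest room", "Restroom", "toilet")),
    SynonymRecord("exit", ("Exit",)),
]


def expand_keyword(keyword: str, dictionary: Sequence[SynonymRecord]) -> List[str]:
    """Return the synonym group for a keyword."""
    for record in dictionary:
        if record.keyword == keyword:
            return list(record.synonyms)
    return [keyword]  # assumption: fall back to the keyword itself
```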

FIG. 12 shows the storage format of the recognition menu information 106. Menu information is stored in a table in which each record consists of a Boolean value 1201 indicating whether the entry is included in the recognition targets, a keyword 1202 displayed in the menu, and a group of recognition target character strings 1203.
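As a sketch of how such records could yield the set of recognition target character strings, the example below collects the string groups of all records whose flag is true. The names are hypothetical, and the removal of duplicates while preserving order is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MenuRecord:
    selected: bool             # Boolean value 1201
    keyword: str               # keyword 1202 shown in the menu
    target_strings: List[str]  # recognition target character strings 1203


def recognition_targets(menu: Sequence[MenuRecord]) -> List[str]:
    """Collect the target strings of every selected record, without duplicates."""
    targets: List[str] = []
    for record in menu:
        if not record.selected:
            continue
        for text in record.target_strings:
            if text not in targets:
                targets.append(text)
    return targets
```

Checking or clearing a check box in the menu then corresponds to changing the `selected` field of one record.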

FIG. 1 is a data flow diagram showing the flow of processing from image input to image file output.
FIG. 2 is a data flow diagram showing the flow of the character string recognition process.
FIG. 3 is a block diagram of the hardware.
FIG. 4 is an external view of the device.
FIG. 5 is a diagram showing the state transitions of the operation of the device.
FIG. 6 is a diagram showing the display surface in the standby state.
FIG. 7 is a diagram showing the display surface in the menu selection state.
FIG. 8 is a schematic diagram of an input image.
FIG. 9 is a diagram showing the processing procedure from the recognition state to the recognition result display state.
FIG. 10 is a diagram showing the display surface when the character string recognition result is displayed.
FIG. 11 is a diagram showing the storage format of the synonym dictionary.
FIG. 12 is a diagram showing the storage format of the recognition menu information.

Explanation of symbols

101 ... processing from image input to recognition result display, 102 ... keyword input, 103 ... synonym expansion, 104 ... synonym dictionary, 105 ... menu selection, 106 ... recognition target character string request, 107 ... recognition target character string designation, 108 ... recognition target character strings, 109 ... image input, 110 ... character string recognition, 111 ... character string presence notification, 112 ... character string recognition result display, 113 ... external synonym dictionary,
201 ... character line extraction, 202 ... character segmentation, 203 ... character identification, 204 ... character type limitation, 205 ... post-processing, 206 ... character identification dictionary,
301 ... portable device, 302 ... optical device, 303 ... photoelectric conversion element, 304 ... analog-to-digital converter, 305 ... signal processing element, 306 ... display device, 307 ... arithmetic unit, 308 ... ROM, 309 ... RAM, 310 ... input/output device, 311 ... memory card, 312 ... input device, 313 ... sound generator, 314 ... vibration device, 315 ... communication device,
401 ... housing, 402 ... lens, 403 ... shutter, 404 ... power switch, 405 ... cancel button, 406 ... cursor key, 407 ... antenna, 409 ... display surface, 410 ... speaker opening,
501 ... start state, 502 ... standby state, 503 ... keyword input state, 504 ... menu selection state, 505 ... recognized character string acquisition state, 506 ... recognition state, 507 ... character string presence notification state, 508 ... character string recognition result display state,
601 ... menu, 602 ... keyword selection, 603 ... keyword input, 604 ... keyword acquisition, 605 ... recognition,
701 ... keyword designation menu, 702 ... check box,
801 ... character string “Restroom” in the image, 802 ... character string “Exit” in the image,
901 ... step of limiting the character types, 902 ... loop, 903 ... image input step, 904 ... character string recognition step, 905 ... end-of-repetition step, 906 ... notification step, 907 ... recognition result display step,
1001 ... character string recognition result,
1101 ... field storing a keyword of the synonym dictionary, 1102 ... field storing a synonym group of the synonym dictionary,
1201 ... flag indicating whether an entry of the recognition menu information is a recognition target, 1202 ... field storing a keyword displayed in the menu in the recognition menu information, 1203 ... field storing a recognition target character string group in the recognition menu information.

Claims

1. A portable device comprising: image capturing means for photoelectrically converting an image into a digital signal; recognition target character string storage means for storing a set of character strings to be recognized; and means for recognizing, in the image, the character strings stored in the recognition target character string storage means,
wherein a set of character strings to be recognized is automatically generated from a keyword representing a recognition target character string group and is stored in the recognition target character string storage means.

2. The portable device according to claim 1, wherein the means for generating the character strings to be recognized acquires a set of synonyms of the keyword.

3. A portable device comprising: image capturing means for photoelectrically converting an image into a digital signal; recognition target character string storage means for storing a set of character strings to be recognized; and means for recognizing, in the image, the character strings stored in the recognition target character string storage means,
the portable device further comprising: recognition menu information storage means for storing a plurality of pairs each consisting of a keyword representing a recognition target character string group and a set of recognition target character strings; and means for selecting a keyword in the recognition menu information storage means, wherein the recognition target character strings in the recognition menu information storage means are copied to the recognition target character string storage means according to the result of the selection.

4. A portable device comprising: image capturing means for photoelectrically converting an image into a digital signal; recognition target character string storage means for storing a set of character strings to be recognized; and means for recognizing, in the image, the character strings stored in the recognition target character string storage means,
the portable device further comprising means for communicating with an external computer, wherein a recognition target character string group is acquired from outside the device and stored in the recognition target character string storage means.

5. A portable device comprising: image capturing means for photoelectrically converting an image into a digital signal; recognition target character string storage means for storing a set of character strings to be recognized; and means for recognizing, in the image, the character strings stored in the recognition target character string storage means,
wherein imaging and character string recognition are repeated automatically, and a sound or vibration is generated when the character string recognition succeeds.

6. A portable device comprising: image capturing means for photoelectrically converting an image into a digital signal; recognition target character string storage means for storing a set of character strings to be recognized; and means for recognizing, in the image, the character strings stored in the recognition target character string storage means,
wherein imaging and character string recognition are repeated automatically, and the captured image and the recognition result are displayed when the character string recognition succeeds.