JPH07319909A

JPH07319909A - Device and method for retrieving document, character recognizing device and preparation of dictionary

Info

Publication number: JPH07319909A
Application number: JP6115059A
Authority: JP
Inventors: Eisuke Miyoshi; 英輔三由; Yasuo Tanosaki; 康雄田野崎
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-05-27
Filing date: 1994-05-27
Publication date: 1995-12-08

Abstract

PURPOSE: To quickly retrieve a target document even when a search word whose notation is unknown is entered. CONSTITUTION: A control part 4 reads, from a homophone information storing part 3, the homophones corresponding to a phonetic character string entered from an input part 1. The control part 4 then retrieves from a document storing part 2 the documents that contain any of the homophones read out, and outputs the retrieval information from an output part 5 in a prescribed format. Thus, even when the notation of a search word is unknown, the target document can be retrieved by entering the phonetic character string corresponding to that word.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document retrieval apparatus that retrieves documents and the like stored in a database, and in particular to a document retrieval method that retrieves documents without using keywords. It also relates to a dictionary creation method that creates a dictionary by registering patterns for recognizing handwritten characters in a character recognition device installed in a document creation apparatus that creates such documents.

[0002]

[Prior Art] Conventional document retrieval apparatuses of this type attach keywords to each document in advance when the database is created, and retrieve the relevant documents by those keywords at search time. This method is relatively fast, but attaching keywords to the documents stored in the database is burdensome. Moreover, assigning suitable keywords to many documents is not always easy, and when suitable keywords have not been attached, the searcher may fail to obtain the intended document.

[0003] In recent years, by contrast, as computers have become faster and larger in capacity, full-text search has been coming into practical use for retrieving a desired document. With this method every character string in a document is searchable, so no keywords need to be attached when a document is stored in the database, and a search returns all documents in the database that contain the phrase specified by the searcher. Furthermore, methods that use a synonym dictionary or a near-synonym dictionary to search for several words from a single search word specified by the searcher have also come into use.

[0004] However, full-text search of this kind can retrieve only words whose notation (characters) is known, so a word whose pronunciation (reading) is known but whose notation is not cannot be searched for. In particular, as with proper nouns in Japanese, when only the reading is known and the kanji used to write it are not, the target document cannot always be retrieved.

[0005] Meanwhile, some document creation apparatuses equipped with the above document retrieval apparatus have an input display device in which a display device and a coordinate input device are laminated into one unit. Because such an input display device lets the user enter information as if writing characters and figures on paper, it has recently come into use in many fields. Accordingly, apparatuses in which text was conventionally entered with a keyboard now accept input, without a keyboard, as coordinate point sequences drawn with a stylus pen on a liquid crystal tablet, and recognize that information as characters. For this reason, document creation apparatuses of this type are equipped with a character recognition device that includes the input display device.

[0006] Many recognition methods have been proposed for such character recognition devices, beginning with one that matches the coordinate information entered by the user against a plurality of patterns in a previously registered recognition dictionary and takes the registered character closest to the entered coordinate information as the recognition candidate. However, no method yet achieves a sufficient character recognition rate. To improve the rate, attempts have been made to provide a character recognition dictionary for each individual in addition to the character recognition dictionary supplied in advance, and to register hard-to-recognize characters in that personal dictionary. However, when the user registers a new character pattern and a similar pattern already exists in the base recognition dictionary or in the previously registered personal dictionary, distinguishing the similar characters can actually become harder.

[0007]

[Problems to be Solved by the Invention] As described above, conventional document retrieval apparatuses add search using a synonym dictionary or near-synonym dictionary to full-text search, so that for a single search word specified by the searcher, documents containing synonyms or near-synonyms of that word can also be retrieved from the database. For a search word whose notation (characters) is unknown, however, the target document cannot always be retrieved quickly.

[0008] Likewise, for conventional character recognition devices having an input display device, it has been proposed to improve the recognition rate by providing a character recognition dictionary for each individual in addition to the one supplied in advance, and registering hard-to-recognize characters in it. However, when the user registers a new character pattern and a similar pattern exists in the base recognition dictionary or in the previously registered personal dictionary, distinguishing the similar characters actually becomes harder.

[0009] The present invention has been made in view of the above circumstances. Its object is to provide a document retrieval apparatus and a document retrieval method that can quickly retrieve the target document even when a search word whose notation is unknown is entered, and a character recognition device and a dictionary creation method that can improve the recognition rate of handwritten characters when the user newly registers a character pattern in a personal character recognition dictionary.

[0010]

[Means for Solving the Problems] The invention of claim 1 is a document retrieval apparatus that retrieves documents stored in a database on the basis of a search word, comprising: storage means storing phonetic character strings and the homophones (words with the same reading but different notation) corresponding to them; homophone acquisition means for obtaining from the storage means the homophones corresponding to an input phonetic character string; and retrieval means for retrieving documents from the database using each homophone acquired by the homophone acquisition means as the search word.

[0011] The invention of claim 4 is a character recognition device that recognizes and inputs handwritten characters by collating a pattern of a coordinate point sequence input from coordinate input means with patterns in a character recognition dictionary, comprising: registration means for registering in the character recognition dictionary a pattern of a coordinate point sequence input from the coordinate input means together with the character code corresponding to that pattern; search means for searching the character recognition dictionary for existing patterns similar to the pattern registered by the registration means; and correction means for changing the candidate priority information, in the character recognition dictionary, of the patterns found by the search means.

[0012]

[Operation] In the document retrieval apparatus of the invention of claim 1, the storage means stores phonetic character strings and the homophones corresponding to them. The homophone acquisition means obtains from the storage means the homophones corresponding to an input phonetic character string. The retrieval means retrieves documents from the database using each homophone acquired by the homophone acquisition means as the search word. As a result,

[0013] In the character recognition device of the invention of claim 4, the registration means registers in the character recognition dictionary the pattern of the coordinate point sequence input from the coordinate input means together with the character code corresponding to that pattern. The search means searches the character recognition dictionary for existing patterns similar to the pattern registered by the registration means, and the correction means changes the candidate priority information, in the character recognition dictionary, of the patterns found by the search means.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the document retrieval apparatus of the invention. Reference numeral 101 denotes an input unit for entering the character string to be searched for; 102, a document storage unit storing the documents to be searched; 103, a homophone information storage unit storing each phonetic character string paired with its corresponding homophones; 104, a control unit that uses the information in the homophone information storage unit 103 to obtain the homophone group of the character string entered from the input unit 101 and then retrieves, from the documents in the document storage unit 102, the documents containing members of that homophone group; and 105, an output unit such as an LCD or CRT that outputs the retrieval results.

[0015] Next, the operation of this embodiment is described. A search character string (phonetic character string) entered from the input unit 101 is sent to the control unit 104, which uses the information in the homophone information storage unit 103 to obtain the homophone group for the entered string. The homophone information storage unit 103 has the structure shown in FIG. 2 and stores phonetic character strings together with their corresponding homophone groups. For example, for the search string "さかい" (sakai), the homophone group "坂井", "阪井", "酒井", and "堺" is stored, followed by the search string "さかい" itself. The control unit 104 then searches the documents stored in the document storage unit 102 for those that contain the same character string as each member of the obtained homophone group.
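As a rough illustration of the table in FIG. 2, the lookup described above might be sketched in Python as follows. The dictionary layout and the name `lookup_homophones` are assumptions made for this sketch; the patent does not specify how the homophone information storage unit 103 is implemented.

```python
# Hypothetical in-memory stand-in for the homophone information
# storage unit 103 (FIG. 2). Each reading maps to its homophone
# group, with the reading itself stored last, as in the example.
HOMOPHONE_TABLE = {
    "さかい": ["坂井", "阪井", "酒井", "堺", "さかい"],
}


def lookup_homophones(reading, table=HOMOPHONE_TABLE):
    """Return the homophone group for a reading, or an empty list."""
    return list(table.get(reading, []))


group = lookup_homophones("さかい")
```

A hash lookup is used here only for brevity; the embodiment itself scans the phonetic entries one by one.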

[0016] FIG. 3 is a flowchart showing the retrieval processing of the control unit 104. First, in step 301 the control unit 104 obtains the search phonetic string from the input unit 101, and in step 302 compares it with the first string among the phonetic entries of the homophone information table held in the homophone information storage unit 103 shown in FIG. 2. In step 303 the control unit 104 checks whether the string in that phonetic entry matches the search phonetic string. If it matches, the corresponding homophone group is obtained from the homophone information table in step 304; if not, step 305 checks whether all phonetic entries of the table have been examined. If step 305 finds that no phonetic entry in the table matches the search phonetic string, the control unit 104 outputs information indicating that there are no homophones to the output unit 105 and ends the processing.

[0017] If step 305 finds that phonetic entries remain unexamined, the control unit 104 compares the next phonetic entry with the search phonetic string in step 307 and returns to step 303 to check whether they match, repeating this until a homophone group is obtained. Once a homophone group has been obtained in step 304, the control unit 104 in step 308 compares the first candidate of the homophone group with the character strings in the first of the documents stored in the document storage unit 102, and in step 309 determines whether the document contains the same string as the candidate. If it does, step 310 creates, for that document, retrieval result information that pairs the homophone with the name of the document containing it, structured as shown in FIG. 4. If it does not, the processing jumps to step 311.

[0018] Next, step 311 checks whether all documents in the document storage unit 102 have been searched. If any document remains, step 312 compares the candidate with the character strings of the next document in the document storage unit 102, and the processing returns to step 309 to check for the same string. This is repeated for all documents in the document storage unit 102. When step 311 confirms that all documents have been searched, the control unit 104 proceeds to step 313 and checks whether any homophone candidate remains unsearched. If one does, step 314 compares the next homophone candidate with the character strings in the first document in the document storage unit 102, and the processing returns to step 309; this is repeated for all candidates. When step 313 confirms that all candidates have been searched, the control unit 104 proceeds to step 315, where it builds from the retrieval results of FIG. 4 output information, as in FIG. 5, that shows the user the homophones, their kinds, and the document names corresponding to each. It sends this to the output unit 105 in step 316 and ends the processing.
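The document loop of steps 308 to 314 can be sketched in Python as two nested loops. This is only a minimal sketch under stated assumptions: the function name `search_documents`, the representation of the document storage unit 102 as a list of (name, text) pairs, and the use of Python's substring test for the comparison in step 309 are all choices made for illustration, not details given in the patent.

```python
def search_documents(candidates, documents):
    """Pair each homophone candidate with the documents containing it.

    candidates: homophone strings obtained in step 304.
    documents:  (name, text) pairs standing in for the document
                storage unit 102, in storage order (an assumption).
    """
    results = []
    for candidate in candidates:          # steps 308, 313, 314: next candidate
        for name, text in documents:      # steps 311, 312: next document
            if candidate in text:         # step 309: same string present?
                results.append((candidate, name))  # step 310: record the pair
    return results


# Made-up sample documents, used only to exercise the sketch.
sample_docs = [
    ("Document 1", "坂井さんが来た"),
    ("Document 2", "堺市の酒井さん"),
    ("Document 3", "関係のない文書"),
]
pairs = search_documents(["坂井", "酒井", "堺"], sample_docs)
```

The outer loop runs over candidates and the inner loop over documents, matching the order of the flowchart; swapping the loops would give the same pairs in a different order.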

[0019] The output unit 105 displays the output information sent from the control unit 104; its operation is described below using a concrete example. When, for example, the string "さかい" is entered from the input unit 101, the string is sent to the control unit 104. If a homophone table like that of FIG. 2 is stored in the homophone information storage unit 103, the control unit 104 looks for the string "さかい" among the phonetic entries and obtains "坂井", "阪井", "酒井", and "堺" as the corresponding homophone group. For each word in this group, the documents containing that word are retrieved from the document storage unit 102. As shown in FIG. 4, this retrieval yields "Document 1", "Document 2", "Document 3", and "Document 4" as documents containing "坂井"; "Document 5" as a document containing "阪井"; "Document 3", "Document 6", and "Document 7" as documents containing "酒井"; and "Document 7" and "Document 8" as documents containing "堺". From the input string "さかい" itself, "Document 5" and "Document 9" are obtained. The control unit 104 then converts the information of FIG. 4, which pairs each homophone with the names of the documents containing it, into the form shown in FIG. 5, so that the user can see which homophones exist and which documents contain each of them, and displays it on the output unit 105.

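The conversion from the pairs of FIG. 4 to the grouped presentation of FIG. 5 might look like the following Python sketch. The pair data are taken from the example in the text; the function names and the exact layout of the report are assumptions, since the figures themselves are not reproduced here.

```python
# Retrieval results in the style of FIG. 4: (homophone, document name) pairs.
FIG4_PAIRS = [
    ("坂井", "Document 1"), ("坂井", "Document 2"),
    ("坂井", "Document 3"), ("坂井", "Document 4"),
    ("阪井", "Document 5"),
    ("酒井", "Document 3"), ("酒井", "Document 6"), ("酒井", "Document 7"),
    ("堺", "Document 7"), ("堺", "Document 8"),
    ("さかい", "Document 5"), ("さかい", "Document 9"),
]


def group_by_homophone(pairs):
    """Collect document names under each homophone, keeping first-seen order."""
    grouped = {}
    for word, doc in pairs:
        grouped.setdefault(word, []).append(doc)
    return grouped


def format_report(reading, grouped):
    """Hypothetical text layout: one header line, then one line per notation."""
    lines = [f"Reading {reading}: {len(grouped)} notations"]
    for word, docs in grouped.items():
        lines.append(f"  {word}: {', '.join(docs)}")
    return "\n".join(lines)


report = format_report("さかい", group_by_homophone(FIG4_PAIRS))
```

Printing `report` lists each notation with its documents, which is one way to let the user see at a glance which spellings occur and where.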
[0020] Homophone retrieval in English can be performed by the same processing as the Japanese homophone retrieval described above. When information like that of FIG. 6 is stored in the homophone information storage unit 103 and the phonetic string rait is entered from the input unit 101, the control unit 104 looks for the string [rait] among the phonetic entries and obtains "rite", "right", "write", and "wright" as the corresponding homophone group. For each word of the homophone group, the control unit 104 retrieves the documents containing that word from the document storage unit 102. The control unit 104 then outputs, from the output unit 105, the information pairing each homophone thus obtained with the names of the documents containing it.

[0021] According to this embodiment, the document storage unit 102 is searched using as search words not only the string entered from the input unit 101 but also the strings of its homophones, so all documents that contain any of those search words can be retrieved. Thus even a search word whose notation (characters) is unknown may be found through the strings of its homophones, and the searcher can quickly retrieve the intended document with the string he or she specified. In particular, even when the notation is unknown, as long as the reading is known, the target document can be retrieved by using the phonetic string representing that reading as the search word. Consequently, no keywords are needed when a document is stored in the document storage unit 102, and documents can be stored easily.

[0022] The present invention is not limited to the embodiment described above. The embodiment uses Japanese and English, but the invention can be applied to other languages that have homophones. Also, although the input string itself is searched at the same time, the same effect is obtained by searching the input string and its homophones separately; in that case the input string is omitted from the homophone groups of FIGS. 2 and 5. The search word may also be entered by voice input. There are various methods of finding a search word within the character strings of documents, such as string matching and building a search table in advance when documents are entered; the method is not restricted here.
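One of the alternatives mentioned above, building a search table in advance when documents are entered, could be sketched in Python as below. The class name `SearchTable` and the idea of indexing a fixed vocabulary (for instance, every word in the homophone table) are assumptions made for this sketch; the patent leaves the method open.

```python
class SearchTable:
    """Hypothetical search table filled in when documents are stored."""

    def __init__(self, vocabulary):
        # Terms to index; assumed known in advance, e.g. taken from
        # the homophone information table.
        self.vocabulary = list(vocabulary)
        self.postings = {}  # term -> names of documents containing it

    def add_document(self, name, text):
        """Record, at storage time, which indexed terms occur in the text."""
        for term in self.vocabulary:
            if term in text:
                self.postings.setdefault(term, []).append(name)

    def find(self, term):
        """Return document names for a term without rescanning any text."""
        return list(self.postings.get(term, []))


table = SearchTable(["坂井", "阪井", "酒井", "堺", "さかい"])
table.add_document("Document 1", "坂井さんが来た")
table.add_document("Document 2", "堺市の酒井さん")
```

The trade-off is the usual one: storing a document costs more, but each later search is a table lookup. A term outside the indexed vocabulary is never found, so this sketch suits only a fixed set of search words.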

[0023] FIG. 7 is a block diagram showing an embodiment of the character recognition device of the invention. The character recognition device of this example comprises: a coordinate input device consisting of a transparent tablet 1 and a stylus pen 2 for indicating coordinates on the transparent tablet 1; a control device 3, built around a microprocessor, that performs dictionary registration on the basis of the two-dimensional coordinate point sequence obtained from the coordinate input device; a display device 4, for example a liquid crystal display, for displaying the handwriting data of the registration pattern entered with the stylus pen 2; and an external storage device 5 that the control device 3 accesses when performing dictionary registration.

[0024] Besides a liquid crystal display, a plasma display or the like may be used as the display device 4. The liquid crystal display serving as the display device 4 is laminated integrally with the transparent tablet 1. That is, the liquid crystal display and the transparent tablet 1 form the same coordinate plane of the same dimensions, and the information shown on the liquid crystal display is visible through the transparent tablet 1. With the transparent tablet 1 and the display device 4 integrated in this way, a position indicated on the transparent tablet 1 is displayed as information at the same position on the display device 4, so information can be entered as if drawing characters and figures on paper.

[0025] FIG. 8 shows an example of the internal configuration of the document creation apparatus shown in FIG. 7. The control device 3 consists of an initial setting unit 31, an input unit 32, a personal dictionary creation unit 33, a character recognition unit 34, a recognition dictionary operation unit 35, a display control unit 36, and a storage unit 37. The storage unit 37 consists of: an input data buffer 371 that temporarily holds the coordinate point sequence entered from the transparent tablet 1; a character code buffer 372 that temporarily holds the character code of the character pattern to be registered in the dictionary; a dictionary creation buffer 373, a temporary work area for creating the personal character recognition dictionary; a display data buffer 374 that temporarily holds the data to be shown on the display device (liquid crystal display) 4; screen image data 375 holding the image of the initial screen displayed at system initialization; and a screen area table 376 holding the coordinate positions of the input frame and the various buttons of the input screen shown on the display device 4.

[0026] Next, the operation of this embodiment is described. The initial setting unit 31 performs initialization such as displaying the initial screen on the display device 4 and clearing the various buffers. The input unit 32 processes the coordinate information entered from the coordinate input device. The personal dictionary creation unit 33 creates the personal character recognition dictionary 52 from the coordinate information entered through the input unit 32. The character recognition unit 34 recognizes the coordinate information of the dictionary registration pattern held in the input data buffer 371 as a character by referring to the data in the standard character recognition dictionary 51 and the personal character recognition dictionary 52 in the external storage device 5. The recognition dictionary operation unit 35 uses the evaluation values of the character candidates output by the character recognition unit 34 to check whether the standard character recognition dictionary 51 contains a dictionary pattern similar to the pattern registered in the personal character recognition dictionary 52. If a similar dictionary pattern exists, it would be selected as a recognition candidate in subsequent recognition processing, so the unit attaches information that lowers its candidate rank. The display control unit 36 controls the output of the data stored in the storage unit 37 to the display device 4 and its erasure.

【００２７】制御装置３の表示制御部３６は表示装置４
に図９に示すような入力画面４０を表示する。利用者は
この入力画面４０の中の入力枠４１に辞書登録したい文
字パターンを前記スタイラスペン２で入力し、更にその
登録する文字の文字コードを文字コード入力エリア４２
に入力する。その後、スタイラスペン２で登録アイコン
４４をタッチすると、辞書登録作業と認識辞書修正作業
が開始される。入力枠４１に登録文字を入力した後、そ
のデータを再度入力したいときには、取り消しアイコン
４３をタッチすれば、入力枠４１内のデータと、文字コ
ード入力エリア４２内のデータが初期化される。The display control unit 36 of the control device 3 displays an input screen 40 as shown in FIG. 9 on the display device 4. The user uses the stylus pen 2 to write the character pattern to be registered in the dictionary in the input frame 41 of the input screen 40, and then enters the character code of that character in the character code input area 42. When the registration icon 44 is then touched with the stylus pen 2, the dictionary registration work and the recognition dictionary correction work are started. To re-enter the data after writing a character in the input frame 41, the user touches the cancel icon 43, which initializes the data in the input frame 41 and the data in the character code input area 42.

【００２８】外部記憶装置５は上記した如く、標準的な
文字パターンが登録されている標準文字認識辞書５１
と、入力者個人の文字パターンを登録できる個人用文字
認識辞書５２から構成されている。標準文字認識辞書５
１と個人用文字認識辞書５２は文字認識部３４で文字認
識する際に参照する文字情報が格納されている。文字情
報は図１０に示す通り、辞書番号、文字認識処理で認識
後方の順位をいくつ下げるかの情報を格納する変動順位
データ、文字、文字コード、画数、１つの画の始点と終
点とを第１画の始点を基準とする相対座標で表した情報
等が格納されている。As described above, the external storage device 5 comprises a standard character recognition dictionary 51, in which standard character patterns are registered, and a personal character recognition dictionary 52, in which the writer's own character patterns can be registered. Both dictionaries store the character information that the character recognition unit 34 refers to when recognizing characters. As shown in FIG. 10, the character information includes a dictionary number, variable rank data indicating how many places the rank of the recognition candidate is to be lowered in character recognition processing, the character, its character code, the number of strokes, and, for each stroke, the start point and end point expressed in coordinates relative to the start point of the first stroke.
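As an illustration only, the character information of FIG. 10 could be modelled as the record sketched below. The class and field names (`Stroke`, `DictionaryEntry`, `rank_shift`) are hypothetical, the stroke count is derived from the stroke list rather than stored as a separate field, and a Unicode code point stands in for whatever character code the actual device uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Stroke:
    # Both points are relative to the start point of the first stroke.
    start: Point
    end: Point


@dataclass
class DictionaryEntry:
    dictionary_number: int
    rank_shift: int       # "variable rank data": places to lower the candidate
    character: str
    character_code: int   # assumption: a Unicode code point as a stand-in
    strokes: List[Stroke] = field(default_factory=list)

    @property
    def stroke_count(self) -> int:
        # The patent stores the stroke count; here it is derived instead.
        return len(self.strokes)


# Hypothetical two-stroke entry; the coordinates are made up for illustration.
entry = DictionaryEntry(
    dictionary_number=1,
    rank_shift=0,
    character="い",
    character_code=ord("い"),
    strokes=[Stroke((0, 0), (2, 10)), Stroke((8, 1), (10, 8))],
)
```

Because the first stroke always starts at (0, 0) in this representation, two patterns written at different positions in the input frame can be compared directly.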

【００２９】図１１は図７又図８に示した装置で入力者
個人の文字パターンを個人用文字認識辞書５２に登録す
る処理の流れを示したフローチャートである。制御装置
３の初期設定部３１はステップ５０１にて、処理の始め
に記憶部３７内の各種バッファなどをクリアし、画面イ
メージデータ３７５のイメージデータを基にして図９に
示すような初期画面４０を表示制御部３６により表示装
置４に表示する。次にステップ５０２にて入力者がスタ
イラスペン２を用いて図９に示した入力画面４０の入力
枠４１に登録パターンを、もしくは文字コード入力エリ
ア４２に登録する文字コードを入力する。するとステッ
プ５０３にて入力部３２は画面領域テーブル３７６の値
を参照して入力しているか否かの判断を行う。ここで、
画面領域テーブル３７６は例えば図１２に示すように入
力枠・各種アイコンの領域を表す領域番号と、その領域
の左上と右下の座標を格納している。入力枠４１に登録
文字パターンを入力しているのであれば、ステップ５０
４にて入力部３２は表示制御部３６を介して筆跡を表示
装置４に表示して、ステップ５０５にて入力データバッ
ファ３７１に座標情報を格納した後、ステップ５０２に
戻る。ここで、入力データバッファ３７１には図１３に
示すようにデータのｘ座標、ｙ座標が格納されている。
更に、入力データバッファ３７１には、後ほどの文字認
識処理で利用するために、１画分の区切りを示すセパレ
ータも併せて格納する。FIG. 11 is a flowchart showing the flow of processing for registering the writer's own character pattern in the personal character recognition dictionary 52 with the apparatus shown in FIG. 7 or FIG. 8. In step 501, at the start of processing, the initial setting unit 31 of the control device 3 clears the various buffers in the storage unit 37 and, based on the screen image data 375, has the display control unit 36 display the initial screen 40 shown in FIG. 9 on the display device 4. Next, in step 502, the writer uses the stylus pen 2 to enter either a registration pattern in the input frame 41 of the input screen 40 shown in FIG. 9 or the character code to be registered in the character code input area 42. In step 503, the input unit 32 then refers to the values in the screen area table 376 to determine whether the input is being made in the input frame 41. As shown for example in FIG. 12, the screen area table 376 stores an area number identifying each area (the input frame and the various icons) together with the upper-left and lower-right coordinates of that area. If a registration character pattern is being entered in the input frame 41, the input unit 32 displays the handwriting on the display device 4 via the display control unit 36 in step 504, stores the coordinate information in the input data buffer 371 in step 505, and then returns to step 502. As shown in FIG. 13, the input data buffer 371 stores the x coordinate and y coordinate of each data point. The input data buffer 371 also stores a separator marking the end of each stroke, for use in the later character recognition processing.
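The lookup against the screen area table 376 can be pictured as a simple rectangle hit test. The following is a minimal sketch, assuming the table maps an area number to its upper-left and lower-right corners; the use of the reference numerals 41 to 44 as area numbers, the coordinates, and the function name `find_area` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# (left, top, right, bottom): upper-left and lower-right corners of an area.
Rect = Tuple[int, int, int, int]

# Hypothetical layout; the real values would come from FIG. 12.
SCREEN_AREAS: Dict[int, Rect] = {
    41: (10, 10, 210, 210),    # input frame
    42: (10, 220, 210, 250),   # character code input area
    43: (220, 10, 280, 40),    # cancel icon
    44: (220, 50, 280, 80),    # registration icon
}


def find_area(x: int, y: int,
              areas: Dict[int, Rect] = SCREEN_AREAS) -> Optional[int]:
    """Return the number of the area containing (x, y), or None."""
    for number, (left, top, right, bottom) in areas.items():
        if left <= x <= right and top <= y <= bottom:
            return number
    return None
```

A pen position that falls in no area returns `None`, which corresponds to the "invalid position" case in which the flow simply returns to step 502.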

【００３０】ここで、１画分の区切りの判断はスタイラ
スペン２が透明タブレット１から離れた時点とする。
又、ステップ５０３で入力者が登録パターンの文字を入
力枠４１の中に入力していないと判断された場合に、ス
テップ５０６にて入力部３２は画面領域テーブル３７６
を基に文字コードが入力されているか否を調べることに
より、文字コード入力エリア４２の中に登録文字の文字
コードを入力しているか否かを判断する。入力部３２が
文字登録の文字コードを入力していると判断すれば、ス
テップ５０７にてタッチしている対応の文字コードを表
示制御部３６を介して液晶タブレット４に表示して、ス
テップ５０８にて文字コードを文字コードバッファ３７
２に格納して、ステップ５０２へ戻る。Here, a stroke is judged to end at the moment the stylus pen 2 leaves the transparent tablet 1. If it is determined in step 503 that the writer is not entering the registration pattern in the input frame 41, then in step 506 the input unit 32 checks, based on the screen area table 376, whether the character code of the character to be registered is being entered in the character code input area 42. If the input unit 32 determines that a character code is being entered, it displays the touched character code on the liquid crystal tablet 4 via the display control unit 36 in step 507, stores the character code in the character code buffer 372 in step 508, and returns to step 502.
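The way the input data buffer 371 separates strokes can be sketched as follows. This is an assumption-laden illustration, not the device's actual layout: the class name `InputDataBuffer`, the use of `None` as the separator value, and the method names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
SEPARATOR = None  # assumption: None marks the end of one stroke


class InputDataBuffer:
    """Holds (x, y) samples with a separator after each stroke."""

    def __init__(self) -> None:
        self.items: List[Optional[Point]] = []

    def pen_move(self, x: int, y: int) -> None:
        # Called while the pen touches the tablet inside the input frame.
        self.items.append((x, y))

    def pen_up(self) -> None:
        # The pen leaving the tablet ends the current stroke.
        if self.items and self.items[-1] is not SEPARATOR:
            self.items.append(SEPARATOR)

    def strokes(self) -> List[List[Point]]:
        # Split the flat sample list back into strokes at the separators.
        result: List[List[Point]] = []
        current: List[Point] = []
        for item in self.items:
            if item is SEPARATOR:
                if current:
                    result.append(current)
                    current = []
            else:
                current.append(item)
        if current:
            result.append(current)
        return result

    def clear(self) -> None:
        self.items.clear()
```

The number of strokes recovered this way is what later selects the dictionary patterns with the same stroke count for comparison.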

【００３１】更にステップ５０６で入力者が文字コード
入力エリア４２の中に入力していないと判断された場合
に、入力部３２は画面領域テーブル３７６を基に取消ボ
タン４３の中に入力しているか否かで、ステップ５０９
にて取消処理を行っているか否かを判断する。取消ボタ
ン４３をタッチしていれば、ステップ５１０にて入力部
３２は表示制御部３６を介して文字コード入力エリア４
２内の文字コードの表示と入力枠４１内の筆跡の表示の
初期化処理を行う。そして、ステップ５１１にて入力デ
ータバッファ３７１と文字コードバッファ３７２の内容
をクリアして、ステップ５０２へ戻る。しかし、ステッ
プ５０９で入力者が取消ボタン４３をペン２がタッチし
ていないと判定された時、入力部３２は画面領域テーブ
ル３７６のデータをチェックすることにより登録ボタン
４４の中をタッチすることにより、ステップ５１３にて
辞書登録開始処理を行っているかを判断する。辞書登録
開始処理を行っていない時は無効な位置にペン２が置か
れたと判断して、何もせずにステップ５０２に戻る。Further, if it is determined in step 506 that the writer is not entering anything in the character code input area 42, the input unit 32 determines in step 509 whether a cancel operation is being performed, by checking against the screen area table 376 whether the input falls within the cancel button 43. If the cancel button 43 is being touched, then in step 510 the input unit 32 initializes, via the display control unit 36, the display of the character code in the character code input area 42 and the display of the handwriting in the input frame 41. Then, in step 511, the contents of the input data buffer 371 and the character code buffer 372 are cleared, and the process returns to step 502. If, however, it is determined in step 509 that the pen 2 is not touching the cancel button 43, the input unit 32 checks the data in the screen area table 376 to determine in step 513 whether the registration button 44 is being touched, that is, whether dictionary registration is being started. If it is not, the pen 2 is judged to have been placed at an invalid position, and the process returns to step 502 without doing anything.
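The decision chain of steps 503, 506, 509 and 513 amounts to dispatching on the area in which the pen was detected. A minimal sketch, assuming the area numbers reuse the reference numerals 41 to 44 and that each branch is represented only by a descriptive label (both are illustrative choices, not part of the original):

```python
from typing import Optional

INPUT_FRAME = 41
CODE_AREA = 42
CANCEL_BUTTON = 43
REGISTER_BUTTON = 44


def dispatch(area_number: Optional[int]) -> str:
    """Return the action taken for a pen touch in the given area."""
    if area_number == INPUT_FRAME:       # step 503
        return "show handwriting and store coordinates"     # steps 504-505
    if area_number == CODE_AREA:         # step 506
        return "show and store character code"              # steps 507-508
    if area_number == CANCEL_BUTTON:     # step 509
        return "reset display and clear buffers"            # steps 510-511
    if area_number == REGISTER_BUTTON:   # step 513
        return "start dictionary registration"              # step 514 onward
    return "ignore"                      # invalid position, back to step 502
```

Every branch except registration returns to step 502, so in a real device this function would sit inside the input loop.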

【００３２】登録ボタン４４がペン２でタッチされてい
れば、個人辞書作成部３３に処理が移り、登録処理に入
る。まず、個人辞書作成部３３は文字コードバッファ３
７２の値を調べ、登録する文字コードが正しいものか否
かを調べる。文字コードが正しいものではなかった時に
は個人辞書作成部３３は警告文をステップ５１５にて表
示制御部３６及び表示データバッファ３７４を介して表
示し、ステップ５０２に戻る。ステップ５１４にて文字
コードが正しい判定された場合、ステップ５１６にて続
いて入力データバッファ３７１の中を調べて登録するデ
ータが入っているか否かを調べる。データが入っていな
ければ、個人辞書作成部３３は警告文をステップ５１７
にて表示制御部３６及び表示データバッファ３７４を介
して表示し、ステップ５０２に戻る。登録データが存在
している時には、個人辞書作成部３３は次にのように個
人辞書への登録を行う。即ち、個人辞書作成部３３は個
人用文字認識辞書５２の内容を一度辞書作成用バッファ
３７３に転送し、辞書作成用バッファ３７３上で作業を
行う。個人辞書には図１０に示すようなデータ構造の形
で登録パターンの追加を行い、ステップ５１８にて作業
が終了した段階で、追加した個人用文字認識辞書５２を
外部記憶装置５に転送する。If the registration button 44 has been touched with the pen 2, processing passes to the personal dictionary creation unit 33 and registration begins. First, the personal dictionary creation unit 33 examines the value in the character code buffer 372 to check whether the character code to be registered is valid. If the character code is not valid, the personal dictionary creation unit 33 displays a warning message in step 515 via the display control unit 36 and the display data buffer 374, and returns to step 502. If the character code is judged valid in step 514, the unit then checks in step 516 whether the input data buffer 371 contains data to be registered. If it contains no data, the personal dictionary creation unit 33 displays a warning message in step 517 via the display control unit 36 and the display data buffer 374, and returns to step 502. When registration data exists, the personal dictionary creation unit 33 registers it in the personal dictionary as follows. It first transfers the contents of the personal character recognition dictionary 52 to the dictionary creation buffer 373 and works on that buffer. The registration pattern is added to the personal dictionary in the data structure shown in FIG. 10, and in step 518, once the work is complete, the updated personal character recognition dictionary 52 is transferred to the external storage device 5.
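The checks and the buffered update of steps 514 to 518 can be sketched as below. The function name `register_pattern`, the dictionary entry layout and the returned status strings are hypothetical; since the text does not say what makes a character code valid, the check is passed in as a caller-supplied predicate.

```python
from typing import Callable, Dict, List


def register_pattern(personal_dictionary: List[Dict],
                     character_code: int,
                     strokes: List,
                     is_valid_code: Callable[[int], bool]) -> str:
    """Validate the inputs, then add the pattern via a working copy."""
    if not is_valid_code(character_code):       # step 514
        return "warning: invalid character code"    # step 515
    if not strokes:                             # step 516
        return "warning: no pattern data"           # step 517
    # Work on a copy, standing in for the dictionary creation buffer 373.
    work = list(personal_dictionary)
    work.append({
        "character_code": character_code,
        "rank_shift": 0,
        "strokes": strokes,
    })
    # Step 518: write the finished dictionary back in one go.
    personal_dictionary[:] = work
    return "registered"
```

Working on a copy means the stored dictionary is replaced only once the addition is complete, which mirrors the transfer to and from the external storage device 5.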

【００３３】個人用文字認識辞書５２への登録作業が終
了したら、認識辞書操作部３５に処理が移る。認識辞書
操作部３５は以下のようにステップ５１８にて登録され
た文字パターンと類似した辞書パターンについて登録情
報の変更を行う。まず、認識辞書操作部３５はステップ
５１９にて文字認識部３４を介して入力データバッファ
３７１に格納されている登録文字データを文字認識す
る。この文字認識処理は外部記憶装置５に格納されてい
る入力データと同一の画数をもつ標準文字認識辞書５１
の登録パターンを用いて次のように行われる。まず、入
力データバッファ３７１に格納されている２次元座標デ
ータを標準文字認識辞書５１に格納されている第１画の
始点を基準とする相対座標の形に変換する。次に相対座
標情報と１画分の始点・終点の各座標点において、その
距離を計算する。そして各座標点の距離の合計値を求
め、ステップ５１９にてその合計値を画数で割った値を
評価値として算出する。算出した評価値を基に評価値が
予め設定されてたしきい値と比較してしきい値より評価
値が小さいか否かをステップ５２０にて調べる。認識辞
書操作部３５は評価値がしきい値より小さい場合、文字
認識した認識候補の辞書パターンがステップ５１８で登
録した登録パターンと類似していると判断する。その場
合、認識辞書操作部３５はステップ５２１にて標準文字
認識辞書５１内の類似している辞書パターンの変動順位
データの値を１増加させ、このステップ５２１が終了し
たら処理が終了する。又、ステップ５２０で評価値がし
きい値より大きかった場合にはそのまま本例の処理を終
了する。When registration in the personal character recognition dictionary 52 is complete, processing passes to the recognition dictionary operation unit 35. The recognition dictionary operation unit 35 changes the registration information of dictionary patterns similar to the character pattern registered in step 518, as follows. First, in step 519, the recognition dictionary operation unit 35 has the character recognition unit 34 recognize the registered character data stored in the input data buffer 371. This recognition uses the registered patterns of the standard character recognition dictionary 51, stored in the external storage device 5, that have the same number of strokes as the input data, and proceeds as follows. First, the two-dimensional coordinate data stored in the input data buffer 371 is converted into coordinates relative to the start point of the first stroke, the form in which patterns are stored in the standard character recognition dictionary 51. Next, the distance between this relative coordinate information and the dictionary pattern is calculated at the start point and end point of each stroke. The distances at all coordinate points are then summed, and in step 519 the sum divided by the number of strokes is calculated as the evaluation value. In step 520, the evaluation value is compared with a preset threshold to check whether it is smaller than the threshold. If the evaluation value is smaller than the threshold, the recognition dictionary operation unit 35 judges that the dictionary pattern of the recognition candidate is similar to the registration pattern registered in step 518. In that case, in step 521, the recognition dictionary operation unit 35 increases by 1 the value of the variable rank data of the similar dictionary pattern in the standard character recognition dictionary 51, and processing ends when step 521 is complete. If the evaluation value is larger than the threshold in step 520, the processing of this example ends as it is.
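The evaluation value and the rank update of steps 519 to 521 can be sketched as follows. Several points are assumptions rather than statements from the text: the distance is taken to be Euclidean, every same-stroke-count pattern under the threshold is updated (the text does not say whether only the best candidate is), and the names `to_relative`, `evaluation_value` and `lower_rank_of_similar` and the entry layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Point, Point]  # (start point, end point) of one stroke


def to_relative(strokes: List[Stroke]) -> List[Stroke]:
    """Re-express all points relative to the start of the first stroke."""
    ox, oy = strokes[0][0]
    return [((sx - ox, sy - oy), (ex - ox, ey - oy))
            for (sx, sy), (ex, ey) in strokes]


def evaluation_value(input_strokes: List[Stroke],
                     dictionary_strokes: List[Stroke]) -> float:
    """Sum of start/end point distances divided by the stroke count."""
    if not input_strokes or len(input_strokes) != len(dictionary_strokes):
        raise ValueError("stroke counts must match and be non-zero")
    total = 0.0
    for (s, e), (ds, de) in zip(to_relative(input_strokes),
                                dictionary_strokes):
        # Assumption: Euclidean distance; the text only says "distance".
        total += math.dist(s, ds) + math.dist(e, de)
    return total / len(input_strokes)


def lower_rank_of_similar(standard_dictionary: List[Dict],
                          input_strokes: List[Stroke],
                          threshold: float) -> List[int]:
    """Increment rank_shift of each similar pattern; return their numbers."""
    changed = []
    for entry in standard_dictionary:
        if len(entry["strokes"]) != len(input_strokes):
            continue  # only patterns with the same stroke count are compared
        if evaluation_value(input_strokes, entry["strokes"]) < threshold:
            entry["rank_shift"] += 1   # step 521
            changed.append(entry["dictionary_number"])
    return changed
```

A smaller evaluation value means a closer match, which is why similarity is declared when the value falls below the threshold rather than above it.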

【００３４】その後、文字認識部３４の文字認識処理の
では文字候補が算出されから、この変動順位データを用
いて候補順位の変更を行う。具体的に文字候補が「い」
「り」「”」で、それらの変動順位データがそれぞれ
「２」「１」「０」の時、第１候補は２つ順位が下が
り、第２候補は１つ順位が下がるので、候補順位
は「”」「り」「い」と変更される。Thereafter, in the character recognition processing of the character recognition unit 34, once the character candidates have been calculated, the candidate ranks are changed using this variable rank data. Specifically, when the character candidates are 「い」, 「り」 and 「”」 and their variable rank data are 2, 1 and 0 respectively, the first candidate drops two places and the second candidate drops one place, so the candidate order is changed to 「”」, 「り」, 「い」.
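The reordering in this example can be reproduced by processing the candidates in their original order and moving each one down by its variable rank data. The text does not spell out the procedure, so this sequential interpretation is an assumption chosen because it yields the stated result; the function name `apply_rank_shifts` is hypothetical, and the candidates are assumed to be distinct.

```python
from typing import Dict, List


def apply_rank_shifts(candidates: List[str],
                      rank_shift: Dict[str, int]) -> List[str]:
    """Move each candidate down by its shift, in original rank order."""
    ordered = list(candidates)
    for candidate in candidates:
        shift = rank_shift.get(candidate, 0)
        if shift <= 0:
            continue
        current = ordered.index(candidate)
        # Never move past the last position.
        target = min(current + shift, len(ordered) - 1)
        ordered.insert(target, ordered.pop(current))
    return ordered


result = apply_rank_shifts(["い", "り", "”"], {"い": 2, "り": 1, "”": 0})
# result == ["”", "り", "い"]
```

Moving 「い」 down two places first gives 「り」「”」「い」, and moving 「り」 down one place then gives 「”」「り」「い」, matching the order in the text.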

【００３５】本実施例によれば、個人用の文字認識辞書
５２に新たな文字パターンを登録する際に、その登録文
字パターンと類似している既に標準文字認識辞書５１に
登録済みのパターンを、その後の認識処理で候補に挙が
った時にその候補順位を下げて、入力者が新たに登録文
字を辞書５１に追加しても、その追加によって誤認識を
引き起こすことを回避することができる。これにより入
力者は自分の認識し易い文字パターンを次々と登録する
ことができ、それに伴い文字認識率の向上も実現するこ
とができる。According to this embodiment, when a new character pattern is registered in the personal character recognition dictionary 52, any pattern already registered in the standard character recognition dictionary 51 that is similar to the newly registered pattern has its candidate rank lowered whenever it is listed as a candidate in subsequent recognition processing. Thus, even when the writer adds a newly registered character to the dictionary 51, the addition can be prevented from causing misrecognition. As a result, the writer can register, one after another, the character patterns that are easy to recognize for his or her own handwriting, and the character recognition rate improves accordingly.

【００３６】尚、本発明は上述した実施例に限定される
ものではない。例えば、本実施例では標準文字認識辞書
のみで辞書パターンに変動順位データの変更を行った
が、個人用文字認識辞書を含めた複数の認識辞書を対象
に辞書パターンの候補順位の変更の情報付加を行っても
よい。又、本実施例では登録パターンと類似している認
識パターンの変動順位データを直接変更していたが、認
識辞書がＲＯＭ上にあることを想定して変動順位データ
と辞書番号の対応を辞書本体とは別にテーブルとしてＲ
ＯＭ上に保持し、その情報を変更することで、辞書の修
正を行うというような認識辞書のデータ構造にしてもよ
い。The present invention is not limited to the embodiment described above. For example, in this embodiment the variable rank data was changed only for dictionary patterns in the standard character recognition dictionary, but the information that changes a dictionary pattern's candidate rank may be attached across a plurality of recognition dictionaries, including the personal character recognition dictionary. Also, in this embodiment the variable rank data of recognition patterns similar to the registered pattern was changed directly. However, assuming that the recognition dictionary resides in ROM, the recognition dictionary may instead be structured so that the correspondence between variable rank data and dictionary numbers is held in ROM as a table separate from the dictionary body, and the dictionary is corrected by changing that information.
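The idea of keeping the variable rank data in a table separate from a read-only dictionary body can be sketched as follows. This is only an illustration under stated assumptions: a read-only mapping stands in for the dictionary body, the contents of `ROM_DICTIONARY` are made up, and the names `rank_table`, `lower_rank` and `rank_shift_of` are hypothetical.

```python
from types import MappingProxyType
from typing import Dict

# Read-only stand-in for the dictionary body: dictionary number -> character.
ROM_DICTIONARY = MappingProxyType({1: "い", 2: "り", 3: "”"})

# Separate table: dictionary number -> variable rank data.
rank_table: Dict[int, int] = {}


def lower_rank(dictionary_number: int) -> None:
    """Record one more place of demotion without touching the dictionary."""
    if dictionary_number not in ROM_DICTIONARY:
        raise KeyError(dictionary_number)
    rank_table[dictionary_number] = rank_table.get(dictionary_number, 0) + 1


def rank_shift_of(dictionary_number: int) -> int:
    # Patterns with no table entry keep their original rank.
    return rank_table.get(dictionary_number, 0)
```

Only patterns that have actually been demoted need an entry, so the table stays small compared with the dictionary itself.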

【００３７】又、本実施例では文字認識の方法を辞書の
登録している文字と入力した座標との距離計算によるマ
ッチングで認識していたが、文字をある基本的な形に抽
象化して、その形とのマッチングを行うなど、評価値が
算出できるならば他の文字認識手法を使用してもよい。
それに付随して認識辞書のデータ構造も変更してもよ
い。本実施例では変動順位データを基に該当する辞書パ
ターンの候補順位を変動順位データの値の数だけ下げる
という順位変更処理を行ったが、各候補の評価値を調べ
て評価値に応じて後方の順位を下げる等の本実施例以外
の候補順位の入れ替え処理を行うようにしても同様の効
果がある。In this embodiment, characters were recognized by matching based on the distance between the characters registered in the dictionary and the input coordinates, but any other character recognition method may be used as long as it yields an evaluation value, for example abstracting each character into certain basic shapes and matching against those shapes. The data structure of the recognition dictionary may be changed accordingly. In this embodiment, the rank changing process lowered the candidate rank of the relevant dictionary pattern by the value of its variable rank data, but the same effect is obtained with candidate reordering processes other than that of this embodiment, such as examining the evaluation value of each candidate and lowering the candidate's rank according to that value.

【００３８】[0038]

【発明の効果】以上記述した如く本発明の文書検索装
置，文書検索方法，文字認識装置及び辞書作成方法によ
れば、検索者が指定した１つの検索表音文字列入力に対
し、複数の同音異字語を有する文字列を含む文書の検索
も可能にして、表記の分からない検索語が入力された場
合にも目的の文書を迅速に検索することができる文書検
索装置と文書検索方法並びに、入力者が新たに個人用の
文字認識辞書に文字パターンを登録することによって、
手書き文字の認識率を向上させることができる同音異字
語変換情報を記憶しておくことにより、１つの表音文字
列を入力するだけで、１以上の同音異字語の検索をする
ことができるAs described above, according to the document retrieval device, document retrieval method, character recognition device and dictionary creation method of the present invention, a single search phonetic character string specified by the searcher makes it possible to retrieve documents containing character strings that have a plurality of homophones, so that the target document can be retrieved quickly even when a search word whose notation is unknown is input. In addition, the writer can improve the recognition rate of handwritten characters by newly registering character patterns in the personal character recognition dictionary. By storing homophone conversion information, one or more homophones can be searched for simply by inputting one phonetic character string.

[Brief description of drawings]

【図１】本発明の文書検索装置の一実施例を示したブロ
ック図。FIG. 1 is a block diagram showing an embodiment of a document search device according to the present invention.

【図２】図１に示した同音異字語情報記憶部内の構造例
を示した図。FIG. 2 is a diagram showing an example of the structure of the homophone information storage unit shown in FIG. 1.

【図３】図１に示した制御部の検索処理を示したフロー
チャート。FIG. 3 is a flowchart showing the search processing of the control unit shown in FIG. 1.

【図４】図１に示した制御部が各同音語を含む文書を検
索した結果例を示した図。FIG. 4 is a diagram showing an example of a result of a search performed by the control unit shown in FIG. 1 for a document including each homophone.

【図５】図４に示した結果例を図１に示した出力部に表
示するフォーマット例を示した図。FIG. 5 is a diagram showing an example of a format for displaying the result example shown in FIG. 4 on the output unit shown in FIG. 1.

【図６】図１に示した入力部により入力される文字列が
英語の場合の同音異字語を示した図。FIG. 6 is a diagram showing homophones when the character string input by the input unit shown in FIG. 1 is English.

【図７】本発明の文字認識装置の一実施例を示した概略
構成図。FIG. 7 is a schematic configuration diagram showing an embodiment of a character recognition device of the present invention.

【図８】図７に示した制御装置の詳細例を示したブロッ
ク図。FIG. 8 is a block diagram showing a detailed example of the control device shown in FIG. 7.

【図９】図８に示した表示装置に表示される個人用文字
パターン入力画面例を示した図。FIG. 9 is a diagram showing an example of the personal character pattern input screen displayed on the display device shown in FIG. 8.

【図１０】図８に示した標準文字認識辞書のデータ構造
例を示した図。FIG. 10 is a diagram showing an example of the data structure of the standard character recognition dictionary shown in FIG. 8.

【図１１】図８に示した制御装置の文字パターン登録処
理を示したフローチャート。FIG. 11 is a flowchart showing the character pattern registration processing of the control device shown in FIG. 8.

【図１２】図８に示した画面領域テーブルのデータ構造
例を示した図。FIG. 12 is a diagram showing an example of the data structure of the screen area table shown in FIG. 8.

【図１３】図８に示した入力データバッファのデータ構
造例を示した図。FIG. 13 is a diagram showing an example of the data structure of the input data buffer shown in FIG. 8.

[Explanation of symbols]

１…透明タブレット２…スタイラスペ
ン３…制御装置４…表示装置５…外部記憶装置３１…初期設定部３２…入力部３３…個人辞書作
成部３４…文字認識部３５…認識辞書操
作部３６…表示制御部３７…記憶部３７１…入力データバッファ３７２…文字コー
ドバッファ３７３…辞書作成用バッファ３７４…表示デー
タバッファ３７５…画面イメージデータ３７６…画面領域
テーブル５１…標準文字認識辞書５２…個人用文字
認識辞書１０１…入力部１０２…文書記憶
部１０３…同音異字語情報記憶部１０４…制御部１０５…出力部
1 ... Transparent tablet
2 ... Stylus pen
3 ... Control device
4 ... Display device
5 ... External storage device
31 ... Initial setting unit
32 ... Input unit
33 ... Personal dictionary creation unit
34 ... Character recognition unit
35 ... Recognition dictionary operation unit
36 ... Display control unit
37 ... Storage unit
371 ... Input data buffer
372 ... Character code buffer
373 ... Dictionary creation buffer
374 ... Display data buffer
375 ... Screen image data
376 ... Screen area table
51 ... Standard character recognition dictionary
52 ... Personal character recognition dictionary
101 ... Input unit
102 ... Document storage unit
103 ... Homophone information storage unit
104 ... Control unit
105 ... Output unit

Claims

[Claims]

1. A document retrieval device for retrieving a document stored in a database based on a retrieval word, comprising: storage means for storing phonetic character strings and the homophones corresponding to them; homophone acquisition means for obtaining, from the storage means, the homophones corresponding to an input phonetic character string; and search means for searching the database for documents using each homophone acquired by the homophone acquisition means as the search word.

2. The document search device according to claim 1, wherein the search means also searches the database for a document including an input phonetic character string.

3. A document retrieval method for retrieving a document stored in a database based on a retrieval word, wherein the homophones corresponding to an input phonetic character string are obtained, and documents are then retrieved from the database using each of the obtained homophones as the search word.

4. A character recognition device for recognizing and inputting handwritten characters by collating a pattern of a coordinate point sequence input from coordinate input means with patterns in a character recognition dictionary, comprising: registration means for registering, in the character recognition dictionary, the pattern of the coordinate point sequence input from the coordinate input means and the character code corresponding to this pattern; search means for searching the character recognition dictionary for an existing pattern similar to the pattern registered by the registration means; and correction means for changing the candidate priority information, in the character recognition dictionary, of the pattern found by the search means.

5. The character recognition device according to claim 1, further comprising a dictionary in which a new pattern and a character code corresponding thereto are registered by the registration means.

6. The character recognition device according to claim 4 or 5, wherein the similarity between the pattern registered by the registration means and an existing pattern in the character recognition dictionary is determined based on an evaluation value obtained when the registered pattern is character-recognized.

7. The character recognition device according to claim 4, wherein, when the pattern found by the search means is listed as a character recognition candidate in character recognition processing, the correction means changes the candidate priority information in the character recognition dictionary relating to that pattern.

8. A dictionary creation method for creating a dictionary by registering a pattern of a coordinate point sequence in the character recognition dictionary of a character recognition device that recognizes and inputs handwritten characters by collating a pattern of a coordinate point sequence input from coordinate input means with patterns in the character recognition dictionary, characterized in that, when the pattern of the coordinate point sequence input from the coordinate input means and the character code corresponding to this pattern are registered in the character recognition dictionary, an existing pattern similar to the registered pattern is searched for in the character recognition dictionary, and the candidate priority information, in the character recognition dictionary, of the pattern found is changed.