JP2007305046A

JP2007305046A - Information processor for generating kanji reading, information processing method, program for attaining information processing and recording medium with the program recorded thereon

Info

Publication number: JP2007305046A
Application number: JP2006135281A
Authority: JP
Inventors: Yoshiyuki Koyama; 至幸小山
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2006-05-15
Filing date: 2006-05-15
Publication date: 2007-11-22
Anticipated expiration: 2026-05-15
Also published as: JP4785614B2

Abstract

PROBLEM TO BE SOLVED: To efficiently generate the reading of a full name. SOLUTION: This program includes the steps of: acquiring individual identification information from mail address information (S200); dividing the acquired individual identification information (S202); calculating the matching degree between the divided individual identification information and a family name candidate (S300); calculating the matching degree between the divided individual identification information and a given name candidate (S300); and determining the reading of the full name on the basis of the calculated matching degrees (S400). COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an information processing apparatus that generates kanji readings, and more particularly to an apparatus that generates the reading of a full name written in kanji.

Optical character recognition (OCR) technology is currently used in various character reading apparatuses, such as document readers and form readers. Among them are apparatuses, such as business card readers, that recognize character information from image information obtained by imaging the face of a business card and classify the recognized character information (character strings) into items such as full name, address, and company name. There are techniques for generating readings of character strings classified in this way; one such technique is disclosed, for example, in the following publication.

Japanese Patent Laid-Open No. 5-20300 (Patent Document 1) discloses a technique for improving the accuracy of character recognition and for accurately generating the readings of recognized characters. The document processing apparatus disclosed in Patent Document 1 includes: reading means for reading a front image and a back image of a business card; recognition means for recognizing front characters and back characters of the business card based on the front image and the back image read by the reading means; determination means for determining whether the back characters recognized by the recognition means are written in English; dividing means for dividing, when the determination means determines that they are written in English, the front characters and the back characters into address book data items such as address, full name, and telephone number; selection means for comparing the address book data items of the front characters and the back characters divided by the dividing means and selecting the back-character notation that matches the front-character notation; generation means for generating, based on the back-character notation selected by the selection means, a reading corresponding to that notation; and storage means for storing the reading generated by the generation means in association with the front-character notation corresponding to that reading.

According to the document processing apparatus disclosed in this publication, the front image and the back image of the business card read by the reading means are recognized as front characters and back characters by the recognition means. The determination means determines whether the back characters are written in English. When they are, the dividing means divides the front characters and the back characters into address book data items such as address, full name, and telephone number. The selection means compares the divided address book data items of the front characters and the back characters, and selects the back-character notation that matches the front-character notation. The generation means generates a reading corresponding to the selected back-character notation. The generated reading is stored in the storage means in association with the front-character notation corresponding to that reading. Therefore, when the back of the business card carries an English notation of the front characters, the front characters are recognized with reference to the English notation, which improves the recognition rate. In addition, because the reading is generated from the English notation on the back, an accurate reading can be generated.

Japanese Patent Laid-Open No. 5-20300 (JP-A-5-20300)

However, the document processing apparatus disclosed in Patent Document 1 has the problem that generating a reading for the front-character notation requires reading both the front image and the back image of the business card, which takes time.

The present invention has been made to solve the above problem, and its object is to provide an information processing apparatus and an information processing method capable of efficiently generating the reading of a full name, a program for realizing that information processing, and a recording medium on which the program is recorded.

An information processing apparatus according to a first invention includes: means for storing in advance first information that associates kanji with phonetic characters (for example, romaji or kana) representing the readings of the kanji; acquisition means for acquiring the kanji representing an individual's full name and second information included in the individual's mail address; generation means for generating, based on the first information, candidates for the reading of the kanji representing the full name; and determination means for determining the reading of the kanji representing the full name based on the result of collating the second information with the candidates.

According to the first invention, the acquisition means acquires the kanji representing an individual's full name and the second information included in the individual's mail address. On a business card, for example, the full name and the mail address are often printed on the same face, either the front or the back. The kanji representing the full name and the second information can therefore be acquired from image information obtained by imaging only one face of the business card. Candidates for the reading of the acquired kanji are generated based on the first information, in which kanji readings are stored in advance as phonetic characters. The reading of the kanji representing the full name is determined based on the result (for example, the number or positions of matching characters) of collating the generated candidates with the second information included in the acquired mail address (for example, information for identifying the individual, hereinafter also referred to as personal identification information). A mail address often contains all or part of the reading of the individual's full name written in romaji, so the reading of the full name can be determined from the generated candidates. As a result, an information processing apparatus that can efficiently determine the reading of a full name can be provided.
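The flow described for the first invention can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the table `READINGS`, the function names, and the simple "candidate is contained in the local part" collation rule are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical "first information": kanji -> romaji readings (illustrative only).
READINGS = {
    "小": ["ko", "o"],
    "山": ["yama", "san"],
}

def local_part(address: str) -> str:
    """Return the part of a mail address before '@', lower-cased."""
    return address.split("@", 1)[0].lower()

def candidates(kanji: str) -> list[str]:
    """Concatenate every combination of per-character readings."""
    per_char = [READINGS[ch] for ch in kanji]
    return ["".join(parts) for parts in product(*per_char)]

def decide(kanji: str, address: str) -> str:
    """Pick the first candidate contained in the local part.

    Falls back to the first generated candidate when nothing matches
    (an assumption of this sketch, not a rule stated in the source).
    """
    ident = local_part(address)
    cands = candidates(kanji)
    for cand in cands:
        if cand in ident:
            return cand
    return cands[0]

print(decide("小山", "koyama@example.com"))
print(decide("小山", "t.oyama@example.com"))
```

Only one face of the card has to be read in this flow, because both inputs to `decide` (the kanji and the mail address) normally appear on the same face.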

In an information processing apparatus according to a second invention, in addition to the configuration of the first invention, the determination means includes means for determining the reading based on the number of characters that match between the second information and a candidate.

According to the second invention, the reading is determined based on the number of characters that match between the second information and a candidate. The reading of the individual's full name can thus be determined from the number of characters that all or part of the reading of the full name shares with each candidate, so the reading of the full name can be determined accurately.

In an information processing apparatus according to a third invention, in addition to the configuration of the second invention, the determination means includes means for determining, as the reading, the candidate with the largest number of matching characters.

According to the third invention, the candidate with the largest number of characters matching the second information is determined as the reading. A candidate that matches the reading of the individual's full name more closely than the other candidates can therefore be determined as the reading of the full name.
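As a rough illustration of the second and third inventions, the sketch below scores each candidate by its number of matching characters and keeps the highest-scoring one. This passage does not define how matching characters are counted; the sketch assumes the length of the longest common subsequence, and the function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def best_by_match_count(ident: str, cands: list[str]) -> str:
    """Return the candidate sharing the most characters with ident.

    max() keeps the earliest candidate on a tie, so a frequency-ordered
    candidate list favours the more frequent reading.
    """
    return max(cands, key=lambda cand: lcs_length(ident, cand))

print(best_by_match_count("koyama", ["kosan", "oyama", "koyama"]))
```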

In an information processing apparatus according to a fourth invention, in addition to the configuration of the first invention, the determination means includes means for determining the reading based on the positions of the characters that match between the second information and a candidate.

According to the fourth invention, the reading is determined based on the positions of the characters that match between the second information and a candidate. For example, the reading is determined by giving priority to candidates whose first or last character matches the first or last character of the second information. Such candidates are more often the correct reading than candidates without such a match. The reading of the full name can therefore be determined accurately.

In an information processing apparatus according to a fifth invention, in addition to the configuration of the fourth invention, the determination means includes means for determining the reading by giving priority to candidates whose first character matches the first character of the second information.

According to the fifth invention, the reading is determined by giving priority to candidates whose first character matches the first character of the second information. Such candidates are more often the correct reading than candidates whose first character does not match. The reading of the full name can therefore be determined accurately.

In an information processing apparatus according to a sixth invention, in addition to the configuration of the fourth invention, the determination means includes means for determining the reading by giving priority to candidates whose last character matches the last character of the second information.

According to the sixth invention, the reading is determined by giving priority to candidates whose last character matches the last character of the second information. Such candidates are more often the correct reading than candidates whose last character does not match. The reading of the full name can therefore be determined accurately.
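The position-based priority of the fourth to sixth inventions might be sketched as a sort key. The source treats first-character priority and last-character priority as separate options; combining them, with the first character taking precedence, is an assumption of this sketch, as are the function names.

```python
def position_rank(ident: str, cand: str) -> tuple[bool, bool]:
    """Sort key: candidates sharing the first (then last) character sort first."""
    head = bool(ident) and cand.startswith(ident[0])
    tail = bool(ident) and cand.endswith(ident[-1])
    # False sorts before True, so a match yields the smaller key.
    return (not head, not tail)

def prioritize(ident: str, cands: list[str]) -> list[str]:
    """Reorder candidates by position priority; the sort is stable."""
    return sorted(cands, key=lambda cand: position_rank(ident, cand))

print(prioritize("koyama", ["oyama", "koyama", "kosan"]))
```

Because the sort is stable, candidates with the same priority keep their original (for example, frequency-based) order.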

In an information processing apparatus according to a seventh invention, in addition to the configuration of any one of the fourth to sixth inventions, the generation means includes means for generating family name candidates, which are readings of the kanji representing the family name, and given name candidates, which are readings of the kanji representing the given name. The determination means includes means for determining the reading such that the positions of the characters matching between a family name candidate and the second information differ, within the second information, from the positions of the characters matching between a given name candidate and the second information.

According to the seventh invention, family name candidates and given name candidates are generated separately. The reading is determined such that the positions of the characters matching between a family name candidate and the second information differ, within the second information, from the positions of the characters matching between a given name candidate and the second information. This prevents the same characters of the second information from being matched by both a family name candidate and a given name candidate, so accurate readings of the family name and the given name can be determined.
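One way to picture the non-overlap constraint of the seventh invention is shown below. The sketch assumes that a candidate "matches" when it occurs as a contiguous substring of the second information, and it prefers the pair covering the most characters; both choices, and all names, are illustrative assumptions rather than details taken from the source.

```python
from typing import Optional

def find_spans(ident: str, cand: str) -> list[tuple[int, int]]:
    """All (start, end) spans where cand occurs in ident as a substring."""
    spans = []
    start = ident.find(cand)
    while start != -1:
        spans.append((start, start + len(cand)))
        start = ident.find(cand, start + 1)
    return spans

def disjoint_pair(
    ident: str, family_cands: list[str], given_cands: list[str]
) -> Optional[tuple[str, str]]:
    """Pick a (family, given) pair whose matched spans do not overlap."""
    best, best_cover = None, 0
    for fam in family_cands:
        for giv in given_cands:
            for f0, f1 in find_spans(ident, fam):
                for g0, g1 in find_spans(ident, giv):
                    disjoint = f1 <= g0 or g1 <= f0
                    cover = len(fam) + len(giv)
                    if disjoint and cover > best_cover:
                        best, best_cover = (fam, giv), cover
    return best

print(disjoint_pair("koyama.yoshiyuki", ["koyama", "oyama"], ["yoshiyuki", "yuki"]))
```

When the only available matches for the family name and the given name occupy the same characters, the function returns `None` instead of counting those characters twice.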

In an information processing apparatus according to an eighth invention, in addition to the configuration of any one of the first to seventh inventions, the determination means includes means for determining, as the reading, a candidate that satisfies a predetermined condition when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of either the second information or the candidate is smaller than a predetermined ratio.

According to the eighth invention, when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of either the second information or the candidate is smaller than a predetermined ratio, a candidate that satisfies a predetermined condition is determined as the reading. For example, a candidate that is frequently used in full names is determined as the reading. Therefore, an accurate reading can be determined even when the mail address does not contain the individual's full name, or when part of the mail address coincides by chance with part of the reading of the full name.

In an information processing apparatus according to a ninth invention, in addition to the configuration of the eighth invention, the determination means includes means for determining, as the reading, a candidate that is frequently used in full names.

According to the ninth invention, when the ratio of the number of matching characters is smaller than the predetermined ratio for every candidate, a candidate that is frequently used in full names is determined as the reading. Therefore, even when the mail address does not contain the individual's full name, or when part of the mail address coincides by chance with part of the reading of the full name, the candidate most likely to be the correct reading can be determined as the reading.
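The threshold-and-fallback behaviour of the eighth and ninth inventions could look roughly like the following. The sketch assumes that candidates arrive in descending order of frequency, that matching characters are counted as a longest common subsequence, that the ratio is taken against the candidate length, and that the threshold is 0.5; none of these specifics is stated in this passage.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (assumed match count)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def decide_or_fallback(ident: str, cands: list[str], min_ratio: float = 0.5) -> str:
    """Use the best match, or the most frequent candidate if all match poorly.

    cands must be non-empty, contain non-empty strings, and be ordered by
    descending frequency of use in names (hypothetical convention).
    """
    counts = [lcs_length(ident, cand) for cand in cands]
    ratios = [n / len(cand) for n, cand in zip(counts, cands)]
    if all(ratio < min_ratio for ratio in ratios):
        return cands[0]  # fallback: most frequent reading
    best = max(range(len(cands)), key=lambda i: (counts[i], ratios[i]))
    return cands[best]

print(decide_or_fallback("info", ["koyama", "oyama"]))
print(decide_or_fallback("y.oyama", ["koyama", "oyama"]))
```

With an address such as `info@...`, the single shared letter "o" stays below the threshold for every candidate, so the accidental match is ignored and the frequency order decides.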

In an information processing apparatus according to a tenth invention, in addition to the configuration of any one of the first to seventh inventions, the determination means includes means for determining the reading based on the second information when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of either the second information or the candidate is smaller than a predetermined ratio.

According to the tenth invention, when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of either the second information or the candidate is smaller than a predetermined ratio, the reading is determined based on the second information. For example, the kana corresponding to the second information is determined as the reading. A reading of the full name can therefore be determined, and even when no correct reading candidate is generated, a reading that is highly likely to be correct can be determined.

An information processing apparatus according to an eleventh invention further includes, in addition to the configuration of any one of the first to tenth inventions, means for generating kana corresponding to the second information. The generation means includes means for generating the candidates in kana. The determination means includes means for determining the reading based on the result of collating the corresponding kana with the candidates.

According to the eleventh invention, the reading can be determined based on the result of collating the kana corresponding to the second information with the kana representing the candidate readings.
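To make the kana-based collation of the eleventh invention concrete, the sketch below converts a romaji string to hiragana with a greedy longest-match lookup and then compares it with kana candidates. The table `KANA` is deliberately tiny and hypothetical; a practical converter would need the full syllabary plus rules for long vowels, doubled consonants, and the syllabic "n".

```python
from typing import Optional

# Partial, illustrative romaji -> hiragana table (not a complete syllabary).
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ko": "こ", "shi": "し",
    "ma": "ま", "ya": "や", "yu": "ゆ", "yo": "よ",
}

def to_kana(romaji: str) -> Optional[str]:
    """Greedy longest-match conversion; None if a chunk is not in the table."""
    out, i = [], 0
    while i < len(romaji):
        for width in (3, 2, 1):
            chunk = romaji[i:i + width]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += len(chunk)
                break
        else:
            return None  # no table entry starts at position i
    return "".join(out)

def pick_kana_candidate(ident: str, kana_cands: list[str]) -> Optional[str]:
    """Return the first kana candidate contained in the converted identifier."""
    kana = to_kana(ident)
    if kana is None:
        return None
    return next((cand for cand in kana_cands if cand in kana), None)

print(to_kana("koyama"))
print(pick_kana_candidate("koyama", ["こさん", "こやま"]))
```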

In an information processing apparatus according to a twelfth invention, in addition to the configuration of any one of the first to eleventh inventions, the second information includes romaji. The generation means includes means for generating the candidates in romaji.

According to the twelfth invention, the reading can be determined based on the result of collating the romaji included in the second information with the romaji representing the candidate readings.

In an information processing apparatus according to a thirteenth invention, in addition to the configuration of any one of the first to twelfth inventions, the second information is the characters, included in the mail address, that identify the individual.

According to the thirteenth invention, the second information is the characters, included in the mail address, that identify the individual. These characters often contain all or part of the individual's full name written in romaji. The reading of the full name can therefore be generated accurately.
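Extracting the identifying characters from a mail address, and dividing them as in steps S200 and S202 of the abstract, might be sketched as follows. Treating everything before "@" as the personal identification information and splitting on any non-letter character are assumptions of this sketch; the function name is hypothetical.

```python
import re

def personal_id_tokens(address: str) -> list[str]:
    """Split the local part of a mail address into alphabetic tokens."""
    local = address.split("@", 1)[0].lower()
    # Separators such as '.', '_', '-' and digits are treated as boundaries.
    return [token for token in re.split(r"[^a-z]+", local) if token]

print(personal_id_tokens("Yoshiyuki.Koyama@example.co.jp"))
print(personal_id_tokens("y_koyama01@example.com"))
```

Each token can then be collated separately with family name candidates and given name candidates.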

In an information processing apparatus according to a fourteenth invention, in addition to the configuration of any one of the first to thirteenth inventions, the acquisition means includes means for acquiring character information by recognizing captured image information.

According to the fourteenth invention, character information is acquired by recognizing captured image information. Therefore, for example, the kanji representing the full name and the mail address, which are character information, can be acquired from image information obtained by imaging an object such as a business card on which the full name and the mail address are printed.

In an information processing apparatus according to a fifteenth invention, in addition to the configuration of any one of the first to fourteenth inventions, the first information is information that associates kanji with the readings the kanji take when used in full names.

According to the fifteenth invention, the first information is information that associates kanji with the readings the kanji take when used in full names. The capacity required to store the first information can therefore be reduced compared with storing all readings of the kanji.

An information processing method according to a sixteenth invention is an information processing method performed by an information processing apparatus that includes storage means, acquisition means, generation means, and determination means. The method includes the steps of: storing, using the storage means, first information that associates kanji with phonetic characters representing the readings of the kanji; acquiring, using the acquisition means, the kanji representing an individual's full name and second information included in the individual's mail address; generating, using the generation means and based on the first information, candidates for the reading of the kanji representing the full name; and determining, using the determination means, the reading of the kanji representing the full name based on the result of collating the second information with the candidates.

According to the sixteenth invention, the kanji representing an individual's full name and the second information included in the individual's mail address are acquired using the acquisition means. On a business card, for example, the full name and the mail address are often printed on the same face, either the front or the back. Therefore, when the kanji representing the full name and the second information are acquired from image information of a business card, it suffices to acquire the image information of only one face of the card. Candidates for the reading of the acquired kanji are generated based on the first information, in which kanji readings are stored in advance as phonetic characters. The reading of the kanji representing the full name is determined based on the result (for example, the number or positions of matching characters) of collating the generated candidates with the second information included in the acquired mail address (for example, personal identification information). A mail address often contains all or part of the reading of the individual's full name written in romaji, so the reading of the full name can be determined from among the generated candidates. As a result, an information processing method that can efficiently generate the reading of a full name can be provided.

A program according to a seventeenth invention is a program that realizes the information processing method according to the sixteenth invention. By installing the program according to the seventeenth invention on a computer, for example, the computer can be made to execute the information processing method according to the sixteenth invention.

A recording medium according to an eighteenth invention is a computer-readable recording medium on which the program according to the seventeenth invention is recorded. By installing the program according to the seventeenth invention from the recording medium according to the eighteenth invention onto a computer, for example, the computer can be made to execute the information processing method according to the sixteenth invention.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts are denoted by identical reference numerals. Their names and functions are also identical, so detailed descriptions of them are not repeated. The present embodiment describes an information processing apparatus that generates the reading of a full name recognized from image information of a business card, but the information processing apparatus according to the present invention is not limited to this. For example, it can also be applied to an apparatus that generates the reading of a full name contained in electronic mail information received from a communication partner.

An information processing apparatus 100 according to the present embodiment is described with reference to FIG. 1. The information processing apparatus 100 is connected to an input device 200 and an output device 300.

The input device 200 is composed of an image scanner, a CCD (Charge Coupled Device), or the like. The input device 200 captures an image of a subject such as a business card, converts it into image information, and transmits the image information to the information processing apparatus 100. The input device 200 may instead be a device that receives image information of a subject from a communication partner, a recording medium, or the like.

The output device 300 is composed of a liquid crystal display device or the like, and outputs the readings and other character strings generated by the information processing apparatus 100.

The information processing apparatus 100 includes a storage unit 120, a character recognition unit 130, an information acquisition unit 140, a generation unit 150, a determination unit 160, and a control unit 110 connected to each of these units.

The storage unit 120 stores intermediate data of the processing executed by each unit. The storage unit 120 also stores in advance a character recognition information unit 122 and a kanji reading information unit 124.

On receiving a control signal from the control unit 110, the character recognition unit 130 recognizes the image information received from the input device 200 as character information using the character recognition information unit 122. OCR is used as the character recognition technique, although other techniques may be used.

On receiving a control signal from the control unit 110, the information acquisition unit 140 acquires full name information and mail address information from the character information recognized by the character recognition unit 130.

On receiving a control signal from the control unit 110, the generation unit 150 generates candidates for the reading of the full name information acquired by the information acquisition unit 140, using the kanji reading information unit 124.

決定部１６０は、制御部１１０からの制御信号を受信すると、生成部１５０で生成された読みの候補とメールアドレス情報とを照合し、姓名情報の読みの候補から姓名情報の読みを決定する。 When receiving the control signal from the control unit 110, the determination unit 160 collates the reading candidate generated by the generation unit 150 with the mail address information, and determines the reading of the first and last name information from the reading candidate of the first and last name information.

制御部１１０は、各部に制御信号を送信し各部の処理を制御したり、各部からの情報に基づいて演算したりして、情報処理装置１００全体を制御する。 The control unit 110 controls the entire information processing apparatus 100 by transmitting a control signal to each unit and controlling the processing of each unit, or calculating based on information from each unit.

図２を参照して、漢字読み情報部１２４について説明する。図２に示すように、漢字読み情報部１２４には、漢字とその読みを表わしたローマ字とが対応付けられて記憶される。漢字読み情報部１２４は、記憶容量を低減するために、漢字に対するすべての読みではなく、漢字が姓や名に用いられる場合の読みのみが記憶される。読みが複数ある場合は、姓名に用いられる頻度が高い順に記憶される。これは、たとえば「子」の場合、姓名に用いられる場合の読みの頻度が、「ｋｏ」のほうが「ｓｈｉ」より高いことが理由である。 The kanji reading information unit 124 will be described with reference to FIG. 2. As shown in FIG. 2, the kanji reading information unit 124 stores each kanji in association with Roman characters representing its readings. To reduce the required storage capacity, the kanji reading information unit 124 stores not every reading of a kanji but only the readings used when the kanji appears in a surname or first name. When a kanji has a plurality of readings, they are stored in descending order of the frequency with which they are used in names. For example, for “子”, the reading “ko” is stored before “shi” because “ko” occurs more frequently in names.
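
As a minimal sketch of the table described above, the kanji reading information unit 124 could be modelled as a mapping from each kanji to its name readings, listed most frequent first. The name `KANJI_READINGS`, the helper `readings_for`, and the concrete entries and their order (apart from “ko” preceding “shi” for “子”) are assumptions made for illustration and are not taken from FIG. 2.

```python
# Hypothetical stand-in for the kanji reading information unit 124.
# Each kanji maps to the readings used in names, most frequent first.
# Only the "ko" before "shi" ordering for 子 comes from the description;
# the other entries and their order are illustrative assumptions.
KANJI_READINGS = {
    "角": ["kaku", "tsuno"],
    "田": ["ta", "da"],
    "美": ["mi", "yoshi"],
    "子": ["ko", "shi"],
}


def readings_for(kanji: str) -> list[str]:
    """Return the stored name readings of one kanji, most frequent first."""
    return KANJI_READINGS.get(kanji, [])


print(readings_for("子"))  # the first entry is the most frequent reading
```

Keeping the list ordered by frequency means that a fallback to "the most frequent reading" needs no separate frequency field: it is simply the first element.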

なお、漢字読み情報部１２４の内容は、漢字とその読みを表わした情報であれば、図２に示したような情報に限定されない。たとえば、漢字読み情報部１２４は、１つの漢字に対して姓用の読みと名用の読みとが分けられて記憶される情報であってもよい。また、漢字読み情報部１２４は、読みが姓や名に用いられる頻度情報が予め記憶された情報であってもよい。漢字読み情報部１２４は、姓名に用いられる位置で読みの頻度が異なる漢字（たとえば、名の１文字目の場合は「よし」と、２文字目以降では「み」と読む頻度が高い「美」）に対して、姓名に用いられる位置に応じた読みの頻度情報をさらに記憶した情報であってもよい。漢字読み情報部１２４は、漢字ごとの情報ではなく、姓および名の単語単位での読みを記憶した情報であってもよい。 The contents of the kanji reading information unit 124 are not limited to the information shown in FIG. 2, as long as they represent kanji and their readings. For example, the kanji reading information unit 124 may store, for each kanji, readings for surnames and readings for first names separately. The kanji reading information unit 124 may also store, in advance, frequency information indicating how often each reading is used in surnames and first names. For a kanji whose reading frequency depends on its position in the name (for example, “美”, which is frequently read “yoshi” as the first character of a first name and “mi” as the second or a later character), the kanji reading information unit 124 may further store reading frequency information according to the position in the name. The kanji reading information unit 124 may also store readings in units of whole surname and first name words rather than for each kanji.

図３を参照して、本実施の形態に係る情報処理装置１００を構成する制御部１１０が読みの候補を生成する際に実行するプログラムの制御構造について説明する。 With reference to FIG. 3, a control structure of a program executed when control unit 110 constituting information processing apparatus 100 according to the present embodiment generates reading candidates will be described.

ステップ（以下、ステップをＳと略す）１００にて、制御部１１０は、入力装置２００から送信される画像情報を受信したか否かを判断する。受信すると（Ｓ１００にてＹＥＳ）、処理はＳ１０２に移される。そうでないと（Ｓ１００にてＮＯ）、処理はＳ１００に戻される。 In step (hereinafter, step is abbreviated as S) 100, control unit 110 determines whether or not image information transmitted from input device 200 has been received. If received (YES in S100), the process proceeds to S102. Otherwise (NO in S100), the process returns to S100.

Ｓ１０２にて、制御部１１０は、受信した画像情報を文字情報として認識させるように文字認識部１３０に制御信号を送信する。 In S102, control unit 110 transmits a control signal to character recognition unit 130 so that the received image information is recognized as character information.

Ｓ１０４にて、制御部１１０は、認識された文字情報から姓名情報を取得するように情報取得部１４０に制御信号を送信する。制御部１１０は、たとえば、姓名によく用いられる文字が連続している文字列を姓名情報として取得するように制御信号を送信する。なお、姓名情報の取得方法はこれに限定されない。たとえば、名刺の画像情報を受信した場合、制御部１１０は、受信した名刺の画像情報のうち、中央付近に存在する最も大きいサイズの文字列を姓名情報として取得するように制御信号を送信するようにしてもよい。 In S104, control unit 110 transmits a control signal to information acquisition unit 140 so as to acquire first and last name information from the recognized character information. For example, the control unit 110 transmits a control signal so that a character string made up of consecutive characters frequently used in names is acquired as the first and last name information. The method of acquiring the first and last name information is not limited to this. For example, when image information of a business card is received, the control unit 110 may transmit a control signal so that the character string with the largest character size located near the center of the received business card image is acquired as the first and last name information.

Ｓ１０６にて、制御部１１０は、認識された文字情報からメールアドレス情報を取得するように情報取得部１４０に制御信号を送信する。制御部１１０は、たとえば、受信した画像情報のうち、「＠」を含む一連の文字列から、「E-mail:」などのキーワード除いた文字列をメールアドレス情報として取得するように制御信号を送信する。なお、メールアドレス情報の取得方法はこれに限定されない。たとえば、制御部１１０は、「E-mail:」などのキーワードを含む文字列をメールアドレス情報として取得するように制御信号を送信してもよい。 In S106, control unit 110 transmits a control signal to information acquisition unit 140 so as to acquire mail address information from the recognized character information. For example, the control unit 110 transmits a control signal so that, from a character string in the received image information that contains “@”, the string remaining after a keyword such as “E-mail:” is removed is acquired as the mail address information. The method of acquiring the mail address information is not limited to this. For example, the control unit 110 may transmit a control signal so that a character string containing a keyword such as “E-mail:” is acquired as the mail address information.
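
The first acquisition method of S106 can be sketched as follows, assuming the recognized character information arrives as a list of text lines. The function name `extract_mail_address` and the label pattern are hypothetical; the description only names “E-mail:” as an example keyword.

```python
import re

# Hypothetical label pattern covering "E-mail:", "Email:", "Mail:" and the
# full-width colon. Only "E-mail:" is named in the description.
_LABEL = re.compile(r"^\s*(e-?mail|mail)\s*[:：]\s*", re.IGNORECASE)


def extract_mail_address(recognized_lines):
    """Return the first line containing '@' with its keyword label removed."""
    for line in recognized_lines:
        if "@" in line or "＠" in line:
            # Drop the label, trim whitespace, normalise a full-width at sign.
            return _LABEL.sub("", line).strip().replace("＠", "@")
    return None  # no mail address was recognized


lines = ["総務部", "電話（06）1234-5678", "E-mail: ykakuta@xyz.com"]
print(extract_mail_address(lines))
```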

Ｓ１０８にて、制御部１１０は、取得された姓名情報を姓情報と名情報とに分割するように情報取得部１４０に制御信号を送信する。制御部１１０は、たとえば、姓名情報の空白文字より前に記載された情報を姓情報、空白文字より後に記載された情報を名情報として分割するように制御信号を送信する。なお、空白文字がない場合は、分割する位置を姓名情報の文字列の中央付近にしたり、姓よりも名で使われる頻度が高い漢字の前にしたりしてもよい。また、分割する位置が複数考えられる場合は、複数の候補について以下のステップの処理を行ない、後述する候補決定処理で最良に一致するものを決定してもよい。 In S108, control unit 110 transmits a control signal to information acquisition unit 140 so as to divide the acquired first and last name information into last name information and first name information. For example, the control unit 110 transmits a control signal so that the part of the first and last name information written before the blank character is treated as the last name information and the part written after the blank character as the first name information. If there is no blank character, the dividing position may be set near the center of the character string of the first and last name information, or immediately before a kanji that is used more frequently in first names than in surnames. If a plurality of dividing positions are conceivable, the following steps may be performed for each of them, and the best match may be selected in the candidate determination process described later.
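
A minimal sketch of the division in S108, under the assumption that when no blank character is present every possible dividing position is kept as a candidate (the last of the alternatives mentioned above). The function name `split_full_name` is hypothetical.

```python
def split_full_name(full_name: str) -> list[tuple[str, str]]:
    """Return candidate (last name, first name) splits of a kanji full name."""
    parts = full_name.split()  # splits on ASCII and full-width spaces alike
    if len(parts) == 2:
        # A blank character separates the last name from the first name.
        return [(parts[0], parts[1])]
    # No usable blank: keep every dividing position as a candidate so that a
    # later matching step can pick the best one (an assumption of this sketch).
    compact = "".join(parts)
    return [(compact[:i], compact[i:]) for i in range(1, len(compact))]


print(split_full_name("角田 美子"))
print(split_full_name("田中健"))
```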

Ｓ１１０にて、制御部１１０は、姓情報と名情報とを記憶部１２０に記憶する。Ｓ１１２にて、制御部１１０は、姓情報の１文字を記憶部１２０から読み出す。 In S110, control unit 110 stores last name information and first name information in storage unit 120. In S112, control unit 110 reads one character of the surname information from storage unit 120.

Ｓ１１４にて、制御部１１０は、読み出した姓情報の１文字についての読みを、漢字読み情報部１２４を用いて検索するように生成部１５０に制御信号を送信する。 In S114, control unit 110 transmits a control signal to generation unit 150 so that the reading of the one character read from the last name information is searched for using the kanji reading information unit 124.

Ｓ１１６にて、制御部１１０は、姓情報に含まれるすべての文字の読みを検索したか否かを判断する。すべての文字の読みを検索すると（Ｓ１１６にてＹＥＳ）、処理はＳ１１８に移される。そうでないと（Ｓ１１６にてＮＯ）、処理はＳ１１２に戻される。 In S116, control unit 110 determines whether or not all character readings included in the surname information have been searched. If all character readings are searched (YES in S116), the process proceeds to S118. Otherwise (NO in S116), the process returns to S112.

Ｓ１１８にて、制御部１１０は、検索された読みを組合せて、姓情報の読みの候補（以下、姓候補とも称する）を生成するように生成部１５０に制御信号を送信する。制御部１１０は、検索された読みが複数存在する場合、すべての組合せの姓候補を生成する。 At S118, control unit 110 transmits a control signal to generation unit 150 so as to generate a candidate for reading last name information (hereinafter also referred to as a surname candidate) by combining the retrieved readings. When there are a plurality of retrieved readings, the control unit 110 generates surname candidates for all combinations.

Ｓ１２０にて、制御部１１０は、名情報の１文字を記憶部１２０から読み出す。Ｓ１２２にて、制御部１１０は、読み出した名情報の１文字についての読みを、漢字読み情報部１２４を用いて検索するように生成部１５０に制御信号を送信する。 In S120, control unit 110 reads one character of the name information from storage unit 120. In S122, control unit 110 transmits a control signal to generation unit 150 so as to search for one character of the read name information using kanji reading information unit 124.

Ｓ１２４にて、制御部１１０は、名情報に含まれるすべての文字の読みを検索したか否かを判断する。すべての文字の読みを検索すると（Ｓ１２４にてＹＥＳ）、処理はＳ１２６に移される。そうでないと（Ｓ１２４にてＮＯ）、処理はＳ１２０に戻される。 In S124, control unit 110 determines whether or not all character readings included in the name information have been searched. If all character readings are searched (YES in S124), the process proceeds to S126. Otherwise (NO in S124), the process returns to S120.

Ｓ１２６にて、制御部１１０は、検索された読みを組合せて、名情報の読みの候補（以下、名候補とも称する）を生成するように生成部１５０に制御信号を送信する。制御部１１０は、検索された読みが複数存在する場合、すべての組合せの名候補を生成する。 In S126, control unit 110 transmits a control signal to generation unit 150 so as to generate a candidate for reading name information (hereinafter also referred to as a name candidate) by combining the retrieved readings. When there are a plurality of retrieved readings, the control unit 110 generates name candidates for all combinations.
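
The loop of S112 to S118 (and likewise S120 to S126) looks up the readings of each character and then forms every combination. A minimal sketch using a Cartesian product follows; the function name `generate_candidates` and the small reading table are illustrative assumptions.

```python
from itertools import product

# Illustrative reading table; the entries are assumptions for this sketch.
READINGS = {
    "角": ["kaku", "tsuno"],
    "田": ["ta", "da"],
}


def generate_candidates(name_part, table):
    """Combine the readings of each kanji into all possible reading strings."""
    per_char = [table.get(ch, []) for ch in name_part]
    # product() yields one tuple per combination, in table order. If any
    # character has no stored reading, no candidate can be formed.
    return ["".join(combo) for combo in product(*per_char)]


print(generate_candidates("角田", READINGS))
```

Because `product` varies the last character fastest, the candidates come out with the most frequent reading of the first kanji grouped first, which preserves the frequency ordering of the table.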

図４を参照して、本実施の形態に係る情報処理装置１００を構成する制御部１１０が姓名情報の読みを決定する際に実行するプログラムの制御構造について説明する。 With reference to FIG. 4, a control structure of a program executed when control unit 110 configuring information processing apparatus 100 according to the present embodiment determines reading of first name and last name information will be described.

Ｓ２００にて、制御部１１０は、メールアドレス情報から個人識別情報を取得するように決定部１６０に制御信号を送信する。個人識別情報とは、メールアドレス情報に含まれる、個人を識別するための文字列である。制御部１１０は、たとえば、メールアドレス情報に含まれる「＠」より前の文字列を個人識別情報として取得する。 In S200, control unit 110 transmits a control signal to determination unit 160 so as to acquire personal identification information from the mail address information. The personal identification information is a character string for identifying an individual included in the mail address information. For example, the control unit 110 acquires a character string before “@” included in the mail address information as personal identification information.

Ｓ２０２にて、制御部１１０は、取得された個人識別情報を分割するように決定部１６０に制御信号を送信する。制御部１１０は、たとえば、分割区切り文字の前後で個人識別情報を分割するように制御信号を送信する。分割区切り文字とは、メールアドレスに含まれる文字を区切るために使用される文字であり、たとえば、「_」、「.」、「-」、および数字などである。なお、Ｓ２００およびＳ２０２において、メールアドレス情報から個人識別情報を取得した後に個人識別情報を分割したが、メールアドレス情報を分割した後に個人識別情報を取得してもよい。 In S202, control unit 110 transmits a control signal to determination unit 160 so as to divide the acquired personal identification information. For example, the control unit 110 transmits a control signal so that the personal identification information is divided before and after each division delimiter. A division delimiter is a character used to separate the characters contained in a mail address, for example “_”, “.”, “-”, or a digit. In S200 and S202 the personal identification information is acquired from the mail address information and then divided, but the mail address information may instead be divided first and the personal identification information acquired afterwards.
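
S200 and S202 can be sketched together: take the part of the address before “@” and split it at the example delimiters named above (“_”, “.”, “-”, and digits). The function name `split_personal_id` is hypothetical.

```python
import re


def split_personal_id(mail_address):
    """Return the pieces of the part before '@', split at delimiter characters."""
    local_part = mail_address.split("@", 1)[0]          # S200
    pieces = re.split(r"[_.\-0-9]+", local_part)        # S202
    return [piece for piece in pieces if piece]          # drop empty pieces


print(split_personal_id("ykakuta@xyz.com"))
print(split_personal_id("yo-kakuta@xyz.com"))
```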

Ｓ２０４にて、制御部１１０は、分割された個人識別情報を記憶部１２０に記憶する。Ｓ２０６にて、制御部１１０は、分割された個人識別情報の１つを記憶部１２０から読み出す。Ｓ２０８にて、制御部１１０は、姓候補を記憶部１２０から読み出す。 In S204, control unit 110 stores the divided personal identification information in storage unit 120. In S206, control unit 110 reads one of the divided pieces of personal identification information from storage unit 120. In S208, control unit 110 reads a surname candidate from storage unit 120.

Ｓ３００にて、制御部１１０は、読み出された個人識別情報と姓候補との一致度合を算出する。なお、本処理の詳細は後述する。 In S300, control unit 110 calculates the degree of coincidence between the read personal identification information and the surname candidate. Details of this process will be described later.

Ｓ２１０にて、制御部１１０は、すべての姓候補の一致度合を算出したか否かを判断する。すべての姓候補の一致度合を算出すると（Ｓ２１０にてＹＥＳ）、処理はＳ２１２に移される。そうでないと（Ｓ２１０にてＮＯ）、処理はＳ２０８に戻される。 In S210, control unit 110 determines whether or not the matching degrees of all surname candidates have been calculated. When the matching degrees of all surname candidates have been calculated (YES in S210), the process proceeds to S212. Otherwise (NO in S210), the process returns to S208.

Ｓ２１２にて、制御部１１０は、名候補を記憶部１２０から読み出す。Ｓ３００にて、制御部１１０は、読み出された個人識別情報と名候補との一致度合を算出する。Ｓ２１４にて、制御部１１０は、すべての名候補の一致度合を算出したか否かを判断する。すべての名候補の一致度合を算出すると（Ｓ２１４にてＹＥＳ）、処理はＳ２１６に移される。そうでないと（Ｓ２１４にてＮＯ）、処理はＳ２１２に戻される。 In S212, control unit 110 reads name candidates from storage unit 120. In S300, control unit 110 calculates the degree of coincidence between the read personal identification information and the name candidate. In S214, control unit 110 determines whether or not the matching degrees of all name candidates have been calculated. When the matching degrees of all name candidates are calculated (YES in S214), the process proceeds to S216. Otherwise (NO in S214), the process returns to S212.

Ｓ２１６にて、制御部１１０は、分割された個人識別情報のすべての一致度合を算出したか否かを判断する。すべての一致度合を算出すると（Ｓ２１６にてＹＥＳ）、処理はＳ４００に移される。そうでないと（Ｓ２１６にてＮＯ）、処理はＳ２０６に戻される。 In S216, control unit 110 determines whether or not all the matching degrees of the divided personal identification information have been calculated. When all the matching degrees are calculated (YES in S216), the process proceeds to S400. Otherwise (NO in S216), the process returns to S206.

Ｓ４００にて、制御部１１０は、姓名の読みを決定するように決定部１６０に制御信号を送信する。なお、本処理の詳細は後述する。 At S400, control unit 110 transmits a control signal to determination unit 160 so as to determine reading of the first and last names. Details of this process will be described later.

Ｓ２１８にて、制御部１１０は、図５に示すような記憶部１２０に記憶されるローマ字と平仮名との対応情報を用いて、決定された読みの候補をローマ字から平仮名に変換する。なお、図５におけるローマ字は、ヘボン式で記載されているが、訓令式であってもよい。Ｓ２２０にて、制御部１１０は、変換した読みを出力装置３００に出力する。 In S218, control unit 110 converts the determined reading from Roman characters to hiragana using the correspondence information between Roman characters and hiragana stored in storage unit 120, as shown in FIG. 5. The Roman characters in FIG. 5 are written in the Hepburn system, but the Kunrei system may be used instead. In S220, control unit 110 outputs the converted reading to output device 300.
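
The conversion in S218 can be sketched as a greedy longest-match lookup in a Roman-character-to-hiragana table. The table below is a small hypothetical excerpt in Hepburn spelling that covers only the readings used in the examples; it is not a reproduction of FIG. 5.

```python
# Hypothetical excerpt of a Hepburn romaji-to-hiragana table.
ROMAJI_TO_HIRAGANA = {
    "ka": "か", "ku": "く", "ke": "け", "ko": "こ",
    "ta": "た", "na": "な", "mi": "み", "yo": "よ",
    "shi": "し", "n": "ん",
}


def to_hiragana(romaji):
    """Convert a romaji reading to hiragana, trying the longest chunk first."""
    out, i = [], 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts at this position.
            raise ValueError(f"no hiragana for {romaji[i:]!r}")
    return "".join(out)


print(to_hiragana("kakuta"), to_hiragana("yoshiko"))
```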

図６を参照して、本実施の形態に係る情報処理装置１００を構成する制御部１１０が、個人識別情報と姓候補あるいは名候補との一致度合を算出する際に実行するプログラムの制御構造について説明する。なお、本構造についての説明においては、便宜上、姓候補と名候補とを区別することなく単に候補と記載する。 With reference to FIG. 6, the control structure of the program executed when control unit 110 of information processing apparatus 100 according to the present embodiment calculates the degree of coincidence between the personal identification information and a surname candidate or name candidate will be described. In the description of this structure, for convenience, surname candidates and name candidates are not distinguished and are simply referred to as candidates.

Ｓ３０２にて、制御部１１０は、個人識別情報と候補とで一致する文字数を一致数としてカウントする。なお、制御部１１０は、先頭および末尾を除き、連続して一致する文字数が１以下である場合には、一致数にカウントしない。 In S302, control unit 110 counts the number of characters that match the personal identification information and the candidate as the number of matches. Note that the controller 110 does not count the number of matches when the number of consecutively matched characters is 1 or less except for the beginning and the end.

Ｓ３０４にて、制御部１１０は、個人識別情報または候補の先頭または末尾の文字が一致するか否かを判断する。一致すると（Ｓ３０４にてＹＥＳ）、処理はＳ３０６に移される。そうでないと（Ｓ３０４にてＮＯ）、処理はＳ３０８に移される。 In S304, control unit 110 determines whether or not the first or last characters of the personal identification information and the candidate match. If they match (YES in S304), the process proceeds to S306. Otherwise (NO in S304), the process proceeds to S308.

Ｓ３０６にて、制御部１１０は、一致数にプラス１カウントする。Ｓ３０８にて、制御部１１０は、個人識別情報と候補とで一致する文字の個人識別情報における位置（以下、一致位置と称する）を記憶する。 In S306, control unit 110 adds 1 to the number of matches. In S308, control unit 110 stores a position (hereinafter referred to as a matching position) in the personal identification information of a character that matches the personal identification information and the candidate.
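
The following sketch is one possible reading of S302 to S308, chosen because it reproduces the totals quoted in the worked examples of this description (for instance “3” for “tsunota” against “ykakuta”, “7” for “kakuta” against “ykakuta”, “3” for “yoshiko” against “yo”, and “8” for “kakuta” against “kakuta”). It rests on assumptions that the text does not state explicitly: the match is taken to be a single run of consecutive characters, the extra count of S306 is given once for a run that begins both strings and once for a run that ends both strings, and ties are resolved in favour of the earliest position. The function name `match_degree` is hypothetical.

```python
def match_degree(identifier, candidate):
    """Return (match count, (start, end)) with 1-based positions in identifier.

    Returns (0, None) when nothing counts as a match.
    """
    best_score, best_pos = 0, None
    for i in range(len(identifier)):
        for j in range(len(candidate)):
            # Length of the run of equal characters starting at (i, j).
            n = 0
            while (i + n < len(identifier) and j + n < len(candidate)
                   and identifier[i + n] == candidate[j + n]):
                n += 1
            if n == 0:
                continue
            at_head = i == 0 and j == 0
            at_tail = i + n == len(identifier) and j + n == len(candidate)
            # S302: a single matching character is ignored unless it sits at
            # the beginning or the end of both strings.
            if n <= 1 and not (at_head or at_tail):
                continue
            # S304/S306 (as assumed here): one extra count per matching end.
            score = n + int(at_head) + int(at_tail)
            if score > best_score:
                best_score, best_pos = score, (i + 1, i + n)  # S308
    return best_score, best_pos


print(match_degree("ykakuta", "tsunota"))
print(match_degree("ykakuta", "kakuta"))
```

Giving extra weight to runs anchored at the beginning or end is what lets “ken” outrank “takeshi” against “ke” even though both share the same two characters.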

図７を参照して、本実施の形態に係る情報処理装置１００を構成する制御部１１０が、姓名の読みを決定する際に実行するプログラムの制御構造について説明する。 With reference to FIG. 7, a control structure of a program executed when control unit 110 configuring information processing apparatus 100 according to the present embodiment determines reading of first and last names will be described.

Ｓ４０２にて、制御部１１０は、個人識別情報の文字数に対する一致数の割合が予め定められた割合より大きい姓候補が存在するか否かを判断する。制御部１１０は、たとえば、一致数の割合が０パーセントより大きい姓候補が存在するか否か（すなわち、一致する文字を含む姓候補が存在するか否か）を判断する。なお、一致数の割合は０パーセントに限定されない。また、制御部１１０は、姓候補の文字数に対する一致数の割合が予め定められた割合より大きい候補が存在するか否かを判断してもよい。予め定められた割合よりも大きい姓候補が存在すると（Ｓ４０２にてＹＥＳ）、処理はＳ４０４に移される。そうでないと（Ｓ４０２にてＮＯ）、処理はＳ４０６に移される。 In S402, control unit 110 determines whether there is a surname candidate whose ratio of the number of matches to the number of characters of the personal identification information is greater than a predetermined ratio. For example, the control unit 110 determines whether there is a surname candidate whose rate of matching is greater than 0 percent (that is, whether there is a surname candidate including a matching character). Note that the ratio of the number of matches is not limited to 0 percent. In addition, the control unit 110 may determine whether there is a candidate whose ratio of the number of matches to the number of characters of the surname candidate is greater than a predetermined ratio. If there is a surname candidate larger than the predetermined ratio (YES in S402), the process proceeds to S404. Otherwise (NO in S402), the process proceeds to S406.

Ｓ４０４にて、制御部１１０は、一致数が最も多い姓候補を姓情報の読みとして決定するように決定部１６０に制御信号を送信する。 In S404, control unit 110 transmits a control signal to determination unit 160 so as to determine the surname candidate having the largest number of matches as the reading of the last name information.

Ｓ４０６にて、制御部１１０は、姓に用いられる頻度が高い姓候補を姓情報の読みとして決定するように決定部１６０に制御信号を送信する。すなわち、漢字読み情報部１２４は、姓名に用いられる頻度の高い順に読みを記憶しているため、制御部１１０は、漢字読み情報部１２４に最初に記憶されている読みを姓情報の読みに決定するように決定部１６０に制御信号を送信する。 In S406, control unit 110 transmits a control signal to determining unit 160 so as to determine a surname candidate that is frequently used as a surname as a reading of the last name information. That is, since the kanji reading information unit 124 stores readings in the order of frequency used for the first and last names, the control unit 110 determines the first reading stored in the kanji reading information unit 124 as the reading of the last name information. Then, a control signal is transmitted to the determination unit 160.

Ｓ４０８にて、制御部１１０は、一致位置が決定された姓候補と重複しない名候補を抽出する。Ｓ４１０にて、制御部１１０は、抽出された名候補のうち、個人識別情報の文字数に対する一致数の割合が予め定められた割合より大きい名候補が存在するか否かを判断する。制御部１１０は、たとえば、一致数の割合が０パーセントより大きい名候補が存在するか否か（すなわち、一致する文字を含む名候補が存在するか否か）を判断する。なお、一致数の割合は０パーセントに限定されない。また、制御部１１０は、名候補の文字数に対する一致数の割合が予め定められた割合より大きい候補が存在するか否かを判断してもよい。予め定められた割合よりも大きい名候補が存在すると（Ｓ４１０にてＹＥＳ）、処理はＳ４１２に移される。そうでないと（Ｓ４１０にてＮＯ）、処理はＳ４１４に移される。 In S408, control unit 110 extracts name candidates that do not overlap with the surname candidates whose matching positions are determined. In S410, control unit 110 determines whether or not there are name candidates in which the ratio of the number of matches to the number of characters in the personal identification information is greater than a predetermined ratio among the extracted name candidates. For example, the control unit 110 determines whether or not there is a name candidate whose rate of matching is greater than 0 percent (that is, whether or not there is a name candidate including a matching character). Note that the ratio of the number of matches is not limited to 0 percent. In addition, the control unit 110 may determine whether or not there is a candidate whose ratio of the number of matches to the number of characters of the name candidate is greater than a predetermined ratio. If there are name candidates greater than a predetermined ratio (YES in S410), the process proceeds to S412. Otherwise (NO in S410), the process proceeds to S414.

Ｓ４１２にて、制御部１１０は、一致数が最も多い名候補に名情報の読みを決定するように決定部１６０に制御信号を送信する。 In S412, control unit 110 transmits a control signal to determination unit 160 so that the name candidate having the largest number of matches is determined as the reading of the first name information.

Ｓ４１４にて、制御部１１０は、名に用いられる頻度が高い名候補に名情報の読みを決定するように決定部１６０に制御信号を送信する。すなわち、漢字読み情報部１２４は、読みを姓名に用いられる頻度の高い順に記憶しているため、制御部１１０は、漢字読み情報部１２４に最初に記憶されている読みを名情報の読みに決定するように決定部１６０に制御信号を送信する。 In S414, control unit 110 transmits a control signal to determination unit 160 so that the name candidate most frequently used as a first name is determined as the reading of the first name information. That is, because the kanji reading information unit 124 stores readings in descending order of the frequency with which they are used in names, the control unit 110 transmits a control signal to the determination unit 160 so that the reading stored first in the kanji reading information unit 124 is determined as the reading of the first name information.

Ｓ４１６にて、制御部１１０は、決定した姓情報の読みと名情報の読みとを組合せて氏名情報の読みを決定するように決定部１６０に制御信号を送信する。 In S416, control unit 110 transmits a control signal to determination unit 160 so that the reading of the full name information is determined by combining the determined reading of the last name information with the determined reading of the first name information.
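
The decision flow of S402 to S416 can be sketched as below, assuming each candidate is supplied as a tuple of reading, match count, and 1-based match position in the personal identification information, listed in frequency order. The names `overlaps`, `pick`, and `decide_reading` are hypothetical, and the behaviour when every name candidate overlaps the surname (falling back to the most frequent name candidate) is an assumption of this sketch.

```python
def overlaps(a, b):
    """True when two 1-based inclusive position ranges share a character."""
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]


def pick(results, id_length, threshold=0.0):
    """Choose one (reading, count, position) tuple from a frequency-ordered list."""
    qualified = [r for r in results if r[1] / id_length > threshold]  # S402/S410
    if qualified:
        return max(qualified, key=lambda r: r[1])  # S404/S412: most matches
    return results[0]                              # S406/S414: most frequent


def decide_reading(surnames, names, id_length, threshold=0.0):
    surname = pick(surnames, id_length, threshold)
    # S408: keep name candidates whose match does not overlap the surname's.
    free = [n for n in names if not overlaps(n[2], surname[2])]
    name = pick(free, id_length, threshold) if free else names[0]
    return f"{surname[0]} {name[0]}"               # S416


# Values for the identifier "ktanaka" (7 characters). The position given
# for "ken" is inferred from its matching first letter.
surnames = [("tanaka", 7, (2, 7))]
names = [("takeshi", 2, (2, 3)), ("ken", 2, (1, 1))]
print(decide_reading(surnames, names, 7))
```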

以上のような構造およびフローチャートに基づく、本実施の形態に係る情報処理装置１００の動作について説明する。 An operation of information processing apparatus 100 according to the present embodiment based on the above-described structure and flowchart will be described.

第１に、図８に示す名刺を文字認識した場合の動作を説明する。この名刺には、姓名を表わす漢字である「角田美子」、メールアドレス「ykakuta@xyz.com」が表面に記載されている。そのため、名刺の表面の像の画像情報を入力装置２００に入力するだけで、「角田美子」と「ykakuta@xyz.com」とが情報処理装置１００に入力される。 First, the operation when the business card shown in FIG. 8 is subjected to character recognition will be described. On its face, this business card bears “角田美子”, the first and last name written in kanji, and the mail address “ykakuta@xyz.com”. Therefore, simply inputting image information of the face of the business card to the input device 200 causes “角田美子” and “ykakuta@xyz.com” to be input to the information processing apparatus 100.

入力装置２００から画像情報を受信すると（Ｓ１００にてＹＥＳ）、「ＸＹＺ」、「総務部」、「角田美子」、「ＸＹＺ株式会社」、「大阪市阿倍野区○○町△△番□□号」、「電話（06）1234-5678」、「FAX（06）1234-9999」、「E-mail: ykakuta@xyz.com」の文字が認識される（Ｓ１０２）。これらの文字情報のうち、姓名でよく使われる文字が連続している「角田美子」が姓名情報として取得され（Ｓ１０４）、「E-mail」や「＠」を含む「E-mail: ykakuta@xyz.com」のうち、「E-mail:」を除いた「ykakuta@xyz.com」がメールアドレス情報として取得される（Ｓ１０６）。 When image information is received from the input device 200 (YES in S100), the character strings “XYZ”, “総務部” (General Affairs Department), “角田美子”, “XYZ株式会社” (XYZ Co., Ltd.), “大阪市阿倍野区○○町△△番□□号” (an address in Abeno-ku, Osaka), “電話（06）1234-5678” (telephone), “FAX（06）1234-9999”, and “E-mail: ykakuta@xyz.com” are recognized (S102). Of this character information, “角田美子”, which consists of consecutive characters frequently used in names, is acquired as the first and last name information (S104). From “E-mail: ykakuta@xyz.com”, which contains “E-mail” and “@”, the string “ykakuta@xyz.com” that remains after “E-mail:” is removed is acquired as the mail address information (S106).

姓名情報「角田美子」は、空白文字より前に記載された姓情報「角田」と、空白文字より後に記載された名情報「美子」とに分割され（Ｓ１０８）、記憶部１２０に記憶される（Ｓ１１０）。 The first and last name information “角田美子” is divided into the last name information “角田”, written before the blank character, and the first name information “美子”, written after the blank character (S108), and both are stored in the storage unit 120 (S110).

「角田」のうち「角」と「田」の読み仮名が順次検索され（Ｓ１１２〜Ｓ１１６）、図９（Ａ）に示すように、「tsunota」、「tsunoda」などの複数の姓候補が生成される（Ｓ１１８）。「美子」についても同様にして、図９（Ｂ）に示すように、「miko」、「yoshiko」の名候補が生成される（Ｓ１２０〜Ｓ１２６）。 The readings of “角” and “田” in “角田” are searched for in turn (S112 to S116), and, as shown in FIG. 9(A), a plurality of surname candidates such as “tsunota” and “tsunoda” are generated (S118). Similarly for “美子”, the name candidates “miko” and “yoshiko” are generated as shown in FIG. 9(B) (S120 to S126).

メールアドレス情報「ykakuta@xyz.com」のうち、「＠」より前の文字列「ykakuta」が個人識別情報として取得される（Ｓ２００）。なお、この「ykakuta」は、「角田美子」の読みである「かくたよしこ」の「よしこ」をローマ字で表わした「yoshiko」の最初の文字である「y」と、「かくた」をローマ字で表わした「kakuta」とを組合せたものである。 Of the mail address information “ykakuta@xyz.com”, the character string “ykakuta” before “@” is acquired as the personal identification information (S200). Here, “ykakuta” combines “y”, the first letter of “yoshiko” (the Roman-letter form of “よしこ” in “かくたよしこ”, the reading of “角田美子”), with “kakuta”, the Roman-letter form of “かくた”.

「ykakuta」と姓候補および名候補の一致度合が算出される（Ｓ３００）。図９（Ａ）に示すように、姓候補「tsunota」の場合、「ykakuta」と一致する「ta」の２文字分が一致数としてカウントされ（Ｓ３０２）、末尾の文字である「ａ」が一致しているため一致数にプラス１カウントされ（Ｓ３０６）、一致数の合計は「３」となる。さらに、一致位置が「ykakuta」における６文字目から７文字目であることが記憶される（Ｓ３０８）。このような処理がすべての姓候補について行なわれると（Ｓ２１０にてＹＥＳ）、名候補の一致度合も同様に算出される（Ｓ２１２、Ｓ３００）。 The degree of coincidence between “ykakuta”, the surname candidate and the first name candidate is calculated (S300). As shown in FIG. 9A, in the case of the surname candidate “tsunota”, two characters of “ta” that match “ykakuta” are counted as the number of matches (S302), and the last character “a” is Since there is a match, the number of matches is incremented by one (S306), and the total number of matches is “3”. Further, it is stored that the matching position is from the sixth character to the seventh character in “ykakuta” (S308). When such processing is performed for all surname candidates (YES in S210), the matching degree of first name candidates is calculated in the same manner (S212, S300).

このようにして、図９（Ａ）および（Ｂ）のような算出結果を得ると、姓名の読みを決定する処理が行なわれる（Ｓ４００）。 When the calculation results as shown in FIGS. 9A and 9B are obtained in this way, a process for determining reading of the first and last names is performed (S400).

一致数の合計が「７」で最も大きい姓候補「kakuta」に姓の読みが決定される（Ｓ４０２にてＹＥＳ、Ｓ４０４）。姓候補「kakuta」の一致位置は「２−７（「ykakuta」における２文字目から７文字目）」であるため、一致位置が重複しない「１−１（「ykakuta」における１文字目）」である「yoshiko」が抽出される（Ｓ４０８）。抽出された名候補は「yoshiko」だけであり一致数が最も多い候補であるため、名の読みが「yoshiko」に決定される（Ｓ４１０にてＹＥＳ、Ｓ４１２）。氏名情報の読みが「kakuta yoshiko」に決定される（Ｓ４１６）。「kakuta yoshiko」が「かくたよしこ」に変換され（Ｓ２１８）、出力装置３００に出力される（Ｓ２２０）。 The surname candidate “kakuta”, which has the largest total number of matches, “7”, is determined as the reading of the surname (YES in S402, S404). Because the matching position of the surname candidate “kakuta” is “2-7” (the second to seventh characters of “ykakuta”), the name candidate “yoshiko”, whose matching position “1-1” (the first character of “ykakuta”) does not overlap it, is extracted (S408). “yoshiko” is the only extracted name candidate and therefore the one with the largest number of matches, so the reading of the first name is determined to be “yoshiko” (YES in S410, S412). The reading of the full name information is determined to be “kakuta yoshiko” (S416). “kakuta yoshiko” is converted into “かくたよしこ” (S218) and output to the output device 300 (S220).

このように、姓候補と名候補とが別々に生成され、姓候補および名候補と個人識別情報「ykakuta」との一致位置が重複しないように姓と名との読みが決定される。そのため、個人識別情報「ykakuta」における姓候補および名候補と一致する文字が重複することが抑制される。これにより、姓および名の正確な読みを決定することができる。 In this way, surname candidates and name candidates are generated separately, and the readings of the surname and first name are determined so that the positions at which the surname candidate and the name candidate match the personal identification information “ykakuta” do not overlap. This prevents the same characters of “ykakuta” from being matched by both the surname candidate and the name candidate, so accurate readings of the surname and first name can be determined.

第２に、図１０に示す名刺を文字認識した場合の動作を説明する。図８に示す名刺を文字認識した場合と同様の処理が行なわれ（Ｓ１００〜Ｓ１２４）、氏名情報「角田美子」の姓候補と名候補が生成され（Ｓ１１８、Ｓ１２６）、個人識別情報「yo-kakuta」が取得される（Ｓ２００）。なお、この「yo-kakuta」は、「角田美子」の読みである「かくたよしこ」の「よしこ」をローマ字で表わした「yoshiko」の「yo」と、「かくた」をローマ字で表わした「kakuta」とを、「-」で組合せたものである。 Second, the operation when the business card shown in FIG. 10 is subjected to character recognition will be described. The same processing as for the business card shown in FIG. 8 is performed (S100 to S124), surname candidates and name candidates for the full name information “角田美子” are generated (S118, S126), and the personal identification information “yo-kakuta” is acquired (S200). Here, “yo-kakuta” joins, with “-”, the “yo” of “yoshiko” (the Roman-letter form of “よしこ” in “かくたよしこ”, the reading of “角田美子”) and “kakuta”, the Roman-letter form of “かくた”.

「yo-kakuta」は、「-」の前後で「yo」と「kakuta」とに分割される（Ｓ２０２）。図１１に示すように、「yo」と各姓候補および各名候補との一致度合が算出される（Ｓ３００）。図１２に示すように、「kakuta」と各姓候補および各名候補との一致度合が算出される（Ｓ２１６、Ｓ３００）。 “Yo-kakuta” is divided into “yo” and “kakuta” before and after “-” (S202). As shown in FIG. 11, the degree of coincidence between “yo” and each surname candidate and each first name candidate is calculated (S300). As shown in FIG. 12, the degree of coincidence between “kakuta” and each surname candidate and each first name candidate is calculated (S216, S300).

図１１（Ａ）に示すように、１つ目の個人識別情報「yo」と各姓候補とは一致する文字がなく、すべての候補の一致数の合計がゼロとして算出される。 As shown in FIG. 11A, the first personal identification information “yo” and each surname candidate have no matching character, and the total number of matches of all candidates is calculated as zero.

図１１（Ｂ）に示すように、１つ目の個人識別情報「yo」と名候補「miko」とは、末尾の「o」が一致し一致数の合計が「２」と算出される（Ｓ３０２、Ｓ３０６）。１つ目の個人識別情報「yo」と名候補「yoshiko」とは、先頭の「yo」が一致し一致数合計が「３」と算出される（Ｓ３０２、Ｓ３０６）。 As shown in FIG. 11B, the first personal identification information “yo” and the name candidate “miko” are calculated such that “o” at the end matches and the total number of matches is “2” ( S302, S306). The first personal identification information “yo” and the name candidate “yoshiko” are calculated such that the leading “yo” matches and the total number of matches is “3” (S302, S306).

２つ目の個人識別情報「kakuta」についても同様に一致度合が算出され（Ｓ３００）、図１２に示すような算出結果となる。 The degree of coincidence is calculated in the same way for the second personal identification information “kakuta” (S300), giving the calculation results shown in FIG. 12.

２つ目の個人識別情報「kakuta」と完全に一致し、一致数の合計が最も多い「８」である姓候補「kakuta」が姓の読みとして決定される（Ｓ４０２にてＹＥＳ、Ｓ４０４）。２つ目の個人識別情報「kakuta」の１文字目から６文字目までと重複しない（Ｓ４０８）、１つ目の個人識別情報「yo」との一致数の合計が最も多い「yoshiko」が名の読みとして決定される（Ｓ４１０にてＹＥＳ、Ｓ４１２）。 The surname candidate “kakuta”, which matches the second personal identification information “kakuta” completely and has the largest total number of matches, “8”, is determined as the reading of the surname (YES in S402, S404). The name candidate “yoshiko”, whose match does not overlap the first to sixth characters of the second personal identification information “kakuta” (S408) and whose total number of matches with the first personal identification information “yo” is the largest, is determined as the reading of the first name (YES in S410, S412).

第３に、図１３に示す名刺を文字認識した場合の動作を説明する。図８に示す名刺を文字認識した場合と同様の処理が行なわれ（Ｓ１００〜Ｓ１２４）、氏名情報「田中健」の姓候補と名候補が生成され（Ｓ１１８、Ｓ１２６）、個人識別情報「ktanaka」が取得される（Ｓ２００）。なお、この「ktanaka」は、「田中健」の読みである「たなかけん」の「けん」をローマ字で表わした「ken」の最初の文字である「k」と、「たなか」をローマ字で表わした「tanaka」とを組合せたものである。 Third, the operation when the business card shown in FIG. 13 is subjected to character recognition will be described. The same processing as for the business card shown in FIG. 8 is performed (S100 to S124), surname candidates and name candidates for the full name information “田中健” are generated (S118, S126), and the personal identification information “ktanaka” is acquired (S200). Here, “ktanaka” combines “k”, the first letter of “ken” (the Roman-letter form of “けん” in “たなかけん”, the reading of “田中健”), with “tanaka”, the Roman-letter form of “たなか”.

一致度合の算出結果を図１４に示す。この場合、最も一致数の合計が多い姓候補「tanaka」が姓の読みに決定される（Ｓ４０２にてＹＥＳ、Ｓ４０４）。名候補「takeshi」は、他の名候補「ken」と一致数の合計が同じ「２」であるが、名候補「takeshi」の一致位置「２−３（「ktanaka」の２文字目から３文字目）」は、決定された姓候補「tanaka」の一致位置「２−７（「ktanaka」の２文字目から７文字目）」と重複しているため、「takeshi」は名候補として取得されずに「ken」が取得される（Ｓ４０８）。このように、個人識別情報における一致位置が重複しないように姓候補と名候補との読みが決定されるため、姓および名の正確な読みを決定することができる。 FIG. 14 shows the calculation results of the degree of coincidence. In this case, the surname candidate “tanaka”, which has the largest total number of matches, is determined as the reading of the surname (YES in S402, S404). The name candidate “takeshi” has the same total number of matches, “2”, as the other name candidate “ken”. However, the matching position of “takeshi”, “2-3” (the second to third characters of “ktanaka”), overlaps the matching position of the determined surname candidate “tanaka”, “2-7” (the second to seventh characters of “ktanaka”). Therefore “takeshi” is not extracted as a name candidate and “ken” is extracted instead (S408). Because the readings of the surname and the first name are determined so that their matching positions in the personal identification information do not overlap, accurate readings of the surname and first name can be determined.

Fourth, the operation when the business card shown in FIG. 15 is character-recognized will be described. The same processing as for the business card shown in FIG. 8 is performed (S100 to S124), surname candidates and given-name candidates for the name information "田中健" are generated (S118, S126), and the personal identification information "ke-tanaka" is acquired (S200). Here, "ke-tanaka" joins "ke", taken from "ken" (the romanization of "けん" in "たなかけん"), and "tanaka", the romanization of "たなか", with "-".

"ke-tanaka" is split at "-" into "ke" and "tanaka" (S202). The calculated degrees of matching are shown in FIGS. 16 and 17.
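A minimal sketch of steps S200 and S202, assuming the personal identification information is the part of the mail address before "@". The function name `extract_identifier_parts` and the address `ke-tanaka@example.com` are hypothetical, and treating "." and "_" as delimiters in addition to "-" is an assumption; this passage only shows splitting at "-".

```python
import re


def extract_identifier_parts(mail_address):
    """Take the local part of a mail address (S200) and split it at
    delimiter characters (S202). Only "-" appears in the example here;
    "." and "_" are included as an assumption."""
    local_part = mail_address.split("@", 1)[0]
    return [part for part in re.split(r"[-._]", local_part) if part]


print(extract_identifier_parts("ke-tanaka@example.com"))  # ['ke', 'tanaka']
print(extract_identifier_parts("ktanaka@example.com"))    # ['ktanaka']
```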

As shown in FIG. 16(A), the first piece of personal identification information, "ke", has no characters matching any surname candidate, so the total number of matches is calculated as zero for every candidate.

As shown in FIG. 16(B), the first piece of personal identification information "ke" and the given-name candidate "ken" match in the two characters "ke", and their leading character "k" also matches, so the total number of matches is calculated as "3". For "ke" and the given-name candidate "takeshi", only the two characters "ke" in the middle of "takeshi" match, so the total is calculated as "2". Thus, even when the number of matching characters is the same, "ken", whose leading character matches, receives a larger total than "takeshi", which matches only in the middle. This gives priority to the correct reading "ken", so the reading of the full name can be determined accurately.
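One scoring rule that reproduces the totals quoted in this description is sketched below. It is an assumption, not the rule defined by the figures: the score is taken as the length of the longest common substring, plus 1 if the leading characters match, plus 1 if the trailing characters match. The trailing bonus is inferred from the total of "8" reported for "tanaka" against "tanaka" and from the claim that prioritizes a matching last character. The function names are hypothetical.

```python
def longest_common_substring_len(a, b):
    """Length of the longest run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def match_score(identifier, candidate):
    """Assumed total: matching characters + leading bonus + trailing bonus."""
    if not identifier or not candidate:
        return 0
    score = longest_common_substring_len(identifier, candidate)
    if identifier[0] == candidate[0]:
        score += 1  # leading characters match
    if identifier[-1] == candidate[-1]:
        score += 1  # trailing characters match (assumption)
    return score


print(match_score("ke", "ken"))         # 3
print(match_score("ke", "takeshi"))     # 2
print(match_score("tanaka", "tanaka"))  # 8
```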

The degree of matching is calculated in the same way for the second piece of personal identification information, "tanaka", giving the results shown in FIG. 17.

The surname candidate "tanaka", which matches the second piece of personal identification information "tanaka" exactly and has the largest total number of matches, "8", is determined as the reading of the surname (YES in S402, S404). The given-name candidate "ken", whose matching position does not overlap that of the determined surname candidate "tanaka" (S408) and which has the largest number of matches, "3", with the first piece of personal identification information "ke", is determined as the reading of the given name (YES in S410, S412).

The first to fourth operations above describe the case where there is a candidate whose ratio of matches is greater than a predetermined ratio (YES in S402, YES in S410). When no such candidate exists (NO in S402, NO in S410), the candidate most frequently used in names is determined as the reading (S406, S414). Specifically, because the kanji reading information unit 124 stores readings in descending order of how frequently they are used in names, the candidate formed by combining the readings stored first in the kanji reading information unit 124 is determined as the reading. Therefore, even when no candidate has a ratio of matches greater than the predetermined ratio, a candidate that is likely to be the correct reading can be chosen.
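The threshold test and frequency fallback can be sketched as below. Several details are assumptions made for illustration only: the ratio is computed as matching characters divided by the longer of the two strings, the threshold value 0.5 is arbitrary, the matching count is the longest common substring found with the standard-library `difflib.SequenceMatcher`, and the candidate list is assumed to be ordered from most to least frequent, mirroring the storage order described for the kanji reading information unit 124. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def matching_chars(identifier, candidate):
    """Length of the longest common substring of the two strings."""
    matcher = SequenceMatcher(None, identifier, candidate)
    return matcher.find_longest_match(0, len(identifier), 0, len(candidate)).size


def choose_reading(identifier, candidates, min_ratio=0.5):
    """Pick the best-matching candidate if its ratio exceeds `min_ratio`
    (the S402/S410 test); otherwise fall back to the first candidate,
    which is assumed to be the most frequent reading (S406/S414)."""
    best, best_ratio = None, 0.0
    for cand in candidates:
        ratio = matching_chars(identifier, cand) / max(len(identifier), len(cand))
        if ratio > best_ratio:
            best, best_ratio = cand, ratio
    if best is not None and best_ratio > min_ratio:
        return best
    return candidates[0] if candidates else None


# Candidates listed in assumed frequency order.
print(choose_reading("ke", ["takeshi", "ken"]))   # ken (ratio 2/3 > 0.5)
print(choose_reading("xyz", ["takeshi", "ken"]))  # takeshi (fallback)
```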

As described above, according to the information processing apparatus of the present embodiment, name information and the personal identification information of the e-mail address are acquired from the image information of a business card. The correct reading of the name is determined from the result of collating the candidate readings of the kanji representing the name with the personal identification information. The reading of the full name can therefore be determined efficiently.

In the embodiment described above, the kanji reading information unit 124 stores each kanji in association with Roman letters representing its reading, as shown in FIG. 2. Alternatively, as shown in FIG. 18, it may store each kanji in association with its reading in hiragana. In that case, the surname and given-name candidates are generated in hiragana, the Roman letters in the personal identification information are converted to hiragana, and the degree of matching between each candidate and the personal identification information is calculated in hiragana to determine the reading of the full name.
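Converting the Roman letters of the personal identification information to hiragana could look like the following sketch. The table `ROMAJI_TO_KANA` is a hypothetical, deliberately tiny mapping that covers only the syllables in this description's examples, and the greedy longest-match strategy is an assumption; a real conversion table would be far larger and would need rules for ambiguous spellings.

```python
# Hypothetical minimal table; only the syllables used in the examples.
ROMAJI_TO_KANA = {
    "ta": "た", "na": "な", "ka": "か",
    "ke": "け", "shi": "し", "n": "ん",
}


def romaji_to_hiragana(text):
    """Greedy longest-match conversion. Returns None when some part of
    the text cannot be converted (for example a lone initial such as 'k')."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            return None  # no table entry starts at position i
    return "".join(out)


print(romaji_to_hiragana("tanaka"))   # たなか
print(romaji_to_hiragana("ken"))      # けん
print(romaji_to_hiragana("ktanaka"))  # None
```

Identifiers that contain only an initial, such as "ktanaka", do not convert cleanly with this approach, so a practical version would need to handle partial conversions.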

Also, in the embodiment described above, when no candidate has a ratio of matches greater than the predetermined ratio (NO in S402, NO in S410), the candidate most frequently used for surnames or given names is determined as the reading (S406, S414). Alternatively, for example, the Roman letters in the personal identification information may be converted to kana and the result determined as the reading of the full name. In this way, a reading of the full name can still be determined, and even when no correct candidate reading is generated, a reading that is likely to be correct can be determined.

The embodiment disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a block diagram showing the configuration of the information processing apparatus 100 according to the embodiment of the present invention.
FIG. 2 is a diagram (part 1) showing information stored in the information processing apparatus 100.
FIG. 3 is a flowchart (part 1) showing the control structure of the control unit of the information processing apparatus 100.
FIG. 4 is a flowchart (part 2) showing the control structure of the control unit of the information processing apparatus 100.
FIG. 5 is a diagram (part 2) showing information stored in the information processing apparatus 100.
FIG. 6 is a flowchart (part 3) showing the control structure of the control unit of the information processing apparatus 100.
FIG. 7 is a flowchart (part 4) showing the control structure of the control unit of the information processing apparatus 100.
FIG. 8 is a diagram (part 1) showing a business card that the information processing apparatus 100 acquires as image information.
FIG. 9 is a diagram (part 1) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 10 is a diagram (part 2) showing a business card that the information processing apparatus 100 acquires as image information.
FIG. 11 is a diagram (part 2) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 12 is a diagram (part 3) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 13 is a diagram (part 3) showing a business card that the information processing apparatus 100 acquires as image information.
FIG. 14 is a diagram (part 4) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 15 is a diagram (part 4) showing a business card that the information processing apparatus 100 acquires as image information.
FIG. 16 is a diagram (part 5) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 17 is a diagram (part 6) showing the degrees of matching calculated by the information processing apparatus 100.
FIG. 18 is a diagram (part 3) showing information stored in the information processing apparatus 100.

Explanation of symbols

100 information processing apparatus, 110 control unit, 120 storage unit, 122 character recognition information unit, 124 kanji reading information unit, 130 character recognition unit, 140 information acquisition unit, 150 generation unit, 160 determination unit.

Claims

An information processing apparatus comprising:
storage means for storing in advance first information associating kanji with phonetic characters representing readings of the kanji;
acquisition means for acquiring kanji representing an individual's full name and second information included in the individual's e-mail address;
generation means for generating, based on the first information, candidates for the reading of the kanji representing the full name; and
determination means for determining the reading of the kanji representing the full name based on a result of collating the second information with the candidates.

The information processing apparatus according to claim 1, wherein the determination means includes means for determining the reading based on the number of characters that match between the second information and the candidate.

The information processing apparatus according to claim 2, wherein the determination means includes means for determining, as the reading, the candidate having the largest number of matching characters.

The information processing apparatus according to claim 1, wherein the determination means includes means for determining the reading based on the positions of the characters that match between the second information and the candidate.

The information processing apparatus according to claim 4, wherein the determination means includes means for determining the reading by giving priority to a candidate whose leading character matches the leading character of the second information.

The information processing apparatus according to claim 4, wherein the determination means includes means for determining the reading by giving priority to a candidate whose last character matches the last character of the second information.

The information processing apparatus according to claim 4, wherein the generation means includes means for generating a surname candidate, which is a reading of the kanji representing the surname, and a given-name candidate, which is a reading of the kanji representing the given name, and
the determination means includes means for determining the reading such that the character positions in the second information that match the surname candidate differ from the character positions in the second information that match the given-name candidate.

The information processing apparatus according to claim 1, wherein the determination means includes means for determining, as the reading, a candidate satisfying a predetermined condition when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of the second information and the candidate is smaller than a predetermined ratio.

The information processing apparatus according to claim 8, wherein the determination means includes means for determining, as the reading, the candidate that is frequently used in full names.

The information processing apparatus according to claim 1, wherein the determination means includes means for determining the reading based on the second information when, for every candidate, the ratio of the number of characters matching between the second information and the candidate to the number of characters of the second information and the candidate is smaller than a predetermined ratio.

The information processing apparatus according to claim 1, further comprising means for generating kana corresponding to the second information, wherein
the generation means includes means for generating the candidates in kana, and
the determination means includes means for determining the reading based on a result of collating the corresponding kana with the candidates.

The information processing apparatus according to claim 1, wherein the second information includes Roman letters, and
the generation means includes means for generating the candidates in Roman letters.

The information processing apparatus according to claim 1, wherein the second information consists of characters, included in the e-mail address, for identifying an individual.

The information processing apparatus according to claim 1, wherein the acquisition means includes means for acquiring character information by performing character recognition on captured image information.

The information processing apparatus according to claim 1, wherein the first information associates each kanji with its reading when the kanji is used in a full name.

An information processing method performed by an information processing apparatus including storage means, acquisition means, generation means, and determination means, the method comprising:
storing, using the storage means, first information associating kanji with phonetic characters representing readings of the kanji;
acquiring, using the acquisition means, kanji representing an individual's full name and second information included in the individual's e-mail address;
generating, using the generation means, candidates for the reading of the kanji representing the full name based on the first information; and
determining, using the determination means, the reading of the kanji representing the full name based on a result of collating the second information with the candidates.

A program for implementing the information processing method according to claim 16.

A computer-readable recording medium on which the program according to claim 17 is recorded.