JP4991407B2

JP4991407B2 - Information processing apparatus, control program thereof, computer-readable recording medium storing the control program, and control method

Info

Publication number: JP4991407B2
Application number: JP2007160637A
Authority: JP
Inventors: 千絵木内; 至幸小山; 充宏斗谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2007-06-18
Filing date: 2007-06-18
Publication date: 2012-08-01
Anticipated expiration: 2027-06-18
Also published as: JP2008310772A

Description

The present invention relates to identifying character strings that represent personal names, company names, store names, and the like from among the character strings written on a particular medium. In particular, it relates to an information processing apparatus that identifies such character strings by using the e-mail address and network address information written on the medium, a control program for the apparatus, a computer-readable recording medium on which the control program is recorded, and a control method.

A technique is conventionally known in which the character strings written on a particular medium, such as a business card, are read with a camera-equipped mobile phone and recorded in its phone book. A phone book has separate fields, such as name and telephone number. The character strings read from the business card must therefore be classified and recorded in the correct fields.

In recent years, techniques that perform this classification automatically rather than manually have also become known. For example, Patent Document 1 discloses a character recognition device that classifies character strings automatically. It does so by using character strings that indicate a specific category, such as “(株)” or “株式会社” (the Japanese designations for a stock company, comparable to “Co., Ltd.”), contained in company names.

Another known item classification method identifies and classifies the character string representing a personal name on a business card. It relies on the common feature that names on business cards are printed in relatively large characters, with relatively wide spacing between individual characters.

Further, Patent Document 2 discloses an information processing apparatus in which a score-assigning unit assigns a score to each character string obtained by character recognition. Based on these scores, the apparatus determines the most probable semantic attribute of each string (a category of information such as name, address, telephone number, or job title) and classifies the string accordingly. The classification is based on combination patterns of the strings within a character string group obtained by recognition, that is, a group whose pattern is the set of strings found on the same line. As a result, the apparatus can classify strings by item with relatively high accuracy even when a string contains no indicator of a specific category.

The “business card” taken here as an example of a particular medium is used mainly in business. Business is now often conducted not only within Japan but also with overseas partners. Business cards therefore exist in a wide variety of languages, such as English, German, and French, as well as Japanese.

Patent Document 1: JP H10-78997 A (published March 24, 1998)
Patent Document 2: JP 2006-185342 A (published July 13, 2006)

However, the character recognition device disclosed in Patent Document 1 has difficulty automatically classifying character strings from business cards written in languages other than Japanese, such as English business cards. On Japanese-language business cards, strings such as “(株)” or “株式会社” indicate a specific category, and the English-language counterparts are strings such as “Corp.” and “Inc.” Such indicator strings are often absent from English business cards, so the character strings on those cards cannot be classified automatically in this way.

The conventional item classification method described above also has difficulty automatically classifying character strings from business cards written in languages other than Japanese. On an English business card, the gap between the family name and the given name is relatively wide. However, the spacing between the individual characters within each name is often no different from that of other items. It is therefore difficult to identify and classify the name string on an English card by relying on a feature typical of Japanese cards.

The information processing apparatus disclosed in Patent Document 2 also has drawbacks.

- **Storage.** To classify names and other items from recognized character strings, it must be provided in advance with storage such as a dictionary database (DB), a probability information DB, and a rule DB. This requires a very large storage capacity.
- **Complexity.** Its score assignment uses a probability model, which makes the processing complex.

In short, that apparatus does not classify names and other items from recognized character strings easily. Applying it to strings from English business cards would also require additional English-language databases. This would further increase the required storage capacity and make such classification even harder to achieve.

The present invention has been made in view of these conventional problems. Its object is to provide an information processing apparatus that can more easily identify, from the character strings extracted from image information of a particular medium, a character string representing at least one of a personal name, a company name, and a store name. The invention also provides a control program for the apparatus, a computer-readable recording medium on which the control program is recorded, and a control method.

To solve the above problems, the information processing apparatus of the present invention performs character recognition, using a character recognition dictionary, on the image information of a medium on which character strings are written. It thereby acquires the character strings written on the medium. The apparatus includes the following means:

- **Address information acquisition means** acquires address information from the character strings obtained by the character recognition. The address information is at least one of mail address information (a character string representing an e-mail address) and network address information (a character string representing a network address).
- **Comparison character string acquisition means** acquires, from the recognized character strings, a plurality of comparison character strings, that is, character strings other than the mail address information and the network address information.
- **Identification character string generation means** generates an identification character string based on the address information acquired by the address information acquisition means. The identification character string is at least one of a personal identification character string used to identify a personal name, a company identification character string used to identify a company name, and a store identification character string used to identify a store name.
- **Character string comparison means** compares the identification character string generated by the identification character string generation means with each of the plurality of comparison character strings acquired by the comparison character string acquisition means.
- **Identification means** uses the result of the comparison by the character string comparison means. It identifies a comparison character string judged similar to the personal identification character string as a character string representing a personal name. It identifies one judged similar to the company identification character string as a character string representing a company name. It identifies one judged similar to the store identification character string as a character string representing a store name.
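The flow of the five means above can be illustrated with a minimal end-to-end sketch. Several details are assumptions and are not taken from the patent text:

- the e-mail and URL regular expressions;
- the use of `difflib.SequenceMatcher` as the "similarity" judgment, with a 0.8 threshold;
- the rule that the first host label serves as the company identifier.

Store identification strings are omitted for brevity.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List

# Assumed address patterns; the text only says e-mail and network addresses are acquired.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)[\w./-]+", re.IGNORECASE)


def normalize(s: str) -> str:
    """Lower-case and keep letters/digits only, so 'Acme Widgets' ~ 'acmewidgets'."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is an assumed stand-in for the comparison step."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def identify(lines: List[str], threshold: float = 0.8) -> Dict[str, str]:
    emails: List[str] = []
    urls: List[str] = []
    candidates: List[str] = []  # comparison character strings
    for line in lines:
        found_mail = EMAIL_RE.findall(line)
        found_url = URL_RE.findall(line)
        emails += found_mail
        urls += found_url
        if not found_mail and not found_url:
            candidates.append(line.strip())

    # Identification character strings derived from the addresses.
    ids: Dict[str, str] = {}
    if emails:
        user, _, host = emails[0].rpartition("@")
        ids["name"] = user
        ids["company"] = host.split(".")[0]
    if urls:
        host = re.sub(r"^(?:https?://)?(?:www\.)?", "", urls[0], flags=re.IGNORECASE)
        ids["company"] = host.split(".")[0]  # prefer the network address when present

    # Compare each identification string with every candidate; keep the best match.
    result: Dict[str, str] = {}
    for field, id_str in ids.items():
        best = max(candidates, key=lambda c: similarity(id_str, c), default=None)
        if best is not None and similarity(id_str, best) >= threshold:
            result[field] = best
    return result
```

Consider an English card with the lines `"Acme Widgets"`, `"John Smith"`, `"Sales Manager"`, `"Tel: +1 555 0100"` and `"E-mail: john.smith@acmewidgets.com"`. For this card, `identify` should return `{"name": "John Smith", "company": "Acme Widgets"}`, even though no "Inc." or "Corp." appears.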

To solve the above problems, the control method of the present invention controls an information processing apparatus. The apparatus performs character recognition, using a character recognition dictionary, on the image information of a medium on which character strings are written, and acquires the character strings written on the medium. The method includes the following steps:

1. **Address information acquisition step.** Address information acquisition means acquires address information from the recognized character strings. The address information is at least one of mail address information (a character string representing an e-mail address) and network address information (a character string representing a network address).
2. **Comparison character string acquisition step.** Comparison character string acquisition means acquires, from the recognized character strings, a plurality of comparison character strings other than the mail address information and the network address information.
3. **Identification character string generation step.** Identification character string generation means generates an identification character string based on the address information acquired by the address information acquisition means. The identification character string is at least one of a personal identification character string used to identify a name, a company identification character string used to identify a company name, and a store identification character string used to identify a store name.
4. **Character string comparison step.** Character string comparison means compares the identification character string generated by the identification character string generation means with each of the comparison character strings acquired by the comparison character string acquisition means.
5. **Identification step.** Identification means uses the result of the comparison in the character string comparison step. It identifies a comparison character string judged similar to the personal identification character string as a character string representing a name. It identifies one judged similar to the company identification character string as a character string representing a company name. It identifies one judged similar to the store identification character string as a character string representing a store name.

The above invention works as follows.

- The address information acquisition means acquires address information from the character strings obtained by character recognition of the medium's image information. The address information is at least one of mail address information (a character string representing an e-mail address) and network address information (a character string representing a network address).
- The identification character string generation means generates an identification character string from that address information. It is at least one of a personal, a company, and a store identification character string, used respectively to identify a name, a company name, and a store name.
- The comparison character string acquisition means acquires, from the recognized character strings, a plurality of comparison character strings other than the mail address information and the network address information.
- The character string comparison means compares the identification character string with each of the comparison character strings.
- Based on the result, the identification means identifies a comparison character string judged similar to the personal identification character string as the name. It identifies one judged similar to the company identification character string as the company name. It identifies one judged similar to the store identification character string as the store name.

Consequently, no new dictionary or database needs to be added to the information processing apparatus to identify character strings representing names, company names, and store names.

The identification character string is derived from the recognized character strings, from at least one of the mail address information and the network address information. It is compared with a plurality of comparison character strings other than that address information, and the comparison character strings similar to it are extracted. Names, company names, and store names are identified in this way, so the information processing apparatus needs no complex dictionary data or complex processing rules. A character string representing at least one of a name, a company name, and a store name can therefore be identified by relatively simple processing from the image information of a medium on which character strings are written.
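The text does not say at this point how "similar" is judged. One simple, dictionary-free possibility is a normalized edit distance. The sketch below treats two strings as similar when at most 25% of the longer one must change. The normalization and the 25% limit are both assumptions. This rule tolerates small OCR errors (such as "i" read as "l") without any stored data.

```python
def normalize(s: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_similar(identifier: str, candidate: str, max_ratio: float = 0.25) -> bool:
    """Hypothetical rule: similar if at most max_ratio of the longer string must change."""
    a, b = normalize(identifier), normalize(candidate)
    if not a or not b:
        return False
    return levenshtein(a, b) / max(len(a), len(b)) <= max_ratio
```

With this rule, `is_similar("acmewidgets", "Acme Wldgets")` holds despite the misread "i", while `"Sales Manager"` is rejected.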

Furthermore, as described above, identification character strings are generated from at least one of the mail address information and the network address information. They are the personal identification character string used to identify a name, the company identification character string used to identify a company name, and the store identification character string used to identify a store name. These are compared with a plurality of comparison character strings other than the address information, and the comparison character strings similar to them are extracted to identify the name, company name, and store name. Character strings representing a name, company name, or store name can therefore be identified even when the recognized strings contain no category indicator such as “(株)”, “株式会社”, “Corp.”, or “Inc.” On English-language business cards and similar media, names, company names, and store names rarely contain such indicator strings. For such media in particular, a character string representing at least one of a name, a company name, and a store name can be identified more easily from the strings extracted from image information.

As a result, a character string representing at least one of a name, a company name, and a store name can be identified more easily from the character strings extracted from image information of a particular medium.

In the information processing apparatus of the present invention, the address information acquisition means preferably acquires at least one of the mail address information and the network address information by using at least one of a specific character and a specific character string contained in the mail address information or the network address information.

This allows the address information acquisition means to acquire at least one of the mail address information and the network address information by using a specific character or character string contained in that information. Suppose a character or character string characteristic of mail addresses or network addresses is chosen as the specific character or string, as with “@” in an e-mail address. The address information acquisition means can then easily distinguish and acquire the mail address information and the network address information.
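Marker-based detection can be sketched as follows. The "@" marker for e-mail addresses is the one named later in the text. The URL markers `http://`, `https://` and `www.`, and the choice to split lines on whitespace into tokens, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

MAIL_MARKER = "@"                             # named in the text for e-mail addresses
URL_MARKERS = ("http://", "https://", "www.")  # assumed markers for network addresses


def classify_token(token: str) -> Optional[str]:
    """Return 'mail', 'url', or None depending on which marker the token carries."""
    t = token.lower()
    if MAIL_MARKER in t:
        return "mail"
    if t.startswith(URL_MARKERS):
        return "url"
    return None


def extract_addresses(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Collect mail-address tokens and network-address tokens from OCR lines."""
    mail: List[str] = []
    net: List[str] = []
    for line in lines:
        for token in line.split():
            kind = classify_token(token)
            if kind == "mail":
                mail.append(token)
            elif kind == "url":
                net.append(token)
    return mail, net
```

Labels such as "E-mail:" are skipped here only because they are separated from the address by a space. Labels glued to the address need the cleanup described in the next step.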

Preferably, the address information acquisition means also deletes, from the acquired mail address information or network address information, any character string that clearly is not a component of a mail address or network address. The remaining string is used as the mail address information or network address information.

Because the address information acquisition means deletes such strings, the identification character string generation means can easily generate identification character strings that do not contain them. Excluding these strings from the identification character strings used by the character string comparison means also improves the accuracy of the comparison and reduces wasted comparisons. A character string representing at least one of a name, a company name, and a store name can therefore be identified even more easily from the character strings extracted from image information of a particular medium. The processing load on the information processing apparatus is also reduced.
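A small cleanup pass of the kind described above is sketched here. Which strings count as "clearly not address components" is not listed in the text. The label list (including the full-width colon common on Japanese cards) and the edge characters are assumptions.

```python
import re

# Hypothetical field labels that may precede an address on a card.
LABEL_RE = re.compile(r"^(?:e-?mail|mail|url|web|hp)\s*[:：]\s*", re.IGNORECASE)
# Brackets, quotes and sentence punctuation that may surround an address.
EDGE_CHARS = "()[]<>\"'.,; "


def clean_address(raw: str) -> str:
    """Drop a leading field label and surrounding punctuation from an address string."""
    s = LABEL_RE.sub("", raw.strip())
    return s.strip(EDGE_CHARS)
```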

Preferably, the comparison character string acquisition means acquires the plurality of comparison character strings by splitting the recognized character strings other than the mail address information and the network address information before and after a predetermined delimiter character.

The non-address character strings, split before and after the predetermined delimiter, thus become the comparison character strings acquired by the comparison character string acquisition means. When the non-address strings include a name, company name, or store name, that string is generally separated from the surrounding text by delimiters. Comparison strings containing only the name, company name, or store name can therefore be obtained. In that case, after the comparison by the character string comparison means, the comparison strings that the identification means judges to represent a name, company name, or store name contain nothing else. A character string representing at least one of a name, a company name, and a store name can therefore be identified more accurately.
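Delimiter-based splitting can be sketched as follows. The text does not list the "predetermined delimiter characters". The set used here (comma, slash, vertical bar, tab, or a run of two or more spaces) is an assumption. A single space is deliberately not a delimiter, so that a name such as "John Smith" stays one candidate.

```python
import re
from typing import List

# Hypothetical delimiters: comma, slash, vertical bar, tab, or 2+ consecutive spaces.
DELIM_RE = re.compile(r"\s*[,/|\t]\s*|\s{2,}")


def split_candidates(lines: List[str]) -> List[str]:
    """Split non-address lines into comparison strings at the delimiters."""
    out: List[str] = []
    for line in lines:
        out.extend(part for part in DELIM_RE.split(line.strip()) if part)
    return out
```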

Preferably, the comparison character string acquisition means also deletes from the plurality of comparison character strings any string that clearly represents something other than a name, a company name, or a store name.

The comparison character string acquisition means thus removes from the comparison character strings any string that clearly is not a name, a company name, or a store name. Excluding such strings from the comparison character strings used by the character string comparison means improves the accuracy of the comparison and reduces wasted comparisons. A character string representing at least one of a name, a company name, and a store name can therefore be identified even more easily from the character strings extracted from image information of a particular medium. The processing load on the information processing apparatus is also reduced.
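The text does not define which strings are "clearly" not names. A sketch with two heuristic rules follows, and both rules are assumptions:

- drop lines that start with a telephone/fax field label;
- drop strings that contain more digits than letters, such as phone numbers and postal codes.

```python
import re
from typing import List

# Hypothetical field labels marking telephone-type lines.
FIELD_LABEL_RE = re.compile(r"^(?:tel|fax|phone|mobile|cell)\b", re.IGNORECASE)


def is_clearly_not_name(s: str) -> bool:
    """Heuristic test for strings that cannot be a name, company name, or store name."""
    if FIELD_LABEL_RE.match(s):
        return True  # telephone / fax lines
    digits = sum(ch.isdigit() for ch in s)
    letters = sum(ch.isalpha() for ch in s)
    return digits > letters  # mostly numeric: phone numbers, postal codes, etc.


def filter_candidates(candidates: List[str]) -> List[str]:
    """Keep only strings that might still be a name, company name, or store name."""
    return [c for c in candidates if not is_clearly_not_name(c)]
```

The `\b` in the label pattern keeps names that merely begin with those letters, such as "Telford Smith".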

Preferably, when the address information acquisition means has acquired no network address information, the identification character string generation means generates the company identification character string from the company-identifying part of the mail address information.

Thus, even when the address information acquisition means acquires no network address information, the identification character string generation means can still generate the company identification character string from the mail address information.
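This fallback can be sketched as follows: use the network address when one was found, otherwise use the host part of the mail address. Taking the first host label after dropping "www" as the company identifier is a simplifying assumption. It is not a rule stated in the text.

```python
from typing import Optional


def company_identifier(mail: Optional[str], url: Optional[str]) -> Optional[str]:
    """Prefer the network address; fall back to the mail host when no URL was found."""
    if url:
        host = url.split("://", 1)[-1].split("/", 1)[0]  # strip scheme and path
    elif mail and "@" in mail:
        host = mail.rpartition("@")[2]
    else:
        return None
    labels = [label for label in host.lower().split(".") if label and label != "www"]
    return labels[0] if labels else None  # assumed: first label names the company
```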

Preferably, the identification character string generation means splits the mail address information acquired by the address information acquisition means before and after the specific character that separates the user name from the host name of the e-mail address. In this way it generates the personal identification character string, together with the company identification character string or the store identification character string.

The identification character string generation means thus splits the mail address information before and after the specific character between the user name and the host name, namely “@”. Business e-mail addresses often use the person's name before the “@” and the company or store name after it. When a business card is the target of character recognition, the generation means can therefore produce more accurate identification strings: a string representing the name as the personal identification character string, and a string representing the company or store name as the company or store identification character string. Accordingly, strings representing at least a name and either a company name or a store name can be identified more accurately from the character strings extracted from image information of a particular medium.
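The split at "@" can be sketched as follows. The dictionary keys are illustrative names, not terms from the text. Splitting at the last "@" (`str.rpartition`) is a choice of this sketch, made because a host name can never contain "@".

```python
from typing import Dict


def split_mail_address(mail: str) -> Dict[str, str]:
    """Split an address at the '@' between the user name and the host name.

    The user part is the personal-identification candidate; the host part is the
    company/store-identification candidate (key names are illustrative only).
    """
    user, sep, host = mail.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host address: {mail!r}")
    return {"personal": user, "company_or_store": host}
```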

It is also preferable that the identification character string generation means generate, from the mail address information, the character string preceding the specific character as the personal identification character string or the store identification character string.

With this configuration, the identification character string generation means takes the character string preceding the specific character (“@”) in the mail address information and uses it as the personal identification character string or the store identification character string.

- **Business cards.** E-mail addresses on business cards often use a person's name before the “@”. When a business card is the target of character recognition, a character string representing the name can therefore be generated more accurately as the personal identification character string.
- **Store listings.** E-mail addresses printed in magazines and similar publications that carry store information often use the store name before the “@”. When such a publication is the target of character recognition, a character string representing the store name can be generated more accurately as the store identification character string.

Consequently, a character string representing a name or a store name can be identified more accurately among the character strings extracted from the image information of the specific medium.

It is also preferable that the identification character string generation means generate, from the mail address information, the character string following the specific character as the company identification character string or the store identification character string.

With this configuration, the identification character string generation means takes the character string following the specific character (“@”) in the mail address information and uses it as the company identification character string or the store identification character string.

- **Business cards.** E-mail addresses on business cards often use the company name after the “@”. When a business card is the target of character recognition, a character string representing the company name can therefore be generated more accurately as the company identification character string.
- **Store listings.** E-mail addresses printed in magazines and similar publications that carry store information also often use the store name after the “@”. When such a publication is the target of character recognition, a character string representing the store name can be generated more accurately as the store identification character string.

Consequently, a character string representing at least a company name or a store name can be identified more accurately among the character strings extracted from the image information of the specific medium.

It is also preferable that the identification character string generation means further divide the personal, company, or store identification character string before and after a delimiter contained in the mail address information acquired by the address information acquisition means. The resulting pieces are then used as new personal, company, or store identification character strings.

This works in two stages. First, the mail address information is divided before and after “@”, the character separating the user name from the host name, to give a personal, company, or store identification character string. Second, that string is divided again before and after the delimiters contained in the mail address information. The resulting pieces can then serve as the personal, company, or store identification character strings generated by the identification character string generation means.
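The two-stage division can be sketched in Python as follows. The text only refers to "delimiters contained in the mail address information", so the delimiter set used here (“.”, “_”, “-”) is an assumption. The helper name `mail_identification_strings` is also hypothetical.

```python
import re

# Hypothetical delimiter set: the description does not list the delimiters
# used inside mail addresses, so '.', '_' and '-' are assumed here.
MAIL_DELIMITERS = re.compile(r"[._\-]+")


def mail_identification_strings(address: str) -> tuple[list[str], list[str]]:
    """Split at '@', then split each side again at the assumed delimiters.

    Returns (user-name pieces, host-name pieces); empty pieces are dropped.
    """
    user, _, host = address.rpartition("@")
    user_parts = [p for p in MAIL_DELIMITERS.split(user) if p]
    host_parts = [p for p in MAIL_DELIMITERS.split(host) if p]
    return user_parts, host_parts


print(mail_identification_strings("ray.smith@nmlMar.com"))
# (['ray', 'smith'], ['nmlMar', 'com'])
```

Generic host labels such as “com” survive this step. Whether and how they are discarded is not stated for mail addresses, so the sketch leaves them in.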

It is also preferable that the identification character string generation means generate the company identification character string or the store identification character string by dividing the network address information, acquired by the address information acquisition means, before and after a delimiter it contains.

With this configuration, the pieces of network address information obtained by dividing it before and after its delimiters are used as the company or store identification character strings generated by the identification character string generation means. When network address information contains a character string representing a company name or store name, that string is usually set off by delimiters. A company or store identification character string containing only the company or store name can therefore be generated. Because the character string comparison and collation means then compares only that name, the accuracy of the comparison improves.
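A rough Python sketch of this URL division follows. It assumes that the delimiters are “.”, “/”, “:”, “-”, and “_”. It also assumes that a hypothetical list of generic labels (scheme, “www”, common domain suffixes) is discarded so that only name-like pieces remain. Neither list comes from the description.

```python
import re

# Assumptions: delimiter set and "generic" labels to discard are illustrative;
# the description only says company/store names are usually set off by delimiters.
URL_DELIMITERS = re.compile(r"[./:\-_]+")
GENERIC_LABELS = {"http", "https", "www", "com", "co", "ne", "or", "jp", "net", "org"}


def url_identification_strings(url: str) -> list[str]:
    """Split a URL at delimiters and keep pieces that are not generic labels."""
    pieces = URL_DELIMITERS.split(url)
    return [p for p in pieces if p and p.lower() not in GENERIC_LABELS]


print(url_identification_strings("www.nml-Market.com"))  # ['nml', 'Market']
```

For the example card in this description, this leaves “nml” and “Market” as company identification candidates.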

It is also preferable that the character string comparison and collation means compare each identification character string, generated by the identification character string generation means, with each of the plurality of comparison character strings, acquired by the comparison character string acquisition means, in three ways:

- aligned at their first characters;
- aligned at their last characters; and
- with one string shifted along the other one character at a time.

With this configuration, the identification means can base its determination on the combined results of these different kinds of comparison: aligned at the first characters, aligned at the last characters, and shifted one character at a time. Consequently, a character string representing at least one of a name, a company name, and a store name can be identified with higher accuracy among the character strings extracted from the image information of the specific medium.
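A minimal Python sketch of the first two comparison modes, front-aligned and rear-aligned, counts equal characters at aligned positions. Case-insensitive comparison is an assumption, since e-mail addresses and printed names often differ in case. The function names are hypothetical.

```python
def front_matches(a: str, b: str) -> int:
    """Count equal characters when a and b are aligned at their first characters."""
    return sum(x == y for x, y in zip(a.lower(), b.lower()))


def rear_matches(a: str, b: str) -> int:
    """Count equal characters when a and b are aligned at their last characters."""
    return sum(x == y for x, y in zip(reversed(a.lower()), reversed(b.lower())))


print(front_matches("rsmith", "Smith"))  # 0
print(rear_matches("rsmith", "Smith"))   # 5
```

The example shows why both directions help: the identification string “rsmith” has an extra leading initial. It shares nothing with “Smith” when front-aligned, but all five characters match when rear-aligned. The description does not say exactly how the results of the modes are combined; taking the largest count is one plausible choice.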

It is also preferable that, in the shifted comparison, the character string comparison and collation means take whichever of the identification character string and the comparison character string has fewer characters and shift it one character at a time along the other.

With this configuration, the shifted comparison always moves the shorter string, whether the identification character string or the comparison character string, one character at a time along the longer one.
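The shifted comparison can be sketched in Python as below. The sketch assumes three things: matching is case-insensitive, only offsets at which the shorter string lies entirely within the longer one are tried, and the best count over all offsets is reported. The description does not settle any of these details.

```python
def shifted_matches(a: str, b: str) -> int:
    """Best count of equal characters when the shorter string is slid one
    character at a time along the longer one (full-overlap offsets only)."""
    a, b = a.lower(), b.lower()
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        best = max(best, sum(x == y for x, y in zip(short, window)))
    return best


print(shifted_matches("smith", "rsmith"))  # 5
print(shifted_matches("nmlMar", "NML"))    # 3
```

The shorter string is chosen automatically, so the argument order does not matter.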

It is also preferable that the identification means determine whether the identification character string is similar to each of the plurality of comparison character strings. This determination is based on the number of characters that match between them in the comparison performed by the character string comparison and collation means.

With this configuration, the identification means can judge whether the identification character string is similar to each comparison character string from the number of matching characters found by the character string comparison and collation means.

It is also preferable that the identification means judge similarity by the largest number of matching characters, as follows:

- the comparison character string with the most characters matching the personal identification character string is judged similar to the personal identification character string;
- the comparison character string with the most characters matching the company identification character string is judged similar to the company identification character string; and
- the comparison character string with the most characters matching the store identification character string is judged similar to the store identification character string.

With this configuration, for each of the personal, company, and store identification character strings, the comparison character string with the most matching characters can be identified as the character string representing the name, the company name, or the store name, respectively.
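The "most matching characters wins" rule can be sketched in Python as follows. The sketch makes three assumptions not found in the description: the shifted comparison is used as the match count, ties go to the first candidate, and a best count of zero means that no candidate is similar. Both helper names are hypothetical.

```python
from typing import Optional


def match_score(a: str, b: str) -> int:
    """Case-insensitive shifted comparison: best count of equal characters
    when the shorter string is slid along the longer one."""
    a, b = a.lower(), b.lower()
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return max(
        sum(x == y for x, y in zip(short, long_[i:i + len(short)]))
        for i in range(len(long_) - len(short) + 1)
    )


def identify(identification: str, candidates: list[str]) -> Optional[str]:
    """Return the candidate with the most matching characters, or None when
    every candidate scores zero (treated here as 'not similar')."""
    best, best_score = None, 0
    for candidate in candidates:
        score = match_score(identification, candidate)
        if score > best_score:  # strict '>' keeps the first of tied candidates
            best, best_score = candidate, score
    return best


candidates = ["Ray", "Smith", "NML", "Market"]
print(identify("rsmith", candidates))  # Smith
print(identify("nmlMar", candidates))  # NML
```

Returning `None` corresponds to the case where none of the comparison character strings is similar. That case is handled by predetermined conditions, which are not specified here.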

It is also preferable that the identification means have a fallback for the case where the comparison by the character string comparison and collation means shows that none of the plurality of comparison character strings is similar to the identification character string. In that case, the identification means identifies the comparison character strings that satisfy respective predetermined conditions as the name and as the company name or store name.

With this configuration, a name and a company or store name can still be identified even when the comparison shows that none of the comparison character strings is similar to the personal identification character string. In that case, the comparison character strings that satisfy the respective predetermined conditions are identified as the name and as the company name or store name.

The information processing apparatus may be implemented by a computer. In that case, the following also fall within the scope of the present invention:

- a control program that implements the apparatus on the computer by causing the computer to operate as each of the above means; and
- a computer-readable recording medium on which that program is recorded.

According to the present invention, no additional dictionary or database needs to be provided in the information processing apparatus. The method works as follows.

1. Identification character strings are generated from at least one of the mail address information and the network address information found among the character strings obtained by character recognition. They comprise a personal identification character string used to identify a name, a company identification character string used to identify a company name, and a store identification character string used to identify a store name.
2. These identification character strings are compared with a plurality of comparison character strings, namely the recognized character strings other than the mail address information and the network address information.
3. The comparison character strings similar to the identification character strings are extracted, which identifies the name, company name, and store name.

The apparatus therefore needs neither complex dictionary data nor complex processing rules. It can identify character strings representing a name, a company name, or a store name even when the recognized text contains no classification marker such as “（株）” or “株式会社” (Japanese for a joint-stock company), “Corp.”, or “Inc.”.

Thus, a character string representing at least one of a name, a company name, and a store name can be identified by relatively simple processing from the image information of a medium on which character strings are written. This is particularly useful for media such as business cards written in English, where names, company names, and store names rarely include classification markers. In short, the invention makes it easier to identify a character string representing at least one of a name, a company name, and a store name among the character strings extracted from the image information of a specific medium.

[Embodiment 1]
An embodiment of the present invention is described below with reference to FIGS. 1 to 16. In these drawings, identical members and members with identical functions are given the same reference numerals, and their detailed description is not repeated.

First, the configuration of the information processing apparatus 1 is outlined with reference to FIG. 1, a functional block diagram showing the schematic configuration of the apparatus in the present embodiment. As shown in FIG. 1, the information processing apparatus 1 includes:

- an image data acquisition unit 11;
- a character recognition dictionary data storage unit 12;
- a character recognition unit 13;
- an address information acquisition unit (address information acquisition means) 14;
- a comparison character string acquisition unit (comparison character string acquisition means) 15;
- an identification character string generation unit (identification character string generation means) 16;
- a character string comparison and collation unit (character string comparison and collation means) 17; and
- an item classification determination unit (identification means) 18.

The information processing apparatus 1 performs character recognition on image information of a medium such as a business card. It identifies character strings written on the medium, such as a name and a company name, and classifies them into items such as name and company name.

The image data acquisition unit 11 acquires image data (image information), for example of an image captured by the camera of a camera-equipped mobile phone or of an image read by a scanner. It sends the acquired image data to the character recognition unit 13. The following description assumes that this image data comes from an image of a business card (a medium on which character strings are written), obtained either by reading the card with a scanner or by photographing it with the camera of a camera-equipped mobile phone.

The character recognition dictionary data storage unit 12 stores the data (character recognition dictionary) that the character recognition unit 13 uses to recognize character strings. For example, it may store character codes for kanji, hiragana, katakana, digits, Latin letters, symbols, and the like, together with a standard feature vector for the character corresponding to each code. The following description assumes this configuration: the storage unit 12 holds character codes and the corresponding standard feature vectors (character recognition dictionary data).

The character recognition unit 13 performs character recognition on the image data sent from the image data acquisition unit 11, referring to the character recognition dictionary data stored in the storage unit 12. For example, it extracts feature vectors from the image data and matches them against the standard feature vectors of all character codes in the dictionary data. The character recognition unit 13 may instead use known OCR (Optical Character Recognition).
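The matching step can be pictured as a nearest-neighbour lookup. The following Python sketch assumes that the closest standard feature vector under Euclidean distance wins. The description does not name a distance measure, and feature extraction itself is not shown. The toy two-dimensional dictionary is purely illustrative.

```python
import math


def recognize(feature: list[float], dictionary: dict[str, list[float]]) -> str:
    """Return the character code whose standard feature vector is nearest
    (Euclidean distance, an assumption) to the extracted feature vector."""
    return min(dictionary, key=lambda code: math.dist(feature, dictionary[code]))


# Toy dictionary with two character codes and 2-D "standard feature vectors".
toy_dictionary = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
print(recognize([0.9, 0.8], toy_dictionary))  # B
```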

The address information acquisition unit 14 extracts address information, such as e-mail address information and URL (Uniform Resource Locator) information, from the character strings obtained by the character recognition unit 13.

E-mail address information may be extracted in either of two ways:

- select character strings that consist of half-width (single-byte) characters and begin with a label such as “E-mail” or “mail” (a specific character string); or
- since any e-mail address must contain “@”, select character strings that contain “@” (a specific character).

URL information (network address information) may likewise be extracted in either of two ways:

- select character strings of half-width characters that begin with “http” or “www” (a specific character string); or
- select character strings that contain “http” or “www”.
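One way to realise these rules is sketched below in Python. A line containing “@” is treated as an e-mail address, and its label is stripped as described for step S2. An ASCII-only line beginning with “http” or “www” is treated as a URL. Everything else is passed on as comparison material. The label pattern and the function name are hypothetical, and real OCR output would need more tolerant rules.

```python
import re

# Hypothetical label pattern: "E-mail", "Email" or "mail", then an ASCII or
# full-width colon, as often printed on business cards.
MAIL_LABEL = re.compile(r"^(?:e-?mail|mail)\s*[:：]\s*", re.IGNORECASE)


def extract_addresses(lines: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Sort recognized lines into (mail addresses, URLs, everything else)."""
    mails, urls, others = [], [], []
    for line in lines:
        text = line.strip()
        if "@" in text:
            mails.append(MAIL_LABEL.sub("", text))  # drop an "E-mail:" label
        elif text.isascii() and text.lower().startswith(("http", "www")):
            urls.append(text)
        else:
            others.append(text)
    return mails, urls, others


card = ["Ray Smith", "NML Market", "TEL(999)888-5678",
        "E-mail: rsmith@nmlMar.com", "www.nml-Market.com"]
print(extract_addresses(card))
```

For the example card this yields `['rsmith@nmlMar.com']` as mail, `['www.nml-Market.com']` as URL, and the remaining three lines as candidates for the comparison character strings.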

The comparison character string acquisition unit 15 takes the character strings obtained by the character recognition unit 13, excluding those that correspond to the address information. It splits them at spaces and commas to obtain the comparison character strings used by the character string comparison and collation unit 17, described later.

The identification character string generation unit 16 generates two kinds of identification character string by dividing the address strings acquired by the address information acquisition unit 14 at specific delimiters:

- a personal identification character string, from the character string representing the e-mail address information; and
- a company identification character string, from the character string representing the network address information.

These delimiters are described in detail later.

The character string comparison and collation unit 17 compares the personal and company identification character strings generated by the identification character string generation unit 16 with the comparison character strings obtained by the comparison character string acquisition unit 15.

The item classification determination unit 18 then classifies the character strings into items, that is, it identifies the character strings representing the name and the company name. It does this based on the comparison result from the character string comparison and collation unit 17. The classification result is sent to, for example, a display device provided outside the information processing apparatus 1 and displayed there.

Next, the operation flow of the information processing apparatus 1 is described with reference to FIG. 2, a flowchart showing an example of that flow. In the example, the information processing apparatus 1 processes the whole image data of the English business card shown in FIG. 3.

First, in step S1, the character recognition unit 13 performs character recognition on the image data acquired by the image data acquisition unit 11. Recognition is performed one line at a time for vertically written business cards and one column at a time for horizontally written ones. Recognizing the image data of the English business card in FIG. 3 yields, as shown in FIG. 4(a), the following character strings:

- “Ray Smith”
- “NML Market”
- “○○ Avenue, △△, [][]”
- “TEL(999)888-5678”
- “FAX(999)888-1234”
- “E-mail: rsmith@nmlMar.com”
- “www.nml-Market.com”

Each “[]” stands for a hollow square (□), as shown in FIG. 4(a). The same applies to every later occurrence of “[]”.

Next, in step S2, the address information acquisition unit 14 extracts the character strings representing address information, such as e-mail address information and URL (network address) information, from the character strings obtained by the character recognition unit 13. In this example, it acquires the e-mail address information “E-mail: rsmith@nmlMar.com” and the network address information “www.nml-Market.com”, which are enclosed by broken lines in FIG. 4(a).

The label “E-mail:” is common to all e-mail address information and contains no name, company name, or the like. It is clearly not a component of a mail address or network address. The address information acquisition unit 14 may therefore delete the “E-mail:” part when acquiring the address information. In this example, the character string “rsmith@nmlMar.com”, enclosed by a broken line in FIG. 4(b), is acquired as the address information.

Deleting “E-mail:” is only one example. Any part that is clearly not a component of a mail address or network address, for instance because it contains no name or company name, may be deleted in the same way. A “mail:” label is one such part.

Next, in step S3, the comparison string acquisition unit 15 further divides the character strings obtained by the character recognition unit 13, other than those corresponding to the address information (the comparison string candidates), before and after delimiters (predetermined delimiters) such as “ ” (space) and “,”. In this example, the comparison strings shown in FIG. 5(b) (character string candidates 2) are obtained from the comparison string candidates shown in FIG. 5(a) (character string candidates 1). Because the character string comparison unit 17 later needs to know whether the divided strings (comparison strings) were originally written on the same line of the business card (the specific medium), that fact is retained in some form when the candidates are divided. In this example, comparison strings that were originally on the same line are given the same index, as shown in FIG. 5(b). Besides “ ” (space) and “,”, the delimiters may include “(”, “)”, “-”, “_”, and so on, and any other character may also serve as a delimiter. Either a single one of these delimiters or a combination of them may be used.
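The splitting in step S3 can be sketched as follows. This is a minimal Python illustration, not the apparatus's actual implementation; the delimiter set, the function name `split_candidates`, and numbering lines from 1 are assumptions made for the example. Each token keeps the index of the card line it came from, which is what later allows same-line strings to be grouped.

```python
import re

# Assumed delimiter set: space, ",", "(", ")", "-", "_" (the text allows any characters).
DELIMITERS = re.compile(r"[ ,()\-_]+")

def split_candidates(lines):
    """Split recognized card lines into comparison strings.

    Returns (line_index, text) pairs; tokens from the same card line
    share an index, numbered from 1 (hypothetical convention)."""
    tokens = []
    for index, line in enumerate(lines, start=1):
        for part in DELIMITERS.split(line):
            if part:  # skip empty pieces produced by leading/trailing delimiters
                tokens.append((index, part))
    return tokens
```

For instance, `split_candidates(["Ray Smith", "NML Market"])` tags “Ray” and “Smith” with index 1 and “NML” and “Market” with index 2.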

Among the comparison string candidates, a character string such as a number that follows “TEL” or “FAX” (a character string that clearly represents something other than a name, company name, or store name) is usually just a telephone or fax number. The configuration may therefore treat “TEL” and “FAX”, together with the character string that follows either of them, as a telephone or fax number and delete them from the comparison strings. However, a character string that follows “TEL” or “FAX” but does not have the features of a telephone or fax number may be kept as a comparison string.
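A minimal sketch of this optional TEL/FAX rule. The test for "phone-like" text (digits and common separators, at least six digits), the helper names `looks_like_phone` and `drop_tel_fax`, and the choice to drop the label itself when the following text is kept are all illustrative assumptions, not details from the text.

```python
import re

# Hypothetical "phone-like" test: optional "+", then digits and separators only.
PHONE_LIKE = re.compile(r"^\+?[\d\-() ]+$")
# "TEL"/"FAX" as a whole word at the start of a line, optional ":" or ".".
TEL_FAX = re.compile(r"^\s*(TEL|FAX)\b\s*[:.]?\s*(.*)$")

def looks_like_phone(text):
    return bool(PHONE_LIKE.match(text)) and sum(c.isdigit() for c in text) >= 6

def drop_tel_fax(line):
    """Return what should remain a comparison string candidate,
    or None if the line is judged to be a TEL/FAX entry."""
    match = TEL_FAX.match(line)
    if not match:
        return line              # no TEL/FAX label: keep the line unchanged
    rest = match.group(2)
    if looks_like_phone(rest):
        return None              # label and number are both discarded
    return rest or None          # not phone-like: keep the text after the label
```

The word boundary `\b` keeps a line such as “TELford Street” from being mistaken for a telephone entry.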

Next, in step S4, the identification string generation unit 16 acquires personal identification strings and company identification strings from the e-mail address information in the address information obtained by the address information acquisition unit 14, and the process proceeds to step S5. In step S5, the identification string generation unit 16 acquires company identification strings from the network address information in the address information obtained by the address information acquisition unit 14, and the process proceeds to step S6. The processing in step S4 is described in detail later.

The process of generating the personal identification strings and company identification strings from the e-mail address information in step S4 is now described in detail with reference to FIG. 6. FIG. 6 is a flowchart showing an example of this process.

First, in step S41, the position of “@” (the character between the user name and the host name of an e-mail address) is detected in the e-mail address information obtained by the address information acquisition unit 14, and the process proceeds to step S42. In step S42, the character string before “@” is extracted from the e-mail address information, and the process proceeds to step S43. The character string before “@” is a candidate for the personal identification string.

In step S43, it is determined whether the candidate personal identification string contains a delimiter that may appear before “@”, such as “_”, “-”, or “.” (delimiters contained in the e-mail address information). If such a delimiter is present (Yes in step S43), the process proceeds to step S44; otherwise (No in step S43), it proceeds to step S45.

In step S44, the candidate personal identification string is divided before and after each delimiter position, and each resulting string is kept as a new candidate personal identification string. In step S45, the candidate is kept unchanged as the personal identification string.

In this example, “rsmith@nmlMar.com” has been obtained as the e-mail address information, so step S41 detects “@” and step S42 extracts “rsmith” as the candidate personal identification string. Because “rsmith” contains no delimiter such as “_”, “-”, or “.”, step S44 is skipped, and “rsmith” becomes the personal identification string generated from the e-mail address information. If the e-mail address information were “r-smith@nmlMar.com”, the candidate “r-smith” would contain the delimiter “-”, so steps S43 and S44 would produce two personal identification strings, “r” and “smith”.
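Steps S41 to S45 amount to splitting the user-name part of the address at its delimiters. The following Python sketch mirrors the flowchart of FIG. 6 under that reading; the function name `personal_id_strings` is hypothetical, and the sketch reproduces both examples above.

```python
import re

# Delimiters that may appear in the user-name part (step S43).
LOCAL_DELIMITERS = re.compile(r"[_\-.]")

def personal_id_strings(email):
    """Steps S41-S45: take the part before '@' and split it on '_', '-', '.'."""
    at = email.find("@")                      # S41: locate '@'
    if at < 0:
        return []                             # not an e-mail address
    local = email[:at]                        # S42: user-name part
    if LOCAL_DELIMITERS.search(local):        # S43: delimiter present?
        return [p for p in LOCAL_DELIMITERS.split(local) if p]  # S44: divide
    return [local]                            # S45: keep unchanged
```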

Next, in step S46, the character string after “@” in the e-mail address information is extracted, and the process proceeds to step S47. The character string after “@” is a candidate for the company identification string. In step S47, it is determined whether the candidate contains a delimiter that may appear after “@”, such as “_”, “-”, or “.” (delimiters contained in the e-mail address information). If such a delimiter is present (Yes in step S47), the process proceeds to step S48; otherwise (No in step S47), it proceeds to step S49.

In step S48, the candidate company identification string is divided before and after each delimiter position, and each resulting string is kept as a new candidate company identification string. In step S49, the candidate is kept unchanged as the company identification string, and the flow ends.

In this example, the candidate company identification string is “nmlMar.com”, so it is divided at the delimiter “.” into “nmlMar” and “com”, yielding two company identification strings. However, because “com” is shared by many e-mail addresses, the identification string generation unit 16 may delete it. More generally, strings commonly used in e-mail addresses, such as “co”, “ne”, “go”, “jp”, and “ca”, may optionally be stored in a dictionary data storage unit (not shown), and the identification string generation unit 16 may delete any such string by checking it against that dictionary data. The description below continues with the case in which “com” is deleted and “nmlMar” is generated as the company identification string.
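Steps S46 to S49, together with the optional dictionary-based deletion, might look like this. The dictionary contents are simply the examples listed above and the function name is hypothetical; a real dictionary data storage unit could hold any set of strings.

```python
import re

DOMAIN_DELIMITERS = re.compile(r"[_\-.]")
# Hypothetical dictionary of strings shared by many e-mail domains
# (the examples named in the text).
COMMON_DOMAIN_STRINGS = {"com", "co", "ne", "go", "jp", "ca"}

def company_id_strings_from_email(email):
    """Steps S46-S49 plus optional dictionary-based removal."""
    _, sep, host = email.partition("@")       # S46: part after '@'
    if not sep:
        return []                             # no '@': not an e-mail address
    if DOMAIN_DELIMITERS.search(host):        # S47: delimiter present?
        parts = [p for p in DOMAIN_DELIMITERS.split(host) if p]  # S48: divide
    else:
        parts = [host]                        # S49: keep unchanged
    # Drop strings found in the common-string dictionary.
    return [p for p in parts if p.lower() not in COMMON_DOMAIN_STRINGS]
```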

Next, the process of generating company identification strings from the network address information in step S5 is described in detail with reference to FIG. 7. FIG. 7 is a flowchart showing an example of this process.

First, step S51 determines whether the network address information in the address information obtained by the address information acquisition unit 14 contains a delimiter. Delimiters in network address information include “-”, “.”, “_”, “/”, and “:”. If a delimiter is present (Yes in step S51), the process proceeds to step S52; otherwise (No in step S51), it proceeds to step S53.

In step S52, the network address string is divided before and after each delimiter position, and each resulting string is kept as a new network address string. In step S53, the network address string is kept unchanged as the company identification string, and the flow ends.

In this example, “www.nml-Market.com” has been obtained as the network address information, so the division in step S52 produces four company identification strings: “www”, “nml”, “Market”, and “com”. Strings such as “www” and “com” in a network address (strings that clearly represent something other than a name, company name, or store name) are shared by many network addresses and clearly contain no company name, so the address information acquisition unit 14 may delete such parts when acquiring the company identification strings. The description below continues with the case in which “www” and “com” are deleted and “nml” and “Market” are generated as company identification strings. Deleting “www” and “com” is only an example, and the configuration is not limited to it: any string that clearly contains no company name or the like (a string that clearly represents something other than a name, company name, or store name) may be deleted. Also, as with the e-mail address information, strings commonly used in network addresses may be stored in a dictionary data storage unit (not shown), and the identification string generation unit 16 may delete such a string by checking it against the dictionary data.
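A comparable sketch for steps S51 to S53. The delimiter set follows step S51; the common-string dictionary (including the hypothetical entries `http` and `https`, which the text does not mention) and the function name are assumptions.

```python
import re

# Delimiters for network addresses listed in step S51.
NET_DELIMITERS = re.compile(r"[\-._/:]")
# Hypothetical dictionary of strings shared by many network addresses.
COMMON_NET_STRINGS = {"www", "com", "http", "https"}

def company_id_strings_from_url(address):
    """Steps S51-S53 plus optional dictionary-based removal."""
    if NET_DELIMITERS.search(address):        # S51: delimiter present?
        parts = [p for p in NET_DELIMITERS.split(address) if p]  # S52: divide
    else:
        parts = [address]                     # S53: keep unchanged
    return [p for p in parts if p.lower() not in COMMON_NET_STRINGS]
```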

Thus, in this example, step S4 yields “rsmith” as the personal identification string, as shown in FIG. 8(a), and “nmlMar” as a company identification string, as shown in FIG. 8(b); step S5 yields “nml” and “Market” as company identification strings, also shown in FIG. 8(b).

Next, in step S6, the character string comparison unit 17 compares the personal identification string obtained by the identification string generation unit 16 with each of the comparison strings acquired by the comparison string acquisition unit 15. Specifically, it compares each comparison string with the personal identification string and counts the characters that match. This count is the evaluation value of the comparison.

FIGS. 9(a) to 9(c) show a concrete example of comparing the comparison string “Ray” with the personal identification string “rsmith”. The comparison strings and personal identification strings may be normalized so that uppercase letters are treated as lowercase, or processed without normalization; the example here uses normalization, so “Ray” is processed as “ray”.

First, as shown in FIG. 9(a), “ray” and “rsmith” are compared character by character starting from their first characters (forward matching), and 1 is added to the evaluation value for each matching character. If the first characters of the two strings match, a further 1 is added. For “ray” and “rsmith”, the first characters match, so 2 is added (1 for the match and 1 for the first-character bonus). The remaining characters are then compared, but none of them match, so forward matching of “ray” and “rsmith” yields a final evaluation value of +2. Because “ray” is shorter than “rsmith”, the “ith” of “rsmith” is never compared; in this example the evaluation value is not reduced for the difference in length, although a configuration that applies such a reduction is also possible.
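Forward matching as described can be sketched like this, assuming both strings have already been normalized to lowercase and that characters are compared position by position (a mismatch does not stop the scan, matching the “ray”/“rsmith” walk-through). The function name is illustrative.

```python
def forward_match(a, b):
    """Forward matching: align a and b at their first characters.

    +1 for every position where the characters agree; positions past the
    end of the shorter string are ignored (no length penalty). A further
    +1 is added when the first characters match."""
    score = sum(1 for x, y in zip(a, b) if x == y)
    if a and b and a[0] == b[0]:
        score += 1                    # first-character bonus
    return score
```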

Next, as shown in FIG. 9(b), “ray” and “rsmith” are compared character by character starting from their last characters (backward matching), and 1 is added for each matching character. Here the “y” of “ray” is compared with the “h” of “rsmith”; a match would add 1, and because these are the last characters of both strings, a match would add a further 1. For “ray” and “rsmith” the last characters do not match, so nothing is added. The remaining characters are then compared, but none of them match, so backward matching of “ray” and “rsmith” yields a final evaluation value of 0.
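Backward matching is the mirror image of forward matching. A sketch under the same assumptions (lowercase input, position-by-position comparison, illustrative function name):

```python
def backward_match(a, b):
    """Backward matching: align a and b at their last characters.

    +1 for every position (counted from the end) where the characters
    agree, plus +1 when the last characters match."""
    score = sum(1 for x, y in zip(reversed(a), reversed(b)) if x == y)
    if a and b and a[-1] == b[-1]:
        score += 1                    # last-character bonus
    return score
```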

Finally, as shown in FIG. 9(c), “ray” and “rsmith” are compared while the comparison partner is shifted one character at a time, and 1 is added for each matching character. Of the comparison string and the personal identification string, the one with fewer characters is slid along the one with more characters, one position at a time. Here, comparison therefore begins by aligning the “r” of “ray” with the “s” of “rsmith”, as shown in FIG. 9(c). When that comparison is finished, the partner is shifted by one character and a new comparison begins by aligning the “r” of “ray” with the “m” of “rsmith”, and so on. Repeating this yields an evaluation value for “ray” at each alignment, and the largest of them is taken as the final evaluation value of this sliding comparison. An evaluation value of 1 or less is set to 0; the same rule applies to forward matching and backward matching. The final evaluation value that the character string comparison unit 17 obtains for a comparison string against a personal identification string is the largest of the values from forward matching, backward matching, and sliding comparison. Step S6 is performed in this way for every combination of comparison string and personal identification string, giving an evaluation value for each comparison string.
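Putting the three comparisons and the "1 or less becomes 0" rule together gives the following sketch. It assumes lowercase normalization and, because the text does not say how far the sliding comparison extends, that the shorter string is slid only over positions where it fits entirely inside the longer one, starting at offset 1 (offset 0 is already covered by forward matching). All function names are hypothetical. The assertions check values worked out in this section: +6 for “Smith” against “rsmith” (FIG. 10), +4 for “NML” against “nmlMar”, +7 for “Market”, and +5 for “Smith” against the misrecognized “l3mith”.

```python
def positional_matches(x, y):
    """Count positions where x and y hold the same character (zip stops at the shorter)."""
    return sum(1 for p, q in zip(x, y) if p == q)

def forward_match(a, b):
    # +1 per matching position from the front, +1 bonus if first characters match
    bonus = 1 if a and b and a[0] == b[0] else 0
    return positional_matches(a, b) + bonus

def backward_match(a, b):
    # same as forward matching, but aligned at the last characters
    bonus = 1 if a and b and a[-1] == b[-1] else 0
    return positional_matches(a[::-1], b[::-1]) + bonus

def sliding_match(a, b):
    # slide the shorter string over the longer one, starting at offset 1
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0
    for offset in range(1, len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        best = max(best, positional_matches(short, window))
    return best

def clamp(score):
    # evaluation values of 1 or less are treated as 0
    return 0 if score <= 1 else score

def evaluate(comparison_string, identification_string):
    a, b = comparison_string.lower(), identification_string.lower()  # normalization
    return max(clamp(forward_match(a, b)),
               clamp(backward_match(a, b)),
               clamp(sliding_match(a, b)))
```

Under this reading, `evaluate("Ray", "rsmith")` returns 2, the forward-matching value, since backward and sliding comparison both give 0.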

FIG. 10 shows the evaluation values calculated for the comparison strings “Ray”, “Smith”, “NML”, “Market”, “○○Avenue”, “△△”, and “[][]”. As shown in FIG. 10, each comparison string is associated with an index and an evaluation value; “Smith” has index 1 and the largest evaluation value, +6.

Next, in step S7, if comparison has been completed for all combinations of comparison strings and personal identification strings (Yes in step S7), the process proceeds to step S8. Otherwise (No in step S7), the process returns to step S6 and repeats.

In step S8, based on the results of the character string comparison unit 17, the item classification determination unit 18 determines the character string to be classified into the name item (identifies the character string representing the name). Specifically, the character string with the largest evaluation value obtained by the character string comparison unit 17 is identified as the character string representing the name. In this example, as shown in FIG. 10, the comparison string “Smith” has the largest evaluation value, so “Smith” is identified as a character string representing the name.

Because the family name and given name are usually written on the same line of a business card, comparison strings located on the same line (that is, assigned the same index) may also be identified as character strings representing the name. In this example, “Ray”, which has the same index (1) as “Smith”, is therefore also identified as a character string representing the name. Which string is the family name and which is the given name can be decided from their order: on English business cards the given name normally precedes the family name, so the earlier “Ray” is taken as the given name and the later “Smith” as the family name. This completes the process of identifying the character strings representing the name (name item classification).
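The selection in step S8, including the same-line rule, can be sketched as below. The scoring function repeats, in compact form, the assumptions described for the three comparisons (lowercase normalization, sliding over full-overlap offsets from 1, values of 1 or less set to 0) so that the block stands alone. Treating a comparison string's score as its best score against any identification string, and returning `None` when every score is 0, are further assumptions; `identify_item` is a hypothetical name. The tokens reproduce only the same-line grouping of this example (index 1: “Ray Smith”, index 2: “NML Market”).

```python
def evaluate(a, b):
    """Compact scoring sketch: best of forward, backward and sliding
    matching, with values of 1 or less treated as 0."""
    a, b = a.lower(), b.lower()

    def matches(x, y):
        return sum(1 for p, q in zip(x, y) if p == q)

    fwd = matches(a, b) + (1 if a and b and a[0] == b[0] else 0)
    bwd = matches(a[::-1], b[::-1]) + (1 if a and b and a[-1] == b[-1] else 0)
    short, long_ = sorted((a, b), key=len)
    slide = max((matches(short, long_[k:k + len(short)])
                 for k in range(1, len(long_) - len(short) + 1)), default=0)
    best = max(fwd, bwd, slide)
    return 0 if best <= 1 else best

def identify_item(tokens, id_strings):
    """tokens: (line_index, text) pairs. Returns the best-scoring token
    joined with every token from the same line, in original order."""
    if not tokens:
        return None
    # Assumption: a token's score is its best score against any identification string.
    scored = [(max((evaluate(text, s) for s in id_strings), default=0), index)
              for index, text in tokens]
    best_score, best_index = max(scored, key=lambda pair: pair[0])
    if best_score == 0:
        return None  # nothing resembles the identification strings
    return " ".join(text for index, text in tokens if index == best_index)
```

With these tokens, `identify_item(tokens, ["rsmith"])` yields “Ray Smith”, and the company identification strings “nmlMar”, “nml”, and “Market” yield “NML Market”.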

Next, in step S9, the character string comparison unit 17 compares the company identification strings obtained by the identification string generation unit 16 with each of the comparison strings acquired by the comparison string acquisition unit 15. Specifically, it compares each comparison string with each company identification string and counts the characters that match. This count is the evaluation value of the comparison.

In this example there are several comparison strings and several company identification strings; as a concrete example, consider comparing the comparison string “NML” with the company identification string “nmlMar”. As before, the strings may be normalized so that uppercase letters are treated as lowercase, or processed without normalization; the example uses normalization, so “NML” becomes “nml” and “nmlMar” becomes “nmlmar”.

First, “nml” and “nmlmar” are compared character by character starting from their first characters (forward matching); that is, comparison begins with the “n” of “nml” and the “n” of “nmlmar”. Because “nml” is shorter than “nmlmar”, the “mar” of “nmlmar” is never compared; in this example the evaluation value is not reduced for the difference in length, although a configuration that applies such a reduction is also possible. Evaluation values are obtained in the same way as for the comparison of comparison strings with personal identification strings: the final evaluation value that the character string comparison unit 17 obtains for a comparison string against a company identification string is the largest of the values from forward matching, backward matching, and sliding comparison. In this example, therefore, the evaluation value of the comparison string “NML” is +4.
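As a worked check of the +4 above, the three comparisons for “nml” against “nmlmar” come out as follows. The symbols $E_{\mathrm{fwd}}$, $E_{\mathrm{bwd}}$, and $E_{\mathrm{slide}}$ are introduced here only for readability, and the sliding values assume the shorter string is shifted over offsets 1 to 3, where it fits entirely inside “nmlmar”.

```latex
\begin{aligned}
E_{\mathrm{fwd}}(\text{nml},\text{nmlmar})   &= \underbrace{1+1+1}_{n=n,\; m=m,\; l=l} + \underbrace{1}_{\text{first characters match}} = 4\\
E_{\mathrm{bwd}}(\text{nml},\text{nmlmar})   &= 0 \quad (l \ne r,\; m \ne a,\; n \ne m)\\
E_{\mathrm{slide}}(\text{nml},\text{nmlmar}) &= \max(0,\,1,\,0) = 1 \;\to\; 0 \quad (\text{values} \le 1 \text{ become } 0)\\
E(\text{NML}) &= \max(4,\,0,\,0) = 4
\end{aligned}
```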

Next, in step S10, if comparison has been completed for all combinations of comparison strings and company identification strings (Yes in step S10), the process proceeds to step S11. Otherwise (No in step S10), the process returns to step S9 and repeats.

In step S11, based on the results of the character string comparison unit 17, the item classification determination unit 18 determines the character string to be classified into the company item (identifies the character string representing the company name). Specifically, the character string with the largest evaluation value obtained by the character string comparison unit 17 is identified as the character string representing the company name. In this example, the character string comparison unit 17 compares every comparison string shown in FIG. 11(a) with every company identification string shown in FIG. 11(b), producing the evaluation values shown in FIG. 12. As FIG. 12 shows, each comparison string is associated with an index and an evaluation value, and the comparison string with the largest value is “Market”, at +7, so “Market” is identified as a character string representing the company name.

Because the words that make up a company name are usually written together on one line of a business card, comparison strings located on the same line (assigned the same index) may also be identified as character strings representing the company name. In this example, “NML”, which has the same index (2) as “Market”, is therefore also identified, and following the order of the strings, “NML Market” is identified as the character string representing the company name. This completes the process of identifying the character string representing the company name (company name item classification), and with it the operation flow of the information processing apparatus 1.

The embodiment above describes the best mode for the case in which the character recognition unit 13 makes no recognition errors. The following describes an example in which the character recognition unit 13 misrecognizes the e-mail address information in the image data of the English business card shown in FIG. 3 as, for example, “l3mith@mnlMar.com”, as shown in FIG. 13.

Following the flow shown in FIG. 2, “Ray”, “Smith”, “NML”, “Market”, “○○Avenue”, “△△”, and “[][]” are obtained as comparison strings, as shown in FIG. 14(a), and, as shown in FIGS. 14(b) and 14(c), “l3mith” is generated as the personal identification string and “mnlMar”, “nml”, and “Market” as the company identification strings. Forward matching, backward matching, and sliding comparison are then applied using the generated personal and company identification strings.

FIG. 15 shows the result of comparing the personal identification string "l3mith" with the comparison strings. As shown in FIG. 15, the comparison string "Smith" has the highest evaluation value, +5. "Smith" is therefore identified as a character string representing the name. "Ray", which carries the same index (1) as "Smith", is also identified as a character string representing the name.
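Several evaluation values are quoted in this description: +5 for "Smith" against "l3mith", +7 for "Market" against "Market", +4 for "NML" against "nml", +6 for "Smith" against "rsmith", and 0 for the address tokens. All of them are reproduced by the simple rule that the minimal sketch below assumes:

* Comparison is case-insensitive.
* Each consecutive matching character scores +1, counted from the front (forward matching) or from the back (backward matching).
* A +1 bonus is added when the first compared characters match.

This rule is inferred from those numbers rather than taken from the specification. The sliding comparison is left out because its scoring is not described in this section. The function names are hypothetical.

```python
def forward_score(candidate: str, identifier: str) -> int:
    """Assumed rule: +1 for each leading character that matches
    (case-insensitively, stopping at the first mismatch), plus a +1 bonus
    when the first characters match at all."""
    run = 0
    for a, b in zip(candidate.lower(), identifier.lower()):
        if a != b:
            break
        run += 1
    return run + 1 if run else 0


def backward_score(candidate: str, identifier: str) -> int:
    """The same rule applied from the last character backwards."""
    return forward_score(candidate[::-1], identifier[::-1])


def evaluate(candidate: str, identifier: str) -> int:
    # The sliding (one-character shift) comparison is omitted here because
    # its scoring rule is not spelled out in this section.
    return max(forward_score(candidate, identifier),
               backward_score(candidate, identifier))


comparison = ["Ray", "Smith", "NML", "Market", "○○Avenue", "△△", "[][]"]
best = max(comparison, key=lambda s: evaluate(s, "l3mith"))
print(best, evaluate(best, "l3mith"))  # Smith 5
```

Under this assumed rule, a match on the first compared character alone already yields 2 (for example, `forward_score("Rx", "Ry")`). This is the situation the text cites when it recommends a predetermined value of at least 2.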

Next, forward matching, backward matching, and sliding comparison are performed for every combination of the comparison strings shown in FIG. 14(a) and the company identification strings shown in FIG. 14(c).

FIG. 16 shows the comparison results for all combinations of the comparison strings in FIG. 14(a) and the company identification strings in FIG. 14(c). As shown in FIG. 16, the comparison string "Market" has the highest evaluation value, +7. "Market" is therefore identified as a character string representing the company name. "NML", which carries the same index (2) as "Market", is also identified as part of the company name, and "NML Market" is identified as the character string representing the company name.

As described above, the configuration of the present invention identifies the character strings representing the name and the company name more accurately, even when the character recognition unit 13 makes recognition errors. In other words, it classifies the name and company-name items more accurately.

[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 17 to 26. Configurations other than those described in this embodiment are the same as in Embodiment 1. For convenience of explanation, members having the same functions as those shown in the drawings of Embodiment 1 are given the same reference numerals, and their description is omitted.

In Embodiment 1, the image of the English business card shown in FIG. 3 is photographed with a mobile-phone camera or read with a scanner. Only after the image data acquisition unit 11 has acquired all of the image data are multiple comparison strings obtained. Every comparison string is then compared with the personal and company identification strings to identify the character strings representing the name and the company name.

In this embodiment, by contrast, the work starts while the image data acquisition unit 11 is still acquiring the image data. Personal and company identification strings are generated from the portion of the image data acquired so far. The comparison strings obtained from that portion are compared with them to identify the character strings representing the name and the company name. The main configuration of the information processing apparatus implementing this embodiment is the same as that shown in FIG. 1.

First, the processing flow of this embodiment is described with reference to FIG. 17, which is a flowchart showing an example of that flow.

First, in step S61, an input device such as a scanner starts scanning the image from the bottom edge of the business card, as shown in FIG. 18. The image is read as it is scanned. If the read image contains a character string (Yes in step S61), the process proceeds to step S62. If it contains no character string (No in step S61), the flow ends. In step S62, the character recognition unit 13 starts character recognition, and the process proceeds to step S63. The character recognition unit 13 is assumed to recognize the card one printed line at a time, in order from the bottom edge.

In step S63, the character recognition unit 13 determines whether the character string obtained by character recognition is e-mail address information or network address information. If it is (Yes in step S63), the process proceeds to step S64. If it is not (No in step S63), the process proceeds to step S66.
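A minimal sketch of the step S63 test follows. It uses only the cues mentioned in this example: "www" for a network address, and an "E-mail" prefix or "@" for an e-mail address. Real recognition output would likely need more robust rules, and `classify_line` is a hypothetical name.

```python
def classify_line(line: str) -> str:
    """Return "email", "network", or "other" for one recognized line, using
    only the cues the text mentions: an "E-mail" prefix or "@" for e-mail
    addresses and "www" for network addresses."""
    lowered = line.strip().lower()
    if lowered.startswith("e-mail") or "@" in lowered:
        return "email"
    if "www" in lowered:
        return "network"
    return "other"


for line in ["www.nml-Market.com", "E-mail:rsmith@nmlMar.com",
             "FAX (999) 888-1234", "NML Market"]:
    print(line, "->", classify_line(line))
```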

In step S64, the address information acquisition unit 14 acquires the character string indicating the address information, such as e-mail address information or network address information. The identification string generation unit 16 then generates a personal identification string and/or a company identification string from that address information. The process then proceeds to step S65.

In the business card image shown in FIG. 18, the lowest line of the card is recognized first, so the character recognition unit 13 obtains "www.nml-Market.com", shown in FIG. 19(a). Because this string contains "www", the character recognition unit 13 judges it to be network address information. The identification string generation unit 16 then splits it at the delimiters shown in FIG. 19(a):

* the "." between "www" and "nml",
* the "-" between "nml" and "Market", and
* the "." between "Market" and "com".

This generates "nml" and "Market" as company identification strings, as shown in FIG. 19(b). The split also yields "com", but as in Embodiment 1 it is removed from the company identification strings. The processing in step S64 for network address information is the same as that described with reference to FIG. 7, so it is not described in detail here. In this example, company identification strings are obtained in step S64, and the process proceeds to step S65.
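The splitting in step S64 for a network address can be sketched as below. The sketch assumes that "." and "-" are the only delimiters and that "www" and "com" carry no company information. The text states this for "com"; dropping "www" is consistent with FIG. 19(b). The function name is hypothetical.

```python
import re

# "com" is dropped as in the text; dropping "www" matches FIG. 19(b).
# Other top-level domains or URL schemes would need extra handling.
IGNORED = {"www", "com"}


def company_ids_from_url(address: str) -> list[str]:
    """Split a network address at "." and "-" and keep the remaining parts
    as company identification strings."""
    parts = re.split(r"[.\-]+", address.strip())
    return [p for p in parts if p and p.lower() not in IGNORED]


print(company_ids_from_url("www.nml-Market.com"))  # ['nml', 'Market']
```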

In step S65, the character recognition unit 13 determines whether the next line, counting from the bottom edge of the card, contains a recognizable character string. If it does (Yes in step S65), the process returns to step S62 and the flow repeats. If it does not (No in step S65), the flow ends.

In the business card image shown in FIG. 18, the line after "www.nml-Market.com", counting from the bottom edge, is "E-mail:rsmith@nmlMar.com", shown in FIG. 20(a). The character recognition unit 13 obtains this line next. Because it begins with the prefix "E-mail" and contains "@", the character recognition unit 13 judges it in step S63 to be e-mail address information. In step S64, the identification string generation unit 16 then generates "rsmith" as the personal identification string and "nmlMar" as a company identification string, as shown in FIGS. 20(b) and 20(c). "com" is also obtained but, as in Embodiment 1, it is removed from the company identification strings. "nml" and "Market" were already generated as company identification strings when the network address information was obtained, so "nmlMar" is added to the existing company identification strings. The processing in step S64 for e-mail address information is the same as that described with reference to FIG. 6, so it is not described in detail here.
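An e-mail line can be sketched under similar assumptions:

* the "E-mail:" label is stripped;
* the part before "@" becomes the personal identification string;
* the domain parts other than "com" are appended to the company identification strings already collected.

`ids_from_email` and the label format are illustrative assumptions.

```python
import re


def ids_from_email(line: str, company_ids: list[str]) -> tuple[str, list[str]]:
    """Return (personal identification string, updated company identification
    strings) for one e-mail line. The "E-mail:" label format is assumed."""
    address = re.sub(r"^\s*e-?mail\s*:\s*", "", line, flags=re.IGNORECASE)
    local, _, domain = address.partition("@")
    updated = list(company_ids)            # keep strings found earlier
    for part in re.split(r"[.\-]+", domain):
        if part and part.lower() != "com" and part not in updated:
            updated.append(part)
    return local, updated


personal, companies = ids_from_email("E-mail:rsmith@nmlMar.com", ["nml", "Market"])
print(personal, companies)  # rsmith ['nml', 'Market', 'nmlMar']
```

The misrecognized address discussed for Embodiment 1, "l3mith@mnlMar.com", goes through the same path and yields "l3mith" and "mnlMar".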

In this example, the process then proceeds to step S65, where the character recognition unit 13 determines whether the following line, counting from the bottom edge, contains a recognizable character string. In the business card image of FIG. 18 it does. In step S62, the character recognition unit 13 therefore obtains "FAX (999) 888-1234", the line after "E-mail:rsmith@nmlMar.com". The string contains "FAX", so it is recognizably a fax number. In step S63, the character recognition unit 13 judges that it is neither e-mail address information nor network address information, and the process proceeds to step S66.

Comparison requires a personal or company identification string. In step S66, the character string comparison unit 17 therefore determines whether such a string has already been obtained. If one has (Yes in step S66), the process proceeds to step S67. If none has (No in step S66), the process returns to step S61 and the flow repeats. In this example, both the personal identification string and the company identification strings have already been obtained, so the process proceeds to step S67.

In step S67, the comparison string acquisition unit 15 takes the character strings that do not correspond to address information (comparison string candidates). It splits them further at predetermined delimiters such as " " (space) and ",". In this example, "FAX (999) 888-1234" is split at the delimiters "(", ")", and "-".

However, "FAX (999) 888-1234" consists of the string "FAX" followed by a sequence of digits, so it is evidently neither a name nor a company name. The configuration may therefore skip the comparison processing from step S66 onward for this string. The next recognized line in this example, "TEL (999) 888-5678", likewise consists of "TEL" followed by a sequence of digits. It is evidently neither a name nor a company name either, so the comparison processing from step S66 onward may also be skipped for it.
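Step S67 and the optional skip rule for telephone and fax lines can be sketched as follows. The delimiter set comes from this example. Adding the full-width variants found on Japanese cards is an assumption, and so are the function names.

```python
import re

# Delimiters named in the text (space, comma, parentheses, hyphen); the
# full-width variants used on Japanese cards are included as an assumption.
DELIMITERS = re.compile(r"[\s,()\-，（）‐]+")


def split_candidate(candidate: str) -> list[str]:
    """Split one comparison string candidate into comparison strings."""
    return [tok for tok in DELIMITERS.split(candidate) if tok]


def is_label_plus_digits(tokens: list[str]) -> bool:
    """True for lines such as FAX/TEL numbers, which the text says may skip
    the comparison step: a TEL/FAX label followed only by digit groups."""
    return (len(tokens) > 1 and tokens[0].upper() in {"TEL", "FAX"}
            and all(tok.isdigit() for tok in tokens[1:]))


tokens = split_candidate("FAX (999) 888-1234")
print(tokens, is_label_plus_digits(tokens))  # ['FAX', '999', '888', '1234'] True
```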

The above processing is then repeated in this example. Consider the case where, in step S62, the character recognition unit 13 obtains the string (comparison string candidate) "○○Avenue,△△,[][]" shown in FIG. 21. In step S63, "○○Avenue,△△,[][]" is judged to be neither e-mail address information nor network address information, so the process proceeds to step S66. In step S66, a personal or company identification string has already been obtained, so the process proceeds to step S67. In step S67, the comparison string acquisition unit 15 splits "○○Avenue,△△,[][]" at the delimiter "," that it contains. This yields the comparison strings "○○Avenue", "△△", and "[][]".

In step S68, the character string comparison unit 17 compares two sets of strings: the personal identification string and/or company identification strings obtained by the identification string generation unit 16, and the comparison strings acquired by the comparison string acquisition unit 15. As described in Embodiment 1, this processing computes evaluation values through forward matching, backward matching, and sliding comparison, in which the string being compared is shifted one character at a time.

Next, in step S69, the item classification determination unit 18 uses the comparison results to determine whether a comparison string is similar to the personal identification string or to a company identification string. If it is (Yes in step S69), the process proceeds to step S70. If it is similar to neither (No in step S69), the process proceeds to step S65 and the flow repeats.

In step S70, the item classification determination unit 18 classifies each similar comparison string:

* If the comparison string is similar to the personal identification string, it is identified as a character string representing the name.
* If the comparison string is similar to a company identification string, it is identified as a character string representing the company name.

The process then proceeds to step S65 and the flow repeats.
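The control flow of steps S61 to S70 can be summarized in the skeleton below. The step-specific operations are passed in as functions so that the loop itself stays visible. Every helper in the demo part (`demo_is_address`, `demo_make_ids`, `demo_split`, `demo_score`, `demo_threshold`) is a hypothetical placeholder. `demo_score` in particular is a crude stand-in, not the forward/backward/sliding matching the text describes. The card lines are modeled on the example of FIG. 18, and the placement of "Ray Smith" on the top line is assumed.

```python
import re


def classify_card(lines_top_to_bottom, is_address, make_ids,
                  split_candidate, score, threshold):
    """Skeleton of steps S61-S70: visit lines bottom-up, build identification
    strings from address lines, and compare later lines against them."""
    personal_ids, company_ids = [], []
    result = {"name": [], "company": []}
    for line in reversed(lines_top_to_bottom):       # S61/S62/S65
        if is_address(line):                          # S63: Yes
            new_personal, new_company = make_ids(line)   # S64
            personal_ids += new_personal
            company_ids += new_company
            continue
        if not (personal_ids or company_ids):         # S66: No, keep scanning
            continue
        for token in split_candidate(line):           # S67
            # S68/S69: a token is "similar" if it reaches the threshold.
            if any(score(token, p) >= threshold(p) for p in personal_ids):
                result["name"].append(token)          # S70
            elif any(score(token, c) >= threshold(c) for c in company_ids):
                result["company"].append(token)       # S70
    return result


# --- Placeholder helpers for the demo (all hypothetical) ---
def demo_is_address(line):
    return "@" in line or "www" in line


def demo_make_ids(line):
    if "@" in line:
        local, domain = line.split(":")[-1].split("@")
        return [local], [domain.split(".")[0]]
    hosts = line.split(".")[1:-1]                      # drop "www" and "com"
    return [], [part for host in hosts for part in host.split("-")]


def demo_split(line):
    return [t for t in re.split(r"[ ,()\-]+", line) if t]


def demo_score(token, ident):
    # Crude stand-in: full prefix/suffix containment scores len + 1.
    t, i = token.lower(), ident.lower()
    return len(t) + 1 if t and (i.startswith(t) or i.endswith(t)) else 0


def demo_threshold(ident):
    return max(len(ident) / 2, 2)


CARD = ["Ray Smith", "NML Market", "○○Avenue,△△,[][]",
        "TEL (999) 888-5678", "FAX (999) 888-1234",
        "E-mail:rsmith@nmlMar.com", "www.nml-Market.com"]
print(classify_card(CARD, demo_is_address, demo_make_ids,
                    demo_split, demo_score, demo_threshold))
# {'name': ['Smith'], 'company': ['NML', 'Market']}
```

The skeleton does not apply the optional same-line rule, so only "Smith" is reported as a name. Adding that rule would also pick up "Ray".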

In this example, the comparison results in FIGS. 22(a) and 22(b) show that the comparison strings "○○Avenue", "△△", and "[][]" all have an evaluation value of 0. The item classification determination unit 18 therefore judges that none of them is similar to the personal or company identification strings, and the process proceeds to step S65.

Because there is another line of text after "○○Avenue,△△,[][]", the process proceeds to step S62. There the character recognition unit 13 obtains "NML Market", shown in FIG. 23, from the next line counting from the bottom edge, and the process proceeds to step S63. In step S63, "NML Market" is judged to be neither e-mail address information nor network address information, so the process proceeds to step S66. In step S66, a personal or company identification string has already been obtained, so the process proceeds to step S67. In step S67, the comparison string acquisition unit 15 splits "NML Market" at the delimiter " " (space) that it contains. This yields the comparison strings "NML" and "Market".

In this example, step S68 first compares the comparison string "NML" with the personal identification string "rsmith", and then compares the comparison string "Market" with "rsmith". Both comparisons yield an evaluation value of 0, as shown in FIG. 24(a). The item classification determination unit 18 therefore judges that neither "NML" nor "Market" is similar to the personal identification string.

Next, the comparison string "NML" is compared with the company identification string "nml", which yields an evaluation value of +4, as shown in FIG. 24(b). Since "NML" and "nml" are both three characters long, this value shows that they match completely (letter case aside). The item classification determination unit 18 therefore judges that "NML" is similar to the company identification string "nml". In step S70, the comparison string "NML" is identified as a character string representing the company name.

Because this embodiment processes the business card one line at a time, the configuration may skip any further comparison at this point. It may instead identify "NML Market" directly as the character string representing the company name, by adding "Market", which appears on the same line as "NML". This reduces the amount of processing in the information processing apparatus 1 and shortens the processing time.

The series of processes described above is repeated until the topmost line of the business card has been recognized. As a result, in this example, "NML Market" is identified as the character string representing the company name (classified as the company-name item). "Ray Smith" is identified as the character string representing the name (classified as the name item).

In this embodiment, comparison strings are evaluated line by line as they are recognized, so the overall maximum evaluation value is not known while processing is still under way. The configuration therefore cannot simply identify the comparison string with the highest evaluation value as the name or company name. Instead, it must select the comparison strings whose evaluation values are at or above a predetermined value, and identify those as character strings representing the name or company name. The predetermined value can be set arbitrarily. For example, it can be a value of at least half the number of characters in the personal or company identification string being compared.

This example can illustrate the case where the predetermined value is half the number of characters in the identification string being compared. Here, the comparison string candidate shown in FIG. 25 is obtained, yielding the comparison strings "Ray" and "Smith". The personal identification string "rsmith" has six characters, so the predetermined value is half that, namely 3 (+3). The evaluation values obtained by comparing "Ray" and "Smith" with "rsmith" are shown in FIG. 26. "Smith" scores +6, which is at least +3, so it is selected as a character string representing the name. "Ray Smith", which also includes the comparison string "Ray" from the same line as "Smith", is then identified as the character string representing the name.

Although the predetermined value can be set arbitrarily, it is desirable to set it to 2 (+2) or more to improve the accuracy of identifying the character strings that represent the name or company name. The reason lies in how forward and backward matching score a match on the first compared characters. Such a match adds 2 to the evaluation value: 1 for the first-character match and 1 for the character match itself. Without this setting, a string whose remaining characters do not match at all could be selected as a name or company name.
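The threshold rule discussed above can be sketched as follows. The predetermined value is half the length of the identification string, with the floor of 2 that the text recommends. Odd lengths are handled by plain division, which is an assumption. The scores passed in are the values stated for FIGS. 24(a) and 26, and the function names are hypothetical.

```python
def threshold_for(identifier: str, floor: float = 2) -> float:
    """Predetermined value = half the identification string's length,
    but never below `floor` (the text recommends at least 2)."""
    return max(len(identifier) / 2, floor)


def select_matches(scores: dict[str, int], identifier: str) -> list[str]:
    """Keep every comparison string whose evaluation value reaches the
    predetermined value, instead of only the single best one."""
    limit = threshold_for(identifier)
    return [text for text, value in scores.items() if value >= limit]


print(threshold_for("rsmith"))                            # 3.0
print(select_matches({"Smith": 6}, "rsmith"))             # ['Smith']
print(select_matches({"NML": 0, "Market": 0}, "rsmith"))  # []
```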

On business cards, e-mail address information and network address information are rarely printed above the name and company name. As shown in FIG. 3, even English business cards usually print them below the name and company name. With the above configuration, character recognition therefore proceeds line by line from the bottom of the business card image toward the top. The identification strings are generated before the name and company-name items are reached, which reduces the amount of processing in the information processing apparatus 1 and speeds it up.

The two embodiments differ as follows:

* The information processing apparatus 1 of Embodiment 1 scans the entire business card from the top edge to the bottom edge and recognizes all of the scanned text. It generates the personal and company identification strings only after all recognition results are available.
* The information processing apparatus 1 of Embodiment 2 recognizes the card one line at a time from the bottom edge. It generates the personal and company identification strings as soon as e-mail address information or network address information is found.

Embodiment 2 can therefore generate these strings before the whole card has been scanned and recognized. This reduces the amount of processing in the information processing apparatus 1 and speeds it up.

Neither embodiment specifically prescribes how character strings other than the name and company name are identified (classified into items). However, some comparison strings are judged to be neither the name nor the company name after comparison with the personal and company identification strings. These may be stored in a separate storage area and finally classified into items other than the name and company name. Character strings representing telephone numbers, fax numbers, and the like can easily be recognized as such from labels such as "TEL" and "FAX". The information processing apparatus 1 may therefore use these labels to identify character strings representing telephone numbers, fax numbers, and so on.
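Label-based classification of the leftover strings can be sketched as follows. The sketch assumes lines that begin with a "TEL" or "FAX" label followed by the number. The regular expression, the function name, and the output keys are illustrative choices, not part of the described apparatus.

```python
import re

# A TEL or FAX label (whole word, any case), optional separators, then the number.
LABEL_PATTERN = re.compile(r"^\s*(TEL|FAX)\b[\s:.]*(.+)$", re.IGNORECASE)


def classify_leftovers(lines):
    """Sort strings that matched neither name nor company name into
    telephone, fax, or other items."""
    items = {"tel": [], "fax": [], "other": []}
    for line in lines:
        match = LABEL_PATTERN.match(line)
        if match:
            items[match.group(1).lower()].append(match.group(2).strip())
        else:
            items["other"].append(line)
    return items


print(classify_leftovers(["TEL (999) 888-5678", "FAX (999) 888-1234",
                          "○○Avenue,△△,[][]"]))
```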

Further, in both embodiments, if comparison of the personal identification string and the company identification string with the comparison character strings shows that none of the comparison character strings contains a character string representing a name or a company name, the character strings representing items other than the name and company name may be identified first (that is, items other than the name and company name are classified first). Then, from the comparison character strings remaining after that classification, the candidates most likely to be names (comparison character strings that each satisfy a predetermined condition) may be classified into the name item. For the company name, after the name item has been classified, the comparison character strings still remaining (again, those that each satisfy a predetermined condition) may be classified into the company name item. Items other than the name and company name include, for example, the telephone number, fax number, and address mentioned above.
Candidates likely to be names may be determined by holding surname/given-name dictionary data in the information processing apparatus 1 and comparing against that data. Alternatively, without holding such dictionary data, the information processing apparatus 1 may hold a condition based on a well-known technique, such as “treat a character string printed relatively large as the name”, and treat a string that satisfies the condition as the name.
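The fallback order described above (other items first, then the name, then the company name) might look like the sketch below. The `Candidate` type, the use of OCR character height for the “printed relatively large” condition, and the choice of the largest remaining string as the company name are assumptions standing in for the unspecified “predetermined conditions”.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    text: str
    char_height: float  # hypothetical: printed character height reported by OCR


def fallback_classify(
    candidates: list[Candidate],
    is_other_item: Callable[[str], bool],
    surname_dict: Optional[set[str]] = None,
) -> tuple[Optional[str], Optional[str]]:
    """Fallback used when no candidate matched an identification string.

    1. Drop strings that belong to other items (phone, fax, address, ...).
    2. Name: first string containing a dictionary surname, else the largest print.
    3. Company: the largest-printed string that is left (an assumed condition).
    """
    remaining = [c for c in candidates if not is_other_item(c.text)]
    if not remaining:
        return None, None
    name = None
    if surname_dict:
        name = next(
            (c for c in remaining
             if any(tok.lower() in surname_dict for tok in c.text.split())),
            None,
        )
    if name is None:
        name = max(remaining, key=lambda c: c.char_height)
    rest = [c for c in remaining if c is not name]
    company = max(rest, key=lambda c: c.char_height) if rest else None
    return name.text, company.text if company else None
```

On a card whose logo is printed largest, size alone picks the company as the name; with a surname dictionary such as `{"doe"}`, the string `Jane Doe` is chosen as the name first and the large logo text falls through to the company item.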

In both embodiments, the information processing apparatus 1 includes the character recognition dictionary data storage unit 12, but this is not essential. For example, the character recognition dictionary data storage unit 12 may reside in a storage device, a server device, or the like outside the information processing apparatus 1. In that case, the character recognition unit 13 refers to the character recognition dictionary data stored in the character recognition dictionary data storage unit 12 by accessing the external storage device, server device, or the like.

In both embodiments, character recognition was described for image data of English business cards, but the present invention is not limited to this and is also applicable when character recognition is performed on image data of business cards in other European languages. This is because e-mail addresses and network addresses are specified to be written in alphanumeric characters, so even on a European-language business card the e-mail address and network address do not contain special characters such as ligatures or German umlauts. Processing can therefore proceed in the same way as for image data of an English business card.

However, names and other entries on a European-language business card may contain such special characters. When they do, the information processing apparatus 1 may further include a component that replaces each special character with the corresponding English letter, so that the present invention can be applied to the image data of European-language business cards. Alternatively, the present invention may be applied to data on which that replacement has already been performed.
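One way to perform the special-character replacement is Unicode compatibility decomposition (NFKD) followed by removal of combining marks, plus a small table for letters that do not decompose. This is a sketch under that assumption; `to_basic_latin` and `SPECIAL_MAP` are hypothetical names, and the table is not exhaustive.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# Illustrative, not exhaustive.
SPECIAL_MAP = {
    "ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
}


def to_basic_latin(text: str) -> str:
    """Replace special characters with plain English letters."""
    text = "".join(SPECIAL_MAP.get(ch, ch) for ch in text)
    # NFKD splits a precomposed letter into base letter + combining mark,
    # and expands compatibility ligatures such as U+FB01 into "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_basic_latin("Jürgen Weiß"))  # Jurgen Weiss
```

This maps ü to u. A German speaker's e-mail address may instead spell Müller as mueller, in which case a transliteration table (ü to ue, ö to oe, ä to ae) would match the address better; the text does not say which convention the “corresponding English letter” should follow.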

The present invention is likewise applicable to business cards written in romanized script by performing the same processing as described above. For Japanese-language business cards, the information processing apparatus 1 may further include a component that converts Japanese text into romanized notation, making the present invention applicable to the image data of Japanese business cards; the present invention may also be applied to data that has already been converted into romanized notation.

In both embodiments, a business card is used as the specific medium, and the information processing apparatus 1 identifies the character strings representing the name and company name written on it. However, this is not a limitation. For example, the specific medium may be a magazine rather than a business card. In that case, the information processing apparatus 1 can identify a character string representing a store name published in the magazine in the same way as it identifies a character string representing a company name. That is, when a store name is contained in the network address of the store's home page and/or in the e-mail address printed in the magazine, a store identification string (identification string) analogous to the company identification string can be generated from that network address and/or e-mail address and compared with the comparison character strings, making it possible to identify the character string representing the store name.

In both embodiments, when a character string representing a name or company name is identified, character strings recognized on the same line are identified as a single character string (classified into the same item). Alternatively, character strings whose score is at or above a certain value may be selected. No particular scoring method needs to be prescribed. When selecting character strings whose score is at or above a certain value, surname/given-name dictionary data may be held separately so that frequently occurring names receive high scores, allowing character strings corresponding to common names to be selected.
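The text leaves the scoring method open. As one hypothetical choice, the sketch below scores a candidate by the summed frequency of its words in a surname/given-name dictionary and keeps those at or above a threshold; `name_freq`, its values, and both function names are invented for illustration.

```python
def name_score(text: str, name_freq: dict[str, int]) -> int:
    """Hypothetical score: summed dictionary frequency of the words in text."""
    return sum(name_freq.get(word.lower(), 0) for word in text.split())


def select_by_score(candidates: list[str], name_freq: dict[str, int], threshold: int) -> list[str]:
    """Keep candidates scoring at or above threshold, highest score first."""
    scored = [(name_score(c, name_freq), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    return [c for _, c in sorted(kept, key=lambda sc: sc[0], reverse=True)]


freq = {"john": 40, "smith": 50, "taro": 30, "yamada": 45}
print(select_by_score(["Sales Manager", "John Smith", "Taro Yamada"], freq, 50))
# ['John Smith', 'Taro Yamada']
```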

In the present invention, the character string before the “@” of the e-mail address is used to classify the name item, and the character string of a specific part after the “@” is used to classify the company name item. The company name item is also classified using the character string of a specific part delimited by the delimiters contained in the network address information, such as “.”, “_”, “-”, “/”, and “:”.
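A minimal sketch of this splitting, assuming Python's `re.split` with the delimiter sets named above. How the “specific part” after the “@” or in the network address is chosen is not spelled out, so the stop-list `GENERIC_PARTS` (URL scheme, `www`, and common domain labels) is an assumption, as are the function names.

```python
import re

EMAIL_DELIMITERS = r"[._\-]"      # delimiters named for mail address information
NETWORK_DELIMITERS = r"[._\-/:]"  # delimiters named for network address information

# Hypothetical stop-list standing in for "specific part" selection:
# parts that almost never carry a company name.
GENERIC_PARTS = {"http", "https", "www", "com", "co", "jp", "net", "org",
                 "ne", "or", "ac", "info", "mail", "index", "html"}


def _parts(text: str, delimiters: str) -> list[str]:
    """Split at the delimiters and drop empty pieces (e.g. from '://')."""
    return [p for p in re.split(delimiters, text) if p]


def strings_from_email(email: str) -> tuple[list[str], list[str]]:
    """Split an e-mail address into personal and company identification strings."""
    local, _, domain = email.partition("@")
    personal = _parts(local, EMAIL_DELIMITERS)
    company = [p for p in _parts(domain, EMAIL_DELIMITERS)
               if p.lower() not in GENERIC_PARTS]
    return personal, company


def strings_from_network_address(url: str) -> list[str]:
    """Cut company identification strings out of a network address."""
    return [p for p in _parts(url, NETWORK_DELIMITERS)
            if p.lower() not in GENERIC_PARTS]


print(strings_from_email("taro.yamada@abc-trading.co.jp"))
# (['taro', 'yamada'], ['abc', 'trading'])
print(strings_from_network_address("http://www.abc-trading.co.jp/"))
# ['abc', 'trading']
```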

However, the personal identification string used in an e-mail address is rarely the full name itself. It typically follows one of patterns such as:
A) surname + delimiter + given name + (other)
B) given name + delimiter + surname + (other)
C) part of surname + (delimiter) + given name + (other)
D) part of given name + (delimiter) + surname + (other)
E) surname (or part of it) + (other)
F) given name (or part of it) + (other)
G) (other) + surname (or part of it)
H) (other) + given name (or part of it)
I) unrelated to the name
Because of these patterns, simply comparing the personal identification string with the comparison character strings rarely yields an exact match.
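The claims describe comparing an identification string with each comparison character string character by character from the front, from the back, and while shifting one character at a time, with the shorter string shifted against the longer one. The sketch below counts matching characters in those three ways and keeps the best; case-folding and taking the maximum are assumptions, and the function names are hypothetical. In pattern E, for example, the personal identification string `yamada` lines up with the end of `Taro Yamada` only after shifting.

```python
def forward_matches(a: str, b: str) -> int:
    """Characters that agree position by position, aligned at the front."""
    return sum(1 for x, y in zip(a, b) if x == y)


def backward_matches(a: str, b: str) -> int:
    """Characters that agree position by position, aligned at the back."""
    return forward_matches(a[::-1], b[::-1])


def shifted_matches(a: str, b: str) -> int:
    """Slide the shorter string across the longer one, one character at a time."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        best = max(best, forward_matches(short, long_[offset:offset + len(short)]))
    return best


def match_score(identification: str, candidate: str) -> int:
    """Best of the three comparisons, ignoring letter case (an assumption)."""
    a, b = identification.lower(), candidate.lower()
    return max(forward_matches(a, b), backward_matches(a, b), shifted_matches(a, b))


print(match_score("yamada", "Taro Yamada"))  # 6
```

In this sketch the shifted search already includes the front and back alignments as its first and last offsets; they are kept as separate functions only to mirror the three comparisons the claims enumerate.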

Therefore, in the present invention, the personal identification string before the “@” of the acquired e-mail address is split at delimiters contained in the mail address information, such as “.”, “_”, and “-”, to obtain one or more personal identification strings. These are compared with the comparison character strings already obtained, and the most similar comparison character string is identified as the name. Likewise, the character string of a specific part after the “@” is split at delimiters to form company identification strings, which are compared with the comparison character strings already obtained; the most similar one is identified as the company name. The network address is handled in the same way: the character string of a specific part is extracted as a company identification string and compared with the comparison character strings already obtained, and the most similar one is identified as the company name.
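Selecting the most similar comparison character string can be sketched as follows, assuming similarity is the largest number of matching characters over all relative shifts (a compact variant of the front/back/shift comparison in the claims) and that a candidate's score is its best score against any of the split identification strings. `aligned_matches` and `identify` are hypothetical names. `identify` returns `None` when nothing matches at all, which is the case handled by the fallback classification described earlier.

```python
def aligned_matches(a: str, b: str) -> int:
    """Most equal characters over every relative shift of a against b (case-folded)."""
    a, b = a.lower(), b.lower()
    best = 0
    for shift in range(-len(a) + 1, len(b)):
        count = sum(1 for i, ch in enumerate(a)
                    if 0 <= i + shift < len(b) and b[i + shift] == ch)
        best = max(best, count)
    return best


def identify(identification_strings: list[str], candidates: list[str]):
    """Return the candidate with the most matching characters, or None if nothing matches."""
    best, best_score = None, 0
    for candidate in candidates:
        score = max((aligned_matches(s, candidate) for s in identification_strings), default=0)
        if score > best_score:
            best, best_score = candidate, score
    return best


candidates = ["ABC Trading Co., Ltd.", "Taro Yamada", "Sales Manager"]
print(identify(["taro", "yamada"], candidates))  # Taro Yamada
print(identify(["abc", "trading"], candidates))  # ABC Trading Co., Ltd.
```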

For store-name item classification, either the character string before the “@” of the e-mail address or the character string of a specific part after the “@” may be used. This is because a store sets its own e-mail address, so the e-mail addresses printed in magazines carrying store information, on business cards, and so on place the store name before the “@” in some cases and after it in others.
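Because a store name can sit on either side of the “@”, a store identification sketch can simply take the delimiter-split parts from both sides. The stop-list `GENERIC_PARTS` and the function name below are hypothetical.

```python
import re

# Hypothetical stop-list of parts that rarely name a store.
GENERIC_PARTS = {"info", "mail", "shop", "www", "com", "co", "jp", "ne", "or", "net", "org"}


def store_identification_strings(email: str) -> list[str]:
    """Store names may appear on either side of '@', so both sides are used."""
    local, _, domain = email.partition("@")
    parts = re.split(r"[._\-]", local) + re.split(r"[._\-]", domain)
    return [p for p in parts if p and p.lower() not in GENERIC_PARTS]


print(store_identification_strings("info@cafe-sakura.jp"))       # ['cafe', 'sakura']
print(store_identification_strings("sakura.bakery@mail.co.jp"))  # ['sakura', 'bakery']
```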

In the present invention, the e-mail address and/or network address used to classify the name and/or company name items are obtained from the character recognition of the business card or other medium itself, so no additional dictionary or database is needed. Moreover, as described above, because the present invention does not hold complex dictionaries or rules, the name and company name written on a specific medium such as a business card can be identified with relatively simple processing. Furthermore, the present invention can classify the name and company name items even when there is no company-indicating string such as “(株)”, “株式会社”, “Corp.”, or “Inc.”. In particular, it can accurately identify the name, company name, and store name on English-language business cards, where these items seldom contain strings that indicate a particular category.

Even when the above-described surname/given-name dictionary data is additionally held in the information processing apparatus 1 so that the character string representing a name can be identified more accurately, the dictionary need not contain complicated rules; it is sufficient to store a large number of surnames and given names in advance. Therefore, even in that configuration, no new complex dictionary or database is needed, and a character string representing at least one of a name, a company name, and a store name can be identified more easily and accurately from the character strings extracted from the image information of a specific medium.

Finally, each block of the information processing apparatus 1 may be implemented in hardware logic or in software running on a CPU, as follows.

That is, the information processing apparatus 1 includes a CPU (central processing unit) that executes the instructions of a control program implementing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium), such as a memory, that stores the program and various data. The object of the present invention can also be achieved by supplying the information processing apparatus 1 with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program of the information processing apparatus 1, which is software implementing the functions described above, is recorded in computer-readable form, and having the computer (or CPU or MPU) read and execute the program code recorded on the recording medium.

Usable recording media include, for example, tapes such as magnetic tape and cassette tape; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks and optical disks such as CD-ROM, MO, MD, DVD, and CD-R; cards such as IC cards (including memory cards) and optical cards; and semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.

The information processing apparatus 1 may also be configured to be connectable to a communication network, and the program code may be supplied over that network. The communication network is not particularly limited; for example, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV network, a virtual private network, a telephone network, a mobile communication network, or a satellite communication network can be used. The transmission medium forming the network is not particularly limited either: wired media such as IEEE 1394, USB, power-line carrier, cable TV lines, telephone lines, and ADSL lines, and wireless media such as infrared (IrDA, remote controls), Bluetooth (registered trademark), 802.11 wireless, HDR, mobile phone networks, satellite links, and terrestrial digital networks can all be used. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

As described above, the information processing apparatus of the present invention, its control program, the computer-readable recording medium on which the control program is recorded, and the control method make it easier to identify, from character strings extracted from the image information of a specific medium, a character string representing at least one of a name, a company name, and a store name. The present invention is therefore well suited to industrial fields involving devices that identify names and other items from character strings extracted from images of media such as business cards and magazines and classify them by item, and also to camera-equipped mobile phones that read the character strings in a captured image and record them in a phone book.

A functional block diagram showing the schematic configuration of the information processing apparatus according to the present invention.
A flowchart showing an example of the operation flow of the information processing apparatus.
A diagram showing an example of an English business card.
(a) shows an example of the character strings obtained by the character recognition unit of the information processing apparatus; (b) shows an example of the address information acquired by its address information acquisition unit.
(a) shows an example of the character strings other than those corresponding to the address information; (b) shows an example of the comparison character strings obtained by the comparison character string acquisition unit of the information processing apparatus.
A flowchart detailing an example of the process of generating personal identification strings and company identification strings from e-mail address information.
A flowchart detailing an example of the process of generating company identification strings from network address information.
(a) shows an example of personal identification strings; (b) shows an example of company identification strings.
(a) to (c) are schematic diagrams showing specific examples of comparing comparison character strings with a personal identification string.
A diagram showing an example of the result of comparing the comparison character strings with the personal identification strings.
(a) shows an example of comparison character strings; (b) shows an example of company identification strings.
A diagram showing an example of the result of comparing the comparison character strings with the company identification strings.
A diagram showing an example of the character strings obtained by the character recognition unit of the information processing apparatus.
(a) shows an example of comparison character strings; (b) shows an example of personal identification strings; (c) shows an example of company identification strings.
A diagram showing an example of the result of comparing the comparison character strings with the personal identification strings.
A diagram showing an example of the result of comparing the comparison character strings with the company identification strings.
A flowchart showing an example of the processing flow in the present invention.
A diagram showing an example of the order in which character recognition is performed.
(a) shows an example of network address information; (b) shows an example of company identification strings.
(a) shows an example of e-mail address information; (b) shows an example of company identification strings; (c) shows an example of personal identification strings.
A diagram showing an example of comparison character string candidates.
(a) shows the result of comparing the comparison character strings with the personal identification strings; (b) shows the result of comparing the comparison character strings with the company identification strings.
A diagram showing an example of comparison character string candidates.
(a) shows the result of comparing the comparison character strings with the personal identification strings; (b) shows the result of comparing the comparison character strings with the company identification strings.
A diagram showing an example of comparison character string candidates.
A diagram showing the result of comparing the comparison character strings with the personal identification strings.

Explanation of symbols

1 Information processing apparatus
11 Image data acquisition unit
12 Character recognition dictionary data storage unit
13 Character recognition unit
14 Address information acquisition unit (address information acquisition means)
15 Comparison character string acquisition unit (comparison character string acquisition means)
16 Identification string generation unit (identification string generation means)
17 Character string comparison and collation unit (character string comparison and collation means)
18 Item classification determination unit (identification means)

Claims

An information processing apparatus that performs character recognition, using a character recognition dictionary, on image information of a medium on which character strings are written, and thereby acquires the character strings written on the medium, the apparatus comprising:
address information acquisition means for acquiring, from the character strings obtained by the character recognition, address information that is at least one of mail address information, which is a character string representing an e-mail address, and network address information, which is a character string representing a network address;
comparison character string acquisition means for acquiring, from the character strings obtained by the character recognition, a plurality of comparison character strings, which are character strings other than the mail address information and the network address information;
identification string generation means for generating, on the basis of the address information acquired by the address information acquisition means, an identification string that is at least one of a personal identification string used to identify a name and a company identification string used to identify a company name;
character string comparison and collation means for comparing and collating the identification string generated by the identification string generation means with each of the plurality of comparison character strings acquired by the comparison character string acquisition means; and
identification means for identifying, on the basis of the result of the comparison and collation by the character string comparison and collation means, a comparison character string determined to be similar to the personal identification string as a character string representing a name, and a comparison character string determined to be similar to the company identification string as a character string representing a company name.

The information processing apparatus according to claim 1, wherein the address information acquisition means acquires at least one of the mail address information and the network address information by using at least one of a specific character and a specific character string contained in the mail address information or the network address information.

The information processing apparatus according to claim 2, wherein the address information acquisition means further deletes, from the acquired mail address information or network address information, any character string that clearly represents something other than a component of a mail address or a network address, and uses the result as the mail address information or network address information.

The information processing apparatus according to any one of claims 1 to 3, wherein the comparison character string acquisition means acquires the plurality of comparison character strings by dividing, before and after a predetermined delimiter, those character strings obtained by the character recognition that are other than the mail address information and the network address information.

The information processing apparatus according to claim 4, wherein the comparison character string acquisition means further deletes, from the plurality of comparison character strings, any character string that clearly represents something other than a name or a company name, and uses the remainder as the comparison character strings.

The information processing apparatus according to any one of claims 1 to 5, wherein, when no network address information is acquired by the address information acquisition means, the identification string generation means generates the company identification string from the company-identifying character string contained in the mail address information.

The information processing apparatus according to any one of claims 1 to 6, wherein the identification string generation means generates the personal identification string and the company identification string by dividing the mail address information acquired by the address information acquisition means before and after a specific character located between the user name and the host name of the e-mail address contained in the mail address information.

The information processing apparatus according to claim 7, wherein the identification string generation means generates, as the personal identification string, the part of the mail address information preceding the specific character.

The information processing apparatus according to claim 7 or 8, wherein the identification string generation means generates, as the company identification string, the part of the mail address information following the specific character.

The information processing apparatus according to claim 8, wherein the identification string generation means further divides the personal identification string or the company identification string before and after a delimiter contained in the mail address information acquired by the address information acquisition means, and uses each resulting part as a new personal identification string or company identification string.

The information processing apparatus according to any one of claims 1 to 10, wherein the identification string generation means generates the company identification string by dividing the network address information acquired by the address information acquisition means before and after a delimiter contained in the network address information.

The information processing apparatus according to any one of the preceding claims, wherein the character string comparison and collation means compares and collates the identification string generated by the identification string generation means with each of the plurality of comparison character strings acquired by the comparison character string acquisition means character by character from the front, character by character from the back, and while shifting the strings relative to each other one character at a time.

The information processing apparatus according to claim 12, wherein, when comparing and collating while shifting the strings relative to each other one character at a time, the character string comparison and collation means shifts whichever of the identification string and the comparison character string has fewer characters one character at a time against the other.

The information processing apparatus according to any one of claims 1 to 13, wherein the identification means determines whether each of the plurality of comparison and collation character strings is similar to the identification character string based on the number of characters that match between the identification character string and that comparison and collation character string in the result of the comparison and collation by the character string comparison and collation means.
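A sketch of a similarity test driven by the number of matching characters. It assumes the score is the maximum over the forward, backward and sliding modes, and that a hypothetical threshold `min_matches` decides similarity; the claim itself only says the decision is based on the number of matching characters.

```python
def match_count(ident: str, candidate: str) -> int:
    """Largest number of matching characters over forward, backward and sliding alignment."""
    def count(x: str, y: str) -> int:
        return sum(1 for p, q in zip(x, y) if p == q)

    short, long_ = sorted((ident, candidate), key=len)
    forward = count(ident, candidate)
    backward = count(ident[::-1], candidate[::-1])
    sliding = max(count(short, long_[i:]) for i in range(len(long_) - len(short) + 1))
    return max(forward, backward, sliding)


def is_similar(ident: str, candidate: str, min_matches: int = 3) -> bool:
    """Hypothetical rule: similar when at least min_matches characters match."""
    return match_count(ident, candidate) >= min_matches


print(is_similar("sharp", "sharp corporation"))  # True
print(is_similar("sharp", "tel 06-1234"))        # False
```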

The information processing apparatus according to claim 14, wherein the identification means determines that the comparison and collation character string having the largest number of characters matching the personal identification character string is similar to the personal identification character string, and that the comparison and collation character string having the largest number of characters matching the company identification character string is similar to the company identification character string.
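This claim selects by maximum rather than by a fixed threshold: the candidate with the most matching characters is judged similar. A sketch, assuming a sliding-window match count as the score and returning `None` when the best score is zero (that cut-off is an assumption). `best_candidate` and `identify` are hypothetical names.

```python
from typing import Callable, Optional


def sliding_score(a: str, b: str) -> int:
    """Best per-offset count of equal characters when sliding the shorter string."""
    short, long_ = sorted((a, b), key=len)
    return max(sum(p == q for p, q in zip(short, long_[i:]))
               for i in range(len(long_) - len(short) + 1))


def best_candidate(ident: str, candidates: list[str],
                   score: Callable[[str, str], int]) -> Optional[str]:
    """Return the candidate with the most matching characters, or None if that count is 0."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(ident, c))  # first maximum wins ties
    return best if score(ident, best) > 0 else None


def identify(personal_id: str, company_id: str, candidates: list[str]) -> dict:
    return {
        "name": best_candidate(personal_id, candidates, sliding_score),
        "company": best_candidate(company_id, candidates, sliding_score),
    }


print(identify("yamada", "sharp", ["yamada taro", "sharp corporation", "osaka"]))
# {'name': 'yamada taro', 'company': 'sharp corporation'}
```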

The information processing apparatus according to any one of claims 1 to 15, wherein, when the identification means determines, as a result of the comparison and collation by the character string comparison and collation means, that none of the plurality of comparison and collation character strings is similar to the identification character string, the identification means identifies, as the name and as the company name, the comparison and collation character strings that satisfy respective predetermined conditions.
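The claim leaves the "predetermined conditions" unspecified. Purely for illustration, this sketch treats a string containing a company-form keyword (for example 株式会社 or Inc.) as a company name and a short, digit-free string as a personal name; both rules and the keyword list are hypothetical.

```python
import re
from typing import Optional

# Hypothetical "predetermined conditions"; the claim does not define them.
COMPANY_KEYWORDS = ("株式会社", "有限会社", "Co., Ltd.", "Inc.", "Corporation")


def is_company_like(s: str) -> bool:
    return any(k in s for k in COMPANY_KEYWORDS)


def is_name_like(s: str) -> bool:
    # Hypothetical: 2 to 6 characters once spaces are removed, no digits, not a company.
    compact = s.replace(" ", "").replace("\u3000", "")
    return 2 <= len(compact) <= 6 and not re.search(r"\d", s) and not is_company_like(s)


def fallback_identify(candidates: list[str]) -> dict[str, Optional[str]]:
    """Pick the first candidate meeting each condition; None if no candidate does."""
    return {
        "name": next((c for c in candidates if is_name_like(c)), None),
        "company": next((c for c in candidates if is_company_like(c)), None),
    }


print(fallback_identify(["シャープ株式会社", "山田 太郎", "06-1234-5678"]))
# {'name': '山田 太郎', 'company': 'シャープ株式会社'}
```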

A control method for an information processing apparatus that performs character recognition using a character recognition dictionary on image information of a medium on which character strings are written, and acquires the character strings written on the medium, the control method comprising:
an address information acquisition step in which address information acquisition means acquires, from the character strings obtained by the character recognition, address information that is at least one of mail address information, which is a character string representing an e-mail address, and network address information, which is a character string representing a network address;
a comparison and collation character string acquisition step in which comparison and collation character string acquisition means acquires, from the character strings obtained by the character recognition, a plurality of comparison and collation character strings, which are character strings other than the mail address information and the network address information;
an identification character string generation step in which identification character string generation means generates, based on the address information acquired by the address information acquisition means, at least one identification character string, namely a personal identification character string used to identify a name and/or a company identification character string used to identify a company name;
a character string comparison and collation step in which character string comparison and collation means compares and collates the identification character string generated by the identification character string generation means with each of the plurality of comparison and collation character strings acquired by the comparison and collation character string acquisition means; and
an identification step in which identification means, based on the result of the comparison and collation in the character string comparison and collation step, identifies the comparison and collation character string determined to be similar to the personal identification character string as a character string representing a name, and identifies the comparison and collation character string determined to be similar to the company identification character string as a character string representing a company name.
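The steps of this control method can be chained into a single function. This end-to-end sketch starts from already-recognized text lines (the character recognition itself is out of scope) and makes several assumptions not stated in the claim: the regular expressions for mail and network addresses, taking the local part and the first domain label as identification strings, lowercasing before comparison, and scoring by a sliding-window match count. `control_method` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical patterns for the address information acquisition step.
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def score(a: str, b: str) -> int:
    """Sliding-window count of matching characters (shorter string shifted along the longer)."""
    short, long_ = sorted((a, b), key=len)
    return max(sum(p == q for p, q in zip(short, long_[i:]))
               for i in range(len(long_) - len(short) + 1))


def control_method(lines: list[str]) -> dict[str, Optional[str]]:
    # Address information acquisition step.
    mails = [m.group() for line in lines for m in MAIL_RE.finditer(line)]
    urls = [m.group() for line in lines for m in URL_RE.finditer(line)]
    # Comparison and collation string acquisition step: lines without address information.
    candidates = [line for line in lines
                  if not MAIL_RE.search(line) and not URL_RE.search(line)]
    # Identification string generation step (assumed: local part / first domain label).
    personal_ids, company_ids = [], []
    for mail in mails:
        local, domain = mail.lower().split("@", 1)
        personal_ids.append(local)
        company_ids.append(domain.split(".")[0])
    for url in urls:
        host = re.sub(r"^(?:https?://)?(?:www\.)?", "", url.lower())
        company_ids.append(re.split(r"[./]", host)[0])

    # Comparison and collation step plus identification step.
    def best(ids: list[str]) -> Optional[str]:
        scored = [(score(i, c.lower()), c) for i in ids for c in candidates]
        top_score, top = max(scored, key=lambda t: t[0], default=(0, None))
        return top if top_score > 0 else None

    return {"name": best(personal_ids), "company": best(company_ids)}


card = ["Sharp Corporation", "Taro Yamada", "Sales Dept.",
        "taro.yamada@sharp.co.jp", "www.sharp.co.jp", "TEL 06-1234-5678"]
print(control_method(card))
# {'name': 'Taro Yamada', 'company': 'Sharp Corporation'}
```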

A control program for causing a computer to operate as each of the means included in the information processing apparatus according to any one of claims 1 to 16.

A computer-readable recording medium on which the control program according to claim 18 is recorded.