JP5544693B2

JP5544693B2 - Data processing apparatus, data processing program, and data processing method

Info

Publication number: JP5544693B2
Application number: JP2008214539A
Authority: JP
Inventors: 晴人本間; 信一郎西澤; 博司諸星; 真彦永田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-08-22
Filing date: 2008-08-22
Publication date: 2014-07-09
Anticipated expiration: 2028-08-22
Also published as: JP2010049574A

Description

The present invention relates to a data processing device, a data processing program, and a data processing method.

Conventionally, computer systems store and manage predetermined data such as addresses, personal names, and organization names in a database or the like. Because this kind of data is usually entered by people such as system administrators and users, the notation of the entered character strings is not always consistent. Typical causes of inconsistency are partial omissions and the use of variant characters. For example, when entering the same address, person A may enter “東京都千代田区麹町１丁目” (Kojimachi 1-chome, Chiyoda-ku, Tokyo, with “麹” written in kanji), whereas person B may enter “東京都千代田区こうじ町１丁目” (the same address with “こうじ” written in hiragana).

In such a case, the computer system ends up managing data that represents the same content as if it were different data. As a result, when the computer system collates data, it recognizes data that is actually identical as different data.

For this reason, several techniques for unifying data have been proposed in recent years. In one proposed technique, an address registered in a predetermined address master and an address to be corrected are each decomposed into key character strings, the decomposed character strings are compared, and unified data is displayed under predetermined conditions. In another proposed technique, the address to be corrected is divided into address hierarchy levels, addresses with a matching building name are extracted from a predetermined address master, and the address to be corrected is corrected to the extracted address.

Related patent documents: JP 2001-109751 A and JP 2008-158738 A.

However, the conventional techniques described above have the problem that the load of extracting the unified data is high, so the processing cannot be performed at high speed. Specifically, in the conventional techniques, the address to be corrected is divided into hierarchy levels such as prefecture and city/ward/county, and each divided part is repeatedly collated against the addresses registered in a predetermined address master. Because an address is generally divided into five or more hierarchy levels, the conventional techniques need to perform the collation five or more times. Consequently, the processing load on the device that performs the data correction increases, and unifying the data takes time.

The disclosed technology has been made to solve the above problems of the conventional techniques. Its object is to provide a data processing device, a data processing program, and a data processing method that can extract the data used to correct predetermined data with low load and at high speed.

To solve the above problems and achieve the object, the data processing device disclosed in the present application includes: character string storage means for storing predetermined character strings and character string identifiers that identify those character strings; character string element storage means for storing character string elements, each of which is a character string contained in a predetermined character string, together with the set of character string identifiers of the predetermined character strings that contain that element; extraction means for extracting, from the character string element storage means, the sets of character string identifiers stored in association with the character string elements contained in a character string to be processed; and first specifying means for specifying the character string identifier that occurs most often in the sets extracted by the extraction means as the identifier of the corresponding character string, that is, the character string that matches or is similar to the character string to be processed.

Applying the components, the expressions, or any combination of the components of the data processing device disclosed in the present application to a method, an apparatus, a system, a computer program, a recording medium, a data structure, or the like is also effective as another aspect.

The data processing device disclosed in the present application has the effect that the data used to correct predetermined data can be extracted with low load and at high speed.

Embodiments of the data processing device, the data processing program, and the data processing method disclosed in the present application are described in detail below with reference to the drawings. The following embodiments describe an example in which character string data representing an address whose notation is not unified is corrected (cleansed) into character string data whose notation is unified. However, the data processing device, the data processing program, and the data processing method disclosed in the present application can also correct data other than addresses, such as personal names, organization names, corporation names, and branch names, into data with a unified notation. In particular, they can correct data that has a hierarchical structure into data with a unified notation.

First, an overview of the data correction processing performed by the data processing device 100 according to the first embodiment is given. The data processing device 100 stores, in a predetermined storage unit, character string data representing addresses whose notation is unified (hereinafter, “regular address data”). When the data processing device 100 receives character string data representing an address for which it is unknown whether the notation is unified (hereinafter, “processing target address data”), it acquires the regular address data corresponding to the processing target address data from the storage unit. The data processing device 100 then rewrites the processing target address data with the acquired regular address data.

Here, “character string data representing addresses whose notation is unified” means address data written according to notation rules determined in advance by a system administrator or the like. “Regular address data corresponding to the processing target address data” means the regular address data that matches the processing target address data or, failing that, the regular address data that is most similar to it. Hereinafter, “regular address data corresponding to the processing target address data” is referred to as “correction candidate address data”.

The overview of the data correction processing performed by the data processing device 100 according to the first embodiment is described concretely with reference to FIG. 1. FIG. 1 is a diagram for explaining the overview of the data correction processing performed by the data processing device 100 according to the first embodiment. The data processing device 100 includes an address storage unit 121 and an address element storage unit 122. The address storage unit 121 stores regular address data, whose notation is unified, in association with an address ID that identifies that regular address data.

The address element storage unit 122 stores the elements that make up address data, such as the prefecture, the city/ward/county, and the ordinance-designated ward/town/village (hereinafter, “address elements”), in association with the address IDs of the regular address data that correspond to each address element. For example, in FIG. 1, the address element “神奈川県” (Kanagawa Prefecture) is contained in the regular address data stored in the address storage unit 121 under address IDs “1” to “499”. In this case, the address element storage unit 122 stores the address element “神奈川県” in association with the address IDs “1” to “499”.

The structures of the address storage unit 121 and the address element storage unit 122 shown in FIG. 1 are only examples. Their detailed structures are described later.

When the data processing device 100 receives processing target address data, it feeds the processing target address data sequentially into an automaton formed from the address elements stored in the address element storage unit 122. In this way, the data processing device 100 performs character string matching between the address elements stored in the address element storage unit 122 and the address elements contained in the processing target address data. As a result of this matching, the data processing device 100 acquires, from the address element storage unit 122, the address IDs stored in association with the address elements contained in the processing target address data.

In the example shown in FIG. 1, the data processing device 100 has received “神奈川県川崎市中原下小田中１丁目” as the processing target address data. In this case, the data processing device 100 sequentially feeds the characters “神”, “奈”, “川”, “県”, “川”, “崎”, “市”, “中”, “原”, “下”, “小”, “田”, “中”, “１”, “丁”, “目” into the automaton formed from the address elements stored in the address element storage unit 122 and performs the matching.

When the characters “神”, “奈”, “川”, “県” have been fed into the automaton in sequence, the data processing device 100 detects a match with the address element “神奈川県” (Kanagawa Prefecture) stored in the address element storage unit 122. The data processing device 100 then acquires the address IDs “1-499” stored in the address element storage unit 122 in association with the address element “神奈川県”. In the same way, the data processing device 100 acquires from the address element storage unit 122 the address IDs “1-100” corresponding to the address element “川崎市” (Kawasaki City), the address IDs “5-9” corresponding to the address element “下小田中” (Shimokodanaka), and the address IDs “1:5:101:...:500:...” corresponding to the address element “１丁目” (1-chome).

Next, the data processing device 100 specifies the address ID of the correction candidate address data by computing the logical product (AND, that is, the intersection) of the address IDs acquired from the address element storage unit 122. In the example shown in FIG. 1, the logical product yields address ID “5” as the address ID of the correction candidate address data. The data processing device 100 then acquires from the address storage unit 121 the regular address data “神奈川県川崎市中原区下小田中１丁目” (Shimokodanaka 1-chome, Nakahara-ku, Kawasaki City, Kanagawa Prefecture) corresponding to the specified address ID “5”.
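The logical product in the FIG. 1 example can be checked with a minimal Python sketch. The function name `intersect_id_sets` and the use of Python sets are illustrative assumptions, not part of the patent. The ID list for “１丁目” is elided in the source (“1:5:101:...:500:...”), so only the IDs that are spelled out are used here.

```python
# Address-ID sets for the address elements matched in the FIG. 1 example.
# The "１丁目" list is elided in the source; only the explicitly listed IDs appear.
matched_id_sets = {
    "神奈川県": set(range(1, 500)),   # IDs 1-499
    "川崎市": set(range(1, 101)),     # IDs 1-100
    "下小田中": set(range(5, 10)),    # IDs 5-9
    "１丁目": {1, 5, 101, 500},       # explicitly listed IDs only
}


def intersect_id_sets(id_sets):
    """Return the IDs common to every set (the logical product)."""
    sets = list(id_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


candidates = intersect_id_sets(matched_id_sets.values())
print(candidates)  # {5}
```

With these inputs the only ID present in all four sets is 5, which agrees with the result stated for FIG. 1.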

Next, the data processing device 100 rewrites the processing target address data “神奈川県川崎市中原下小田中１丁目” with the acquired regular address data “神奈川県川崎市中原区下小田中１丁目”. In other words, the processing target address data was missing the character “区” (ku, ward) after “中原” (Nakahara), and the data correction processing by the data processing device 100 corrects it to “神奈川県川崎市中原区下小田中１丁目”.

In this way, the data processing device 100 according to the first embodiment uses an automaton formed from the address elements stored in the address element storage unit 122 to perform character string matching between the address elements contained in the processing target address data and the address elements stored in the address element storage unit 122. As a result, a single character string matching pass is enough for the data processing device 100 to acquire, from the address element storage unit 122, the address IDs stored in association with the address elements contained in the processing target address data.

In addition, the data processing device 100 narrows down the address IDs by computing the logical product of the address IDs acquired from the address element storage unit 122, so a single operation is enough to specify the address ID of the correction candidate address data.

That is, the data processing device 100 can specify the correction candidate address data with only one character string matching pass and one logical operation. As a result, the data processing device 100 can correct the processing target address data to the correction candidate address data with low load and at high speed.

In the example shown in FIG. 1, the data processing device 100 uniquely specifies the correction candidate address data by computing the logical product of the address IDs. Depending on the processing target address data, however, the logical product may not yield a unique address ID and may instead yield several. In that case, the data processing device 100 searches for the correction candidate address data among the regular address data indicated by the specified address IDs. This search processing is described later.

Next, the configuration of the data processing device 100 according to the first embodiment is described. FIG. 2 is a diagram showing the configuration of the data processing device 100 according to the first embodiment. As shown in FIG. 2, the data processing device 100 includes a target data input/output unit 110, a storage unit 120, and a control unit 130.

The target data input/output unit 110 outputs processing target address data to a predetermined processing unit and receives correction candidate address data from a predetermined processing unit. For example, the target data input/output unit 110 acquires address data one record at a time from a predetermined storage unit that holds address data whose notation is not unified, and outputs the records sequentially to the reception unit 131. As another example, the target data input/output unit 110 receives processing target address data whose notation is not unified from another information processing device via a network, and outputs the received data to the reception unit 131. The target data input/output unit 110 then receives, from the rewriting unit 134, the correction candidate address data for correcting the processing target address data that it output to the reception unit 131.

The storage unit 120 is a storage device that stores various kinds of information. It includes the address storage unit 121, the address element storage unit 122, a configuration information storage unit 123, and a log storage unit 124. As described above, the address storage unit 121 mainly stores regular address data in association with the address ID that identifies it.

FIG. 3 shows an example of the address storage unit 121. As shown in FIG. 3, the address storage unit 121 has the items address ID, address, postal code, and reading (furigana). The item “address ID” stores an identifier that identifies the address. The item “address” stores regular address data, whose notation is unified. The item “postal code” stores the postal code of the regular address data stored in the corresponding “address” item. The item “reading” stores the phonetic reading (furigana) of the regular address data stored in the corresponding “address” item.

That is, the first row of the address storage unit 121 shown in FIG. 3 indicates that the regular address data with address ID “1” is “神奈川県川崎市中原区上小田中１丁目” (Kamikodanaka 1-chome, Nakahara-ku, Kawasaki City, Kanagawa Prefecture), that its postal code is “211-0053”, and that its reading is “カナガワケンカワサキシナカハラク・・・”. The second row indicates that the regular address data with address ID “2” is “神奈川県川崎市中原区上小田中２丁目” (Kamikodanaka 2-chome), that its postal code is “211-0053”, and that its reading is “カナガワケンカワサキシナカハラク・・・”.

As described above, the address element storage unit 122 mainly stores address elements, which are the elements obtained by dividing address data hierarchically (by prefecture, by city/ward/county, and so on), in association with the address IDs of the regular address data that correspond to each address element.

FIG. 4 shows an example of the address element storage unit 122. As shown in FIG. 4, the address element storage unit 122 has the items address element, address ID, and type. The item “address element” stores an element that makes up address data, such as a prefecture, a city/ward/county, an ordinance-designated ward/town/village, a town name, or a chome. The item “address ID” corresponds to the item “address ID” of the address storage unit 121 shown in FIG. 3 and stores the address IDs of the regular address data that correspond to the address element. The item “type” stores information indicating which category (prefecture, city/ward/county, ordinance-designated ward/town/village, town name, chome, and so on) the address element in the corresponding “address element” item belongs to. In the address ID column of FIG. 4, “N-M” denotes the address IDs “N” through “M”, and “N:M” denotes the address IDs “N” and “M”.
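The “N-M” and “N:M” notation used in FIG. 4 can be expanded into explicit ID sets as in the following sketch. The function name `parse_address_ids` is hypothetical, ASCII digits and an ASCII hyphen are assumed instead of the full-width characters of the figure, and allowing a range inside a colon-separated list (for example “1-3:7”) is an assumption that the source does not confirm.

```python
def parse_address_ids(spec: str) -> set:
    """Expand an ID specification such as "1-499" or "1:5:101" into a set of ints.

    Assumed grammar (not stated in the source): items separated by ":",
    where each item is either a single ID or an inclusive range "N-M".
    """
    ids = set()
    for part in spec.split(":"):
        part = part.strip()
        if not part:
            continue  # tolerate stray separators
        if "-" in part:
            low, high = part.split("-", 1)
            ids.update(range(int(low), int(high) + 1))  # inclusive range
        else:
            ids.add(int(part))
    return ids


print(len(parse_address_ids("1-499")))       # 499
print(sorted(parse_address_ids("1:5:101")))  # [1, 5, 101]
```

Elided entries such as “...” in the figure are not handled; a real table would have to list every ID explicitly.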

That is, the first row of the address element storage unit 122 shown in FIG. 4 indicates that the regular address data corresponding to the address element “神奈川県” (Kanagawa Prefecture) have address IDs “1” to “499”, and that the type of the address element “神奈川県” is “prefecture”. The twelfth row indicates that the regular address data corresponding to the address element “１丁目” (1-chome) have address IDs “1”, “5”, “101”, ..., “500”, ..., and that the type of the address element “１丁目” is “chome”.

The configuration information storage unit 123 mainly stores each address ID in association with the types of the address elements that make up the regular address data indicated by that address ID. FIG. 5 shows an example of the configuration information storage unit 123. As shown in FIG. 5, the configuration information storage unit 123 has the items address ID and configuration information. The item “address ID” corresponds to the item “address ID” of the address storage unit 121 shown in FIG. 3. The item “configuration information” stores information indicating which element types (prefecture, city/ward/county, ordinance-designated ward/town/village, town name, chome, and so on) make up the regular address data indicated by the corresponding address ID. In the configuration information storage unit 123 illustrated in FIG. 5, these element types are stored in the item “configuration information” separated by “|”.

That is, the first row of the configuration information storage unit 123 shown in FIG. 5 indicates that the address data with address ID “1” is made up of a prefecture, a city/ward/county, an ordinance-designated ward/town/village, a town name, and a chome. The sixth row indicates that the address data with address ID “500” is made up of a prefecture, a city/ward/county, an ordinance-designated ward/town/village, and a chome.

The log storage unit 124 stores log information and is, for example, a text file or a database. The rewriting unit 134, described later, stores various kinds of log information about the data correction processing in the log storage unit 124.

A method of generating the address element storage unit 122 described above is now explained with reference to FIG. 6. FIG. 6 is a diagram for explaining the method of generating the address element storage unit 122. As shown in FIG. 6, the data processing device 100 first assigns an address ID to each address, as in the information stored in the address storage unit 121 (step S11).

Next, the data processing device 100 decomposes each piece of address data into address elements. Then, as shown in FIG. 6, for each address element, the data processing device 100 extracts the range of address IDs of the address data that contain that address element (step S12). By associating each address element with the extracted address IDs, the data processing device 100 generates the address element storage unit 122 as shown in FIG. 4. Although the data processing device 100 is described here as generating the address element storage unit 122, the address element storage unit 122 may instead be created externally and supplied to the data processing device 100.
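Steps S11 and S12 amount to building an inverted index from address elements to address IDs. The sketch below assumes that each address has already been decomposed into its elements, because this section does not describe how the decomposition is done. The function name `build_element_index` and the in-memory dictionaries are illustrative assumptions, and the two sample rows follow the FIG. 3 example.

```python
from collections import defaultdict


def build_element_index(decomposed_addresses):
    """Assign sequential address IDs (step S11) and map each address element
    to the set of IDs of the addresses that contain it (step S12).

    `decomposed_addresses` is a list of element lists, one per regular address.
    Returns (address_store, element_index).
    """
    address_store = {}
    element_index = defaultdict(set)
    for address_id, elements in enumerate(decomposed_addresses, start=1):
        address_store[address_id] = "".join(elements)
        for element in elements:
            element_index[element].add(address_id)
    return address_store, dict(element_index)


# Sample rows taken from the FIG. 3 example (address IDs 1 and 2).
sample = [
    ["神奈川県", "川崎市", "中原区", "上小田中", "１丁目"],
    ["神奈川県", "川崎市", "中原区", "上小田中", "２丁目"],
]
store, index = build_element_index(sample)
print(store[1])           # 神奈川県川崎市中原区上小田中１丁目
print(index["上小田中"])  # {1, 2}
print(index["１丁目"])    # {1}
```

The patent stores contiguous IDs as ranges such as “1-499”; this sketch keeps plain sets for simplicity, which is a design shortcut rather than the stored format shown in FIG. 4.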

The control unit 130 controls the data processing device 100 as a whole and includes the reception unit 131, a primary search unit 132, a secondary search unit 133, and the rewriting unit 134. The reception unit 131 receives the processing target address data output from the target data input/output unit 110.

The primary search unit 132 searches for the correction candidate address data used to correct the processing target address data received by the reception unit 131. Specifically, the primary search unit 132 includes an extraction unit 132a and a primary specifying unit 132b.

The extraction unit 132a extracts, from the address IDs stored in the address element storage unit 122, the address IDs stored in association with the address elements contained in the processing target address data. Specifically, the extraction unit 132a performs character string matching by feeding the processing target address data sequentially into the automaton formed from the address elements stored in the address element storage unit 122. The extraction unit 132a then extracts from the address element storage unit 122 the address IDs of the stored address elements that match address elements contained in the processing target address data.

For example, suppose that the processing target address data received by the reception unit 131 is “神奈川県川崎市中原区下小田中１丁目” and that the address element storage unit 122 holds the information shown in FIG. 4. In this case, the extraction unit 132a extracts from the address element storage unit 122 the address IDs “1-499” stored in association with “神奈川県” (Kanagawa Prefecture), which is contained in the processing target address data. Similarly, the extraction unit 132a extracts the address IDs “1-100” stored in association with “川崎市” (Kawasaki City) and the address IDs “1-20” stored in association with “中原区” (Nakahara-ku). The extraction unit 132a also extracts the address IDs “5-9” stored in association with “下小田中” (Shimokodanaka) and the address IDs “1:5:101:...:500:...” stored in association with “１丁目” (1-chome).

このように、抽出部１３２ａは、オートマトンを用いて文字列照合処理を行うので、１回の文字列照合処理によって、住所要素記憶部１２２から住所ＩＤを抽出することができる。なお、抽出部１３２ａは、例えば、文字列照合エンジン「ＳＩＧＭＡ」を用いて、上記文字列照合処理を行う。 Thus, since the extraction unit 132a performs the character string matching process using the automaton, the address ID can be extracted from the address element storage unit 122 by one character string matching process. The extraction unit 132a performs the character string matching process using, for example, a character string matching engine “SIGMA”.
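The extraction step can be illustrated with a minimal Python sketch. It is not the SIGMA engine and does not build an automaton: a plain substring test per stored element stands in for the single-pass match. The dictionary `ADDRESS_ELEMENTS` and the function `extract_id_sets` are hypothetical names, and the ID set for “１丁目” contains only the IDs visible in the text, because the full list is elided there.

```python
# Hypothetical in-memory stand-in for the address element storage unit 122.
# Ranges such as "1-499" are inclusive in the description.
ADDRESS_ELEMENTS = {
    "神奈川県": set(range(1, 500)),
    "川崎市": set(range(1, 101)),
    "中原区": set(range(1, 21)),
    "下小田中": set(range(5, 10)),
    "１丁目": {1, 5, 101, 500},  # only the IDs shown in the text; the rest is elided
}


def extract_id_sets(target, elements=ADDRESS_ELEMENTS):
    """Return {element: address IDs} for each stored element found in target.

    A substring test per element replaces the automaton of the description.
    """
    return {elem: ids for elem, ids in elements.items() if elem in target}


hits = extract_id_sets("神奈川県川崎市中原区下小田中１丁目")
print(sorted(hits["下小田中"]))  # [5, 6, 7, 8, 9]
```

A real implementation would compile all stored elements into one automaton so that the input is scanned once, which is the property the description relies on.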

一次特定部１３２ｂは、処理対象住所データを修正するための修正候補住所データを示す住所ＩＤを特定する。具体的には、一次特定部１３２ｂは、抽出部１３２ａによって抽出された住所ＩＤの集合の論理積を演算することにより住所ＩＤを特定する。言い換えれば、一次特定部１３２ｂは、抽出部１３２ａによって抽出された住所ＩＤの集合に最も多く含まれている住所ＩＤを特定する。そして、一次特定部１３２ｂは、特定した住所ＩＤが１個である場合、かかる住所ＩＤを書換部１３４へ出力する。一方、一次特定部１３２ｂは、特定した住所ＩＤが複数である場合、かかる複数の住所ＩＤを二次探索部１３３へ出力する。 The primary specifying unit 132b specifies an address ID indicating the correction candidate address data used to correct the processing target address data. Specifically, the primary specifying unit 132b specifies an address ID by calculating the logical product of the sets of address IDs extracted by the extraction unit 132a. In other words, the primary specifying unit 132b specifies the address ID that is contained in the largest number of the address ID sets extracted by the extraction unit 132a. When exactly one address ID is specified, the primary specifying unit 132b outputs that address ID to the rewriting unit 134. When a plurality of address IDs are specified, the primary specifying unit 132b outputs those address IDs to the secondary search unit 133.

ここで、図７を用いて、一次特定部１３２ｂによる住所ＩＤ特定処理について具体的に説明する。図７は、一次特定部１３２ｂによる住所ＩＤ特定処理を説明するための図である。なお、図７に示した例では、上記例と同様に、受付部１３１によって受け付けられた処理対象住所データが「神奈川県川崎市中原区下小田中１丁目」であるものとする。図７の上段には、抽出部１３２ａによって抽出された住所ＩＤを示す。なお、ここでは、説明の便宜上、住所ＩＤに加えて、住所要素とタイプとを図示する。 Here, the address ID specifying process by the primary specifying unit 132b will be specifically described with reference to FIG. FIG. 7 is a diagram for explaining the address ID specifying process by the primary specifying unit 132b. In the example shown in FIG. 7, it is assumed that the processing target address data received by the receiving unit 131 is “1-chome Shimo-Odanaka, Nakahara-ku, Kawasaki-shi, Kanagawa” as in the above example. The upper part of FIG. 7 shows the address ID extracted by the extraction unit 132a. Here, for convenience of explanation, an address element and a type are shown in addition to the address ID.

図７に示した例において、一次特定部１３２ｂは、抽出部１３２ａによって抽出された住所ＩＤ「１−４９９」、「１−１００」、「１−２０」、「５−９」および「１：５：１０１：・・・：５００：・・・」の論理積を演算する。図７の下段には、各住所要素に対応する住所ＩＤの範囲を、斜線を付した矩形の幅によって示す。図７の下段に示すように、一次特定部１３２ｂは、論理積演算の結果、抽出部１３２ａによって抽出された住所ＩＤの集合に最も多く含まれる住所ＩＤとして、住所ＩＤ「５」を特定する。 In the example shown in FIG. 7, the primary specifying unit 132b calculates the logical product of the address IDs “1-499”, “1-100”, “1-20”, “5-9”, and “1: 5: 101: ...: 500: ...” extracted by the extraction unit 132a. The lower part of FIG. 7 shows the range of address IDs corresponding to each address element as the width of a hatched rectangle. As shown in the lower part of FIG. 7, as a result of the logical product operation, the primary specifying unit 132b specifies the address ID “5” as the address ID contained in the largest number of the address ID sets extracted by the extraction unit 132a.
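The FIG. 7 example can be traced with a short Python sketch. It reads the “logical product” as the description's own paraphrase, that is, the ID contained in the most sets, and implements that with a counter. The function name `primary_specify` is hypothetical, and the set for “1-chome” holds only the IDs visible in the text.

```python
from collections import Counter


def primary_specify(id_sets):
    """Return the address IDs contained in the largest number of sets."""
    counts = Counter(i for ids in id_sets for i in ids)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(i for i, c in counts.items() if c == top)


# ID sets extracted in the FIG. 7 example (ranges are inclusive).
fig7_sets = [
    set(range(1, 500)),   # Kanagawa Prefecture: 1-499
    set(range(1, 101)),   # Kawasaki City: 1-100
    set(range(1, 21)),    # Nakahara Ward: 1-20
    set(range(5, 10)),    # Shimo-Odanaka: 5-9
    {1, 5, 101, 500},     # 1-chome: only the IDs shown in the text
]
print(primary_specify(fig7_sets))  # [5]
```

When the function returns one ID, that ID would go to the rewriting unit; when it returns several, they would go to the secondary search.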

二次探索部１３３は、一次特定部１３２ｂから入力された複数の住所ＩＤが示す複数の住所データから、修正候補住所データを探索する。具体的には、二次探索部１３３は、Ｎグラム法（「Ｎ文字インデックス法」とも呼ばれる）により、修正候補住所データを探索し、分割部１３３ａと、二次特定部１３３ｂとを有する。 The secondary search unit 133 searches for correction candidate address data from a plurality of address data indicated by a plurality of address IDs input from the primary specifying unit 132b. Specifically, the secondary search unit 133 searches for correction candidate address data by the N-gram method (also referred to as “N-character index method”), and includes a dividing unit 133a and a secondary specifying unit 133b.

分割部１３３ａは、処理対象住所データを所定の文字数ごとに分割する。なお、実施例１において、二次探索部１３３は、Ｎグラム法の一種であるバイグラムを用いるものとする。また、分割部１３３ａは、分割後の各文字列が前後の文字列と１文字ずつ重複するように、処理対象住所データを分割するものとする。すなわち、分割部１３３ａは、処理対象住所データを、１文字ずつ重複させながら２文字単位に分割する。 The dividing unit 133a divides the processing target address data every predetermined number of characters. In the first embodiment, the secondary search unit 133 uses a bigram which is a kind of N-gram method. Further, the dividing unit 133a divides the processing target address data so that each divided character string overlaps the preceding and following character strings one character at a time. That is, the dividing unit 133a divides the processing target address data into two character units while overlapping each character.

図８を用いて、分割部１３３ａによる処理対象住所データの分割処理について具体的に説明する。図８は、分割部１３３ａによる処理対象住所データの分割処理を説明するための図である。図８に示した例では、処理対象住所データが「神奈川県川崎市中原区下小田１丁目」であるものとする。かかる場合、分割部１３３ａは、図８に示すように、処理対象住所データを、「神奈」、「奈川」、「川県」、・・・、「田１」、「１丁」、「丁目」に分割する。 The dividing process of the processing target address data by the dividing unit 133a will be specifically described with reference to FIG. 8. FIG. 8 is a diagram for explaining the dividing process of the processing target address data by the dividing unit 133a. In the example shown in FIG. 8, the processing target address data is assumed to be “神奈川県川崎市中原区下小田１丁目” (Shimo-Oda 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa). In this case, as shown in FIG. 8, the dividing unit 133a divides the processing target address data into the two-character strings “神奈”, “奈川”, “川県”, ..., “田１”, “１丁”, and “丁目”.
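The overlapping two-character split of FIG. 8 corresponds to a sliding window of width 2 and step 1. A minimal Python sketch follows; the function name `split_bigrams` is hypothetical.

```python
def split_bigrams(text, n=2):
    """Split text into overlapping n-character strings (step of one character)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = split_bigrams("神奈川県川崎市中原区下小田１丁目")
print(len(grams))   # 15
print(grams[:3])    # ['神奈', '奈川', '川県']
print(grams[-3:])   # ['田１', '１丁', '丁目']
```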

二次特定部１３３ｂは、一次特定部１３２ｂから入力された複数の住所ＩＤが示す複数の正規住所データから、修正候補住所データを特定する。具体的には、二次特定部１３３ｂは、住所記憶部１２１から、一次特定部１３２ｂから入力された複数の住所ＩＤに対応付けて記憶されている正規住所データを住所ＩＤごとに取得する。続いて、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列により形成されるオートマトンに対して、住所記憶部１２１から取得した正規住所データを順次代入する。これにより、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列と、住所記憶部１２１から取得した正規住所データに含まれる文字列とを順次照合する。 The secondary specifying unit 133b specifies correction candidate address data from a plurality of regular address data indicated by a plurality of address IDs input from the primary specifying unit 132b. Specifically, the secondary specifying unit 133b acquires, for each address ID, the regular address data stored in association with the plurality of address IDs input from the primary specifying unit 132b from the address storage unit 121. Subsequently, the secondary specifying unit 133b sequentially substitutes the regular address data acquired from the address storage unit 121 for the automaton formed by the character strings divided by the dividing unit 133a. Thereby, the secondary identification unit 133b sequentially collates the character string divided by the dividing unit 133a with the character string included in the regular address data acquired from the address storage unit 121.

続いて、二次特定部１３３ｂは、住所記憶部１２１から取得した正規住所データのうち、分割部１３３ａによって分割された文字列と一致する文字列の数が最も多い正規住所データを特定する。そして、二次特定部１３３ｂは、かかる照合処理（バイグラム）によって特定した正規住所データに対応する住所ＩＤを書換部１３４へ出力する。 Subsequently, the secondary specifying unit 133b specifies the normal address data having the largest number of character strings that match the character strings divided by the dividing unit 133a among the normal address data acquired from the address storage unit 121. Then, the secondary specifying unit 133b outputs the address ID corresponding to the regular address data specified by the matching process (bigram) to the rewriting unit 134.

ここで、図９を用いて、二次特定部１３３ｂによる住所データ特定処理について具体的に説明する。図９は、二次特定部１３３ｂによる住所データ特定処理を説明するための図である。図９では、図８に示した例と同様に、分割部１３３ａによって処理対象住所データ「神奈川県川崎市中原区下小田１丁目」が分割されたものとする。また、二次特定部１３３ｂは、一次特定部１３２ｂから住所ＩＤ「１」および「５」を入力されたものとする。すなわち、二次特定部１３３ｂは、住所記憶部１２１から正規住所データ「神奈川県川崎市中原区上小田中１丁目」と、「神奈川県川崎市中原区下小田中１丁目」とを取得する。 Here, the address data specifying process by the secondary specifying unit 133b will be specifically described with reference to FIG. FIG. 9 is a diagram for explaining address data specifying processing by the secondary specifying unit 133b. In FIG. 9, as in the example shown in FIG. 8, it is assumed that the processing target address data “1-chome Shimooda, Nakahara-ku, Kawasaki-shi, Kanagawa” is divided by the dividing unit 133a. Also, it is assumed that the secondary identification unit 133b has received the address IDs “1” and “5” from the primary identification unit 132b. That is, the secondary specifying unit 133b acquires the regular address data “1 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa” and “1-chome Shimo-Odanaka, Nakahara-ku, Kawasaki-shi, Kanagawa” from the address storage unit 121.

かかる場合、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列により形成されるオートマトンに対して、正規住所データ「神奈川県川崎市中原区上小田中１丁目」を順次代入する。これにより、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列と、正規住所データ「神奈川県川崎市中原区上小田中１丁目」に含まれる文字列とを順次照合する。図９に示した例では、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列のうち、１２個の文字列（「神奈」、「奈川」、「川県」、「県川」、「川崎」、「崎市」、「市中」、「中原」、「原区」、「小田」、「１丁」、「丁目」）が、正規住所データ「神奈川県川崎市中原区上小田中１丁目」に含まれる文字列と一致することを検出する。 In this case, the secondary specifying unit 133b sequentially substitutes the regular address data “神奈川県川崎市中原区上小田中１丁目” (Kami-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa) into the automaton formed from the character strings divided by the dividing unit 133a. The secondary specifying unit 133b thereby sequentially collates the character strings divided by the dividing unit 133a with the character strings contained in that regular address data. In the example shown in FIG. 9, the secondary specifying unit 133b detects that 12 of the character strings divided by the dividing unit 133a (“神奈”, “奈川”, “川県”, “県川”, “川崎”, “崎市”, “市中”, “中原”, “原区”, “小田”, “１丁”, “丁目”) match character strings contained in the regular address data “神奈川県川崎市中原区上小田中１丁目”.

続いて、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列により形成されるオートマトンに対して、正規住所データ「神奈川県川崎市中原区下小田中１丁目」を順次代入する。これにより、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列と、住所データ「神奈川県川崎市中原区下小田中１丁目」に含まれる文字列とを順次照合する。図９に示した例では、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列のうち、１４個の文字列（「神奈」、「奈川」、「川県」、「県川」、「川崎」、「崎市」、「市中」、「中原」、「原区」、「区下」、「下小」、「小田」、「１丁」、「丁目」）が、正規住所データ「神奈川県川崎市中原区下小田中１丁目」に含まれる文字列と一致することを検出する。 Subsequently, the secondary specifying unit 133b sequentially substitutes the regular address data “神奈川県川崎市中原区下小田中１丁目” (Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa) into the automaton formed from the character strings divided by the dividing unit 133a. The secondary specifying unit 133b thereby sequentially collates the character strings divided by the dividing unit 133a with the character strings contained in that address data. In the example shown in FIG. 9, the secondary specifying unit 133b detects that 14 of the character strings divided by the dividing unit 133a (“神奈”, “奈川”, “川県”, “県川”, “川崎”, “崎市”, “市中”, “中原”, “原区”, “区下”, “下小”, “小田”, “１丁”, “丁目”) match character strings contained in the regular address data “神奈川県川崎市中原区下小田中１丁目”.

続いて、二次特定部１３３ｂは、正規住所データ「神奈川県川崎市中原区上小田中１丁目」よりも、正規住所データ「神奈川県川崎市中原区下小田中１丁目」の方が、分割部１３３ａによって分割された文字列と一致する文字列の数が多いと判定する。そして、二次特定部１３３ｂは、正規住所データ「神奈川県川崎市中原区下小田中１丁目」を、修正候補住所データに特定して、対応する住所ＩＤ「５」を書換部１３４へ出力する。 Subsequently, the secondary specifying unit 133b determines that the regular address data “Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa” has more character strings matching those divided by the dividing unit 133a than the regular address data “Kami-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa”. The secondary specifying unit 133b then specifies the regular address data “Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa” as the correction candidate address data and outputs the corresponding address ID “5” to the rewriting unit 134.

このように、二次特定部１３３ｂは、オートマトンを用いて文字列照合処理を行うので、１回の文字列照合処理によって、修正候補住所データを特定することができる。なお、二次特定部１３３ｂは、抽出部１３２ａと同様に、例えば、文字列照合エンジン「ＳＩＧＭＡ」を用いて、上記文字列照合処理を行う。 Because the secondary specifying unit 133b performs the character string matching process using an automaton in this way, it can specify the correction candidate address data with a single character string matching process. Like the extraction unit 132a, the secondary specifying unit 133b performs the character string matching process using, for example, the character string matching engine “SIGMA”.
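The counts in the FIG. 9 example (12 and 14) can be reproduced with the following Python sketch. It uses plain substring tests instead of an automaton or the SIGMA engine, and the names `split_bigrams`, `count_matches`, and `secondary_specify` are hypothetical.

```python
def split_bigrams(text):
    """Overlapping two-character strings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def count_matches(target, regular_address):
    """Number of bigrams of target that occur in regular_address."""
    return sum(1 for g in split_bigrams(target) if g in regular_address)


def secondary_specify(target, candidates):
    """Return (IDs with the highest match count, all match counts)."""
    scores = {i: count_matches(target, addr) for i, addr in candidates.items()}
    best = max(scores.values())
    return sorted(i for i, s in scores.items() if s == best), scores


candidates = {
    1: "神奈川県川崎市中原区上小田中１丁目",
    5: "神奈川県川崎市中原区下小田中１丁目",
}
ids, scores = secondary_specify("神奈川県川崎市中原区下小田１丁目", candidates)
print(scores)  # {1: 12, 5: 14}
print(ids)     # [5]
```

If two candidates tie for the highest count, the function returns both IDs, which corresponds to the case where the rewriting unit receives a plurality of address IDs.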

書換部１３４は、処理対象住所データを、一次特定部１３２ｂまたは二次特定部１３３ｂによって特定された正規住所データに書き換えるとともに、データ修正処理に関する各種ログ情報をログ記憶部１２４に記憶させる。 The rewriting unit 134 rewrites the processing target address data with the regular address data specified by the primary specifying unit 132b or the secondary specifying unit 133b, and stores various log information related to the data correction processing in the log storage unit 124.

具体的には、書換部１３４は、一次特定部１３２ｂから住所ＩＤを受け付けた場合に、住所記憶部１２１から、かかる住所ＩＤに対応付けて記憶されている正規住所データを取得する。続いて、書換部１３４は、所定の記憶部に記憶されている処理対象住所データを、住所記憶部１２１から取得した正規住所データ（修正候補住所データ）に書き換える。このとき、書換部１３４は、住所データ書換処理を実行した日時（年月日時分秒）や、処理対象住所データ、修正候補住所データ、処理対象住所データと修正候補住所データとの差分などをログ記憶部１２４に記憶させる。 Specifically, when the rewriting unit 134 receives an address ID from the primary specifying unit 132b, the rewriting unit 134 acquires the regular address data stored in association with the address ID from the address storage unit 121. Subsequently, the rewrite unit 134 rewrites the processing target address data stored in the predetermined storage unit to the regular address data (correction candidate address data) acquired from the address storage unit 121. At this time, the rewriting unit 134 logs the date and time (year / month / day / hour / minute / second) when the address data rewriting process is executed, the processing target address data, the correction candidate address data, the difference between the processing target address data and the correction candidate address data, and the like. The data is stored in the storage unit 124.

また、書換部１３４は、二次特定部１３３ｂから住所ＩＤを受け付けた場合に、受け付けた住所ＩＤの数を判定する。二次特定部１３３ｂから１個の住所ＩＤの数を受け付けた場合、書換部１３４は、所定の記憶部に記憶されている処理対象住所データを、二次特定部１３３ｂから受け付けた住所データ（修正候補住所データ）に書き換えるとともに、住所データ書換処理に関する各種ログ情報をログ記憶部１２４に記憶させる。 When the rewriting unit 134 receives address IDs from the secondary specifying unit 133b, it determines how many address IDs it has received. When it receives exactly one address ID from the secondary specifying unit 133b, the rewriting unit 134 rewrites the processing target address data stored in the predetermined storage unit to the address data (correction candidate address data) received from the secondary specifying unit 133b, and stores various log information related to the address data rewriting process in the log storage unit 124.

一方、二次特定部１３３ｂから複数の住所ＩＤを受け付けた場合、書換部１３４は、データ修正処理を行うことなく、住所データ書換処理を実行した日時や、処理対象住所データ、二次特定部１３３ｂによって特定された複数の住所ＩＤなどをログ記憶部１２４に記憶させる。これにより、システム管理者等は、かかるログ情報に基づいて、処理対象住所データを、どの修正候補住所データに修正するかを即座に判断することができる。 On the other hand, when it receives a plurality of address IDs from the secondary specifying unit 133b, the rewriting unit 134 does not perform the data correction process. Instead, it stores in the log storage unit 124 the date and time at which the address data rewriting process was executed, the processing target address data, the plurality of address IDs specified by the secondary specifying unit 133b, and the like. Based on this log information, a system administrator or the like can immediately decide which correction candidate address data the processing target address data should be corrected to.
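The two branches of the rewriting unit (one received ID versus several) can be sketched as follows. This is an illustration only: the log is a plain Python list standing in for the log storage unit 124, and the function name `rewrite_or_log`, the entry keys, and the `master` dictionary are all hypothetical.

```python
from datetime import datetime


def rewrite_or_log(target, candidate_ids, master, log):
    """Return the address to store and append one log entry.

    master maps address IDs to regular address data (stand-in for the
    address storage unit 121).
    """
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "target": target,
    }
    if len(candidate_ids) == 1:
        # Exactly one candidate: rewrite and record what was done.
        corrected = master[candidate_ids[0]]
        entry["action"] = "rewritten"
        entry["corrected"] = corrected
        log.append(entry)
        return corrected
    # Several candidates: leave the data unchanged and record the
    # candidates so that an administrator can decide.
    entry["action"] = "not_rewritten"
    entry["candidate_ids"] = list(candidate_ids)
    log.append(entry)
    return target
```

Keeping the candidate IDs in the log entry is what lets an administrator choose among them later without rerunning the search.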

なお、書換部１３４は、修正候補住所データを対象データ入出力部１１０へ出力することにより住所データを書き換えてもよい。例えば、処理対象住所データが他の情報処理装置が有する所定の記憶部に記憶されている場合、書換部１３４は、住所データを対象データ入出力部１１０に出力することにより、処理対象住所データを修正候補住所データに書き換えることが可能になる。また、対象データ入出力部１１０はネットワークを介して出力してもよい。 Note that the rewriting unit 134 may rewrite the address data by outputting the correction candidate address data to the target data input/output unit 110. For example, when the processing target address data is stored in a predetermined storage unit of another information processing apparatus, the rewriting unit 134 can rewrite the processing target address data to the correction candidate address data by outputting the address data to the target data input/output unit 110. The target data input/output unit 110 may output the data via a network.

また、書換部１３４は、処理対象住所データと修正候補住所データとの差分が存在する構成情報をログ記憶部１２４に記憶させてもよい。かかる場合、書換部１３４は、構成情報記憶部１２３から、修正候補住所データの構成情報を取得する。そして、書換部１３４は、構成情報記憶部１２３から取得した構成情報と、処理対象住所データを形成する構成情報とを比較することにより、処理対象住所データと修正候補住所データとの差分が存在する構成情報を特定する。なお、書換部１３４は、修正候補住所データの構成情報と、処理対象住所データを形成する構成情報との論理積を演算することにより、双方の構成情報の差異を算出してもよい。 The rewriting unit 134 may also store, in the log storage unit 124, the configuration information in which the processing target address data and the correction candidate address data differ. In this case, the rewriting unit 134 acquires the configuration information of the correction candidate address data from the configuration information storage unit 123. The rewriting unit 134 then compares the configuration information acquired from the configuration information storage unit 123 with the configuration information forming the processing target address data, and thereby identifies the configuration information in which the two address data differ. Note that the rewriting unit 134 may calculate the difference between the two sets of configuration information by calculating the logical product of the configuration information of the correction candidate address data and the configuration information forming the processing target address data.

また、書換部１３４は、処理対象住所データと、修正候補住所データとを比較して、双方の住所データが一致しない場合に住所データ書換処理を行ってもよい。これにより、書換部１３４は、双方の住所データが一致する場合に住所データ書換処理を行わないので、より低負荷かつ高速に住所データ書換処理を行うことができる。 Further, the rewriting unit 134 may compare the processing target address data with the correction candidate address data, and may perform the address data rewriting process when both the address data do not match. As a result, the rewriting unit 134 does not perform the address data rewriting process when both the address data match, so the address data rewriting process can be performed at a lower load and at a higher speed.

また、書換部１３４は、住所データ書換処理を行わずに、処理対象住所データと、修正候補住所データとをログ記憶部１２４に記憶させる処理を行ってもよい。かかる場合、システム管理者等は、ログ記憶部１２４に記憶されているログ情報に基づいて、住所データ書換処理を行うか否かを判断することができる。そして、システム管理者等は、住所データ書換処理を行うと判断した場合に、住所データ書換処理を行うことができる。 Further, the rewriting unit 134 may perform a process of storing the processing target address data and the correction candidate address data in the log storage unit 124 without performing the address data rewriting process. In such a case, the system administrator or the like can determine whether or not to perform the address data rewriting process based on the log information stored in the log storage unit 124. When the system administrator or the like determines that the address data rewriting process is to be performed, the system administrator can perform the address data rewriting process.

また、書換部１３４は、データ修正処理に関する各種ログ情報を所定の表示部に表示制御してもよい。例えば、利用者によって所定の入力フォーム（ブラウザなど）に住所を入力された場合に、対象データ入出力部１１０が入力された住所データを受け付けることにより、制御部１３０は、上述したデータ修正処理を行う。そして、書換部１３４は、利用者によって入力された住所データと修正候補住所データとが異なる場合に、修正候補住所データを表示部に表示制御する。これにより、利用者は、住所を入力する場合に、表記が統一されている住所をリアルタイムに確認することができる。また、データ処理装置１００は、所定の記憶部に記憶されている住所データに対して修正処理を行うことなく、表記が統一されている住所データを記憶部に記憶させることができる。 The rewriting unit 134 may also display various log information related to the data correction process on a predetermined display unit. For example, when a user enters an address into a predetermined input form (in a browser or the like), the target data input/output unit 110 receives the entered address data, and the control unit 130 performs the data correction process described above. When the address data entered by the user differs from the correction candidate address data, the rewriting unit 134 displays the correction candidate address data on the display unit. The user can thereby check, in real time while entering an address, the address in its unified notation. In addition, the data processing apparatus 100 can store address data with unified notation in the storage unit without having to correct address data already stored in the predetermined storage unit.

次に、実施例１に係るデータ処理装置１００によるデータ修正処理の手順について説明する。図１０は、実施例１に係るデータ処理装置１００によるデータ修正処理手順を示すフローチャートである。なお、図１０に示したデータ修正処理手順は、所定の日時かつ所定の時間にバッチ処理として動作したり、データ処理装置１００が他の処理（例えば、Ｗｅｂアプリケーション起動処理）を実行している間にバックグラウンドで動作したりする。 Next, the procedure of the data correction process performed by the data processing apparatus 100 according to the first embodiment will be described. FIG. 10 is a flowchart illustrating the data correction processing procedure performed by the data processing apparatus 100 according to the first embodiment. Note that the data correction processing procedure shown in FIG. 10 runs as a batch process at a predetermined date and time, or runs in the background while the data processing apparatus 100 is executing other processing (for example, Web application startup processing).

また、図１０では、データ処理装置１００が、１個の処理対象住所データについてデータ修正処理を行う例について説明する。複数の処理対象住所データについてデータ修正処理を行う場合、データ処理装置１００は、図１０に示した処理手順を処理対象住所データの数だけ繰り返して行う。 FIG. 10 illustrates an example in which the data processing apparatus 100 performs data correction processing for one piece of processing target address data. When performing data correction processing for a plurality of processing target address data, the data processing apparatus 100 repeats the processing procedure shown in FIG. 10 by the number of processing target address data.

図１０に示すように、データ処理装置１００の対象データ入出力部１１０は、所定の記憶部から処理対象住所データを取得する（ステップＳ１０１）。なお、対象データ入出力部１１０は、ネットワークを介して他の情報処理装置から処理対象住所データを受信してもよい。続いて、受付部１３１は、対象データ入出力部１１０から処理対象住所データを受け付ける。 As shown in FIG. 10, the target data input / output unit 110 of the data processing apparatus 100 acquires processing target address data from a predetermined storage unit (step S101). The target data input / output unit 110 may receive processing target address data from another information processing apparatus via a network. Subsequently, the receiving unit 131 receives processing target address data from the target data input / output unit 110.

続いて、抽出部１３２ａは、住所要素記憶部１２２に記憶されている住所要素により形成されるオートマトンに対して、受付部１３１によって受け付けられた処理対象住所データを順次代入することによって文字列照合を行う（ステップＳ１０２）。そして、抽出部１３２ａは、住所要素記憶部１２２から、処理対象住所データに含まれる住所要素に対応付けて記憶されている住所ＩＤを抽出する（ステップＳ１０３）。 Subsequently, the extraction unit 132a performs character string matching by sequentially substituting the processing target address data received by the reception unit 131 for the automaton formed by the address elements stored in the address element storage unit 122. Perform (step S102). Then, the extraction unit 132a extracts the address ID stored in association with the address element included in the processing target address data from the address element storage unit 122 (step S103).

続いて、一次特定部１３２ｂは、抽出部１３２ａによって抽出された住所ＩＤの集合の論理積を演算することにより住所ＩＤを特定する（ステップＳ１０４）。一次特定部１３２ｂによって特定された住所ＩＤが１個である場合（ステップＳ１０５肯定）、書換部１３４は、住所記憶部１２１から、かかる住所ＩＤに対応付けて記憶されている正規住所データを取得する（ステップＳ１０６）。続いて、書換部１３４は、所定の記憶部に記憶されている処理対象住所データを、住所記憶部１２１から取得した正規住所データ（修正候補住所データ）に書き換える（ステップＳ１０７）。 Subsequently, the primary specifying unit 132b specifies the address ID by calculating the logical product of the set of address IDs extracted by the extracting unit 132a (step S104). When the address ID specified by the primary specifying unit 132b is one (Yes at Step S105), the rewrite unit 134 acquires the regular address data stored in association with the address ID from the address storage unit 121. (Step S106). Subsequently, the rewriting unit 134 rewrites the processing target address data stored in the predetermined storage unit to the regular address data (correction candidate address data) acquired from the address storage unit 121 (step S107).

また、書換部１３４は、住所データ書換処理に関する各種ログ情報をログ記憶部１２４に記憶させる（ステップＳ１０８）。なお、データ処理装置１００が、他の情報処理装置からネットワークを介して処理対象住所データを受信した場合、書換部１３４は、ネットワークを介して、処理対象住所データを上記ステップＳ１０６の処理手順において取得された正規住所データに書き換える。 The rewriting unit 134 also stores various log information related to the address data rewriting process in the log storage unit 124 (step S108). When the data processing apparatus 100 has received the processing target address data from another information processing apparatus via a network, the rewriting unit 134 rewrites, via the network, the processing target address data to the regular address data acquired in step S106.

一方、一次特定部１３２ｂによって特定された住所ＩＤが１個でない場合（ステップＳ１０５否定）、分割部１３３ａは、処理対象住所データを所定の文字数ごとに分割する（ステップＳ１０９）。続いて、二次特定部１３３ｂは、住所記憶部１２１から、一次特定部１３２ｂによって特定された複数の住所ＩＤに対応付けて記憶されている正規住所データを住所ＩＤごとに取得する（ステップＳ１１０）。 On the other hand, when the address ID specified by the primary specifying unit 132b is not one (No at Step S105), the dividing unit 133a divides the processing target address data every predetermined number of characters (Step S109). Subsequently, the secondary specifying unit 133b acquires, for each address ID, the regular address data stored in association with the plurality of address IDs specified by the primary specifying unit 132b from the address storage unit 121 (Step S110). .

続いて、二次特定部１３３ｂは、分割部１３３ａによって分割された文字列により形成されるオートマトンに対して、上記ステップＳ１１０において取得された正規住所データを順次代入することによって文字列照合を行う（ステップＳ１１１）。続いて、二次特定部１３３ｂは、上記ステップＳ１１０において取得された正規住所データのうち、分割部１３３ａによって分割された文字列と一致する文字列の数が最も多い正規住所データに対応する住所ＩＤを特定する（ステップＳ１１２）。 Subsequently, the secondary specifying unit 133b performs character string matching by sequentially substituting the regular address data acquired in step S110 into the automaton formed from the character strings divided by the dividing unit 133a (step S111). The secondary specifying unit 133b then specifies the address ID corresponding to the regular address data that, among the regular address data acquired in step S110, has the largest number of character strings matching those divided by the dividing unit 133a (step S112).

続いて、書換部１３４は、二次特定部１３３ｂによって特定された住所ＩＤが１個である場合（ステップＳ１１３肯定）、所定の記憶部に記憶されている処理対象住所データを、二次特定部１３３ｂによって特定された正規住所データに書き換える（ステップＳ１１４）。また、書換部１３４は、住所データ書換処理に関する各種ログ情報をログ記憶部１２４に記憶させる（ステップＳ１０８）。 Subsequently, when exactly one address ID has been specified by the secondary specifying unit 133b (Yes at step S113), the rewriting unit 134 rewrites the processing target address data stored in the predetermined storage unit to the regular address data specified by the secondary specifying unit 133b (step S114). The rewriting unit 134 also stores various log information related to the address data rewriting process in the log storage unit 124 (step S108).

一方、二次特定部１３３ｂによって特定された住所ＩＤが１個でない場合（ステップＳ１１３否定）、書換部１３４は、住所データ書換処理を実行した日時や、処理対象住所データ、二次特定部１３３ｂによって特定された複数の正規住所データなどをログ記憶部１２４に記憶させる（ステップＳ１１５）。 On the other hand, when the number of address IDs specified by the secondary specifying unit 133b is not one (No at step S113), the rewriting unit 134 stores in the log storage unit 124 the date and time at which the address data rewriting process was executed, the processing target address data, the plurality of regular address data specified by the secondary specifying unit 133b, and the like (step S115).
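The flow of FIG. 10 from step S102 onward can be condensed into one Python function as a rough sketch. Logging (steps S108 and S115) is omitted, substring tests replace the automaton, and the names `correct_address`, `elements`, and `master` are hypothetical. The `elements` data uses only values quoted in the description, with the “１丁目” set limited to the IDs visible there.

```python
from collections import Counter


def correct_address(target, elements, master):
    """Return (address after processing, candidate IDs) per steps S102 to S115."""
    # S102-S103: collect the ID set of every stored element found in the target.
    id_sets = [ids for elem, ids in elements.items() if elem in target]
    # S104: keep the IDs contained in the largest number of sets.
    counts = Counter(i for ids in id_sets for i in ids)
    if not counts:
        return target, []
    top = max(counts.values())
    ids = sorted(i for i, c in counts.items() if c == top)
    if len(ids) > 1:
        # S109-S112: bigram comparison against each candidate's regular address.
        grams = [target[i:i + 2] for i in range(len(target) - 1)]
        scores = {i: sum(g in master[i] for g in grams) for i in ids}
        best = max(scores.values())
        ids = [i for i in ids if scores[i] == best]
    # S105 / S113: rewrite only when exactly one candidate remains.
    if len(ids) == 1:
        return master[ids[0]], ids
    return target, ids  # left unchanged for an administrator to decide


elements = {
    "神奈川県": set(range(1, 500)),
    "川崎市": set(range(1, 101)),
    "中原区": set(range(1, 21)),
    "下小田中": set(range(5, 10)),
    "１丁目": {1, 5, 101, 500},  # only the IDs shown in the text
}
master = {
    1: "神奈川県川崎市中原区上小田中１丁目",
    5: "神奈川県川崎市中原区下小田中１丁目",
}
print(correct_address("神奈川県川崎市中原区下小田１丁目", elements, master))
# ('神奈川県川崎市中原区下小田中１丁目', [5])
```

With the input “神奈川県川崎市中原区下小田１丁目”, the element “下小田中” does not match, so the first stage leaves IDs 1 and 5 tied and the bigram stage selects ID 5, as in the FIG. 9 example.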

次に、図１１〜図１３を用いて、データ処理装置１００によるデータ修正処理の一例について説明する。図１１〜図１３は、実施例１に係るデータ処理装置１００によるデータ修正処理の一例を説明するための図である。なお、以下に説明する３つの例において、住所記憶部１２１、住所要素記憶部１２２、構成情報記憶部１２３に記憶されている各種情報は、それぞれ図３、図４、図５に示した状態であるものとする。 Next, examples of the data correction process performed by the data processing apparatus 100 will be described with reference to FIGS. 11 to 13. FIGS. 11 to 13 are diagrams for explaining examples of the data correction process performed by the data processing apparatus 100 according to the first embodiment. In the three examples described below, the information stored in the address storage unit 121, the address element storage unit 122, and the configuration information storage unit 123 is assumed to be in the states shown in FIGS. 3, 4, and 5, respectively.

まず、図１１を用いて、処理対象住所データが「神奈川県川崎市下小田中１丁目」である場合におけるデータ処理装置１００によるデータ修正処理について説明する。なお、この処理対象住所データ「神奈川県川崎市下小田中１丁目」を修正するための修正候補住所データは、「神奈川県川崎市中原区下小田中１丁目」である。すなわち、処理対象住所データ「神奈川県川崎市下小田中１丁目」は、表記が統一されている修正候補住所データと比較して「中原区」が抜けている。 First, the data correction processing by the data processing apparatus 100 when the processing target address data is “1st Shimo-Odanaka 1-chome, Kawasaki City, Kanagawa Prefecture” will be described with reference to FIG. The correction candidate address data for correcting this processing target address data “1st Shimoodanaka, Kawasaki City, Kanagawa” is “1st Shimoodanaka, Nakahara-ku, Kawasaki City, Kanagawa Prefecture”. That is, “Nakahara-ku” is missing in the address data to be processed “1 Shimo-Odanaka 1-chome, Kawasaki-shi, Kanagawa” as compared with the correction candidate address data whose notation is unified.

かかる場合、データ処理装置１００の抽出部１３２ａは、図１１に示すように、オートマトンを用いた文字列照合処理の結果、住所要素記憶部１２２から、住所ＩＤ「１−４９９」、「１−１００」、「５−９」および「１：５：１０１：・・・：５００：・・・」を抽出する。 In this case, as shown in FIG. 11, the extraction unit 132a of the data processing apparatus 100 extracts the address IDs “1-499”, “1-100”, “5-9”, and “1: 5: 101: ...: 500: ...” from the address element storage unit 122 as a result of the character string matching process using the automaton.

続いて、一次特定部１３２ｂは、抽出された住所ＩＤの論理積を演算することにより、抽出された住所ＩＤの中から、住所ＩＤ「５」を特定する。続いて、書換部１３４は、一次特定部１３２ｂによって住所ＩＤが１個に特定されたため、住所記憶部１２１から、住所ＩＤ「５」に対応付けて記憶されている住所データ「神奈川県川崎市中原区下小田中１丁目」を取得する。そして、書換部１３４は、処理対象住所データ「神奈川県川崎市下小田中１丁目」を「神奈川県川崎市中原区下小田中１丁目」へ書き換える。 Subsequently, the primary specifying unit 132b specifies the address ID “5” from among the extracted address IDs by calculating their logical product. Because the primary specifying unit 132b has narrowed the result down to one address ID, the rewriting unit 134 acquires, from the address storage unit 121, the address data “Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa” stored in association with the address ID “5”. The rewriting unit 134 then rewrites the processing target address data “Shimo-Odanaka 1-chome, Kawasaki-shi, Kanagawa” to “Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa”.

また、書換部１３４は、構成情報記憶部１２３から、住所ＩＤ「５」に対応付けて記憶されている構成情報「都道府県｜市区郡｜政令区町村｜町名字｜丁目字」を取得する。そして、書換部１３４は、取得した構成情報と、処理対象住所データを形成する構成情報とが一致するか否かを判定する。ここでは、書換部１３４は、構成情報「政令区町村」が不一致であると判定する。 The rewriting unit 134 also acquires, from the configuration information storage unit 123, the configuration information “prefecture | city/ward/county | designated-city ward/town/village | town name | chome” (都道府県｜市区郡｜政令区町村｜町名字｜丁目字) stored in association with the address ID “5”. The rewriting unit 134 then determines whether the acquired configuration information matches the configuration information forming the processing target address data. Here, the rewriting unit 134 determines that the configuration information “designated-city ward/town/village” (政令区町村) does not match.

そして、書換部１３４は、ログ記憶部１２４にログ情報を記憶させる。このとき、書換部１３４は、例えば、構成情報「政令区町村」を書き換えた旨、または、構成情報「政令区町村」を補完した旨のログ情報をログ記憶部１２４に記憶させる。また、例えば、書換部１３４は、処理対象住所データ「神奈川県川崎市下小田中１丁目」と、修正候補住所データ「神奈川県川崎市中原区下小田中１丁目」との差分をログ記憶部１２４に記憶させる。 The rewriting unit 134 then stores log information in the log storage unit 124. At this time, the rewriting unit 134 stores, for example, log information indicating that the configuration information “designated-city ward/town/village” (政令区町村) was rewritten or that it was supplemented. As another example, the rewriting unit 134 stores in the log storage unit 124 the difference between the processing target address data “Shimo-Odanaka 1-chome, Kawasaki-shi, Kanagawa” and the correction candidate address data “Shimo-Odanaka 1-chome, Nakahara-ku, Kawasaki-shi, Kanagawa”.
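The configuration comparison in the FIG. 11 example can be sketched in Python as a membership test over the “｜”-separated elements. The description mentions that a logical product may be used to compute the difference; this sketch uses a simpler set lookup instead. The function name `missing_elements` is hypothetical, and the configuration string assumed for the processing target address is an illustration inferred from the statement that only “政令区町村” fails to match.

```python
def missing_elements(candidate_config, target_config, sep="｜"):
    """Return the elements of candidate_config that target_config lacks."""
    present = set(target_config.split(sep))
    return [e for e in candidate_config.split(sep) if e not in present]


# Configuration information stored for address ID 5 (quoted in the description).
candidate = "都道府県｜市区郡｜政令区町村｜町名字｜丁目字"
# Assumed configuration of the processing target address
# "神奈川県川崎市下小田中１丁目", in which the ward is missing.
target = "都道府県｜市区郡｜町名字｜丁目字"

print(missing_elements(candidate, target))  # ['政令区町村']
```

The returned list is the kind of information that could be written to the log to record which element was supplemented.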

このように、データ処理装置１００は、処理対象住所データに構成情報（図１１の例では、政令区町村）が抜けている場合であっても、１回の文字列照合処理と、１回の演算処理によって、修正候補住所データを特定することができる。すなわち、データ処理装置１００は、処理対象住所データに構成情報が抜けている場合であっても、低負荷かつ高速にデータ修正処理を行うことができる。 As described above, even when configuration information (the designated-city ward/town/village in the example of FIG. 11) is missing from the processing target address data, the data processing apparatus 100 can specify the correction candidate address data with one character string matching process and one arithmetic operation. That is, the data processing apparatus 100 can perform the data correction process at low load and high speed even when configuration information is missing from the processing target address data.

Next, with reference to FIG. 12, the data correction process performed by the data processing apparatus 100 when the processing target address data is "神奈川県横川崎市中原区下小田中１丁目" will be described. The correction candidate address data for correcting this processing target address data is "神奈川県川崎市中原区下小田中１丁目". That is, compared with the address data in unified notation, the processing target address data has an extra character "横" immediately before "川崎市".

In this case, as a result of the character string matching process using the automaton, the extraction unit 132a of the data processing apparatus 100 extracts the address IDs "1-499", "851-855", "1-100", "1-20", "5-9", and "1:5:101:...:500:..." from the address element storage unit 122. The address IDs "851-855" are the address IDs of address data that include the town name "横川".

Subsequently, the primary specifying unit 132b computes the logical product of the extracted address IDs and thereby specifies the address ID "5" from among them. The rewriting unit 134 then acquires, from the address storage unit 121, the address data "神奈川県川崎市中原区下小田中１丁目" stored in association with the address ID "5", and rewrites the processing target address data "神奈川県横川崎市中原区下小田中１丁目" to "神奈川県川崎市中原区下小田中１丁目".
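
The "logical product" step can be read as a vote: the description itself restates it as selecting the address ID that occurs most often among the extracted IDs. A minimal Python sketch under that reading follows. The function name `most_frequent_ids`, the encoding of each extracted entry as a Python set, and the truncation of the elided list "1:5:101:...:500:..." to the four IDs the text actually shows are assumptions made for illustration, not details taken from the patent.

```python
from collections import Counter

def most_frequent_ids(id_sets):
    """Return the address IDs that occur in the largest number of extracted sets."""
    counts = Counter(i for ids in id_sets for i in ids)
    if not counts:
        return set()
    top = max(counts.values())
    return {i for i, c in counts.items() if c == top}

# Entries extracted for "神奈川県横川崎市中原区下小田中１丁目" (hypothetical encoding).
extracted = [
    set(range(1, 500)),    # "1-499"
    set(range(851, 856)),  # "851-855", matched by the stray town name "横川"
    set(range(1, 101)),    # "1-100"
    set(range(1, 21)),     # "1-20"
    set(range(5, 10)),     # "5-9"
    {1, 5, 101, 500},      # "1:5:101:...:500:..." with the elided IDs omitted
]
result = most_frequent_ids(extracted)
print(result)
```

Because the spurious "851-855" entry shares no ID with the other entries, it cannot outvote them, which is why an extra character does not disturb the outcome in this reading.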

Further, the rewriting unit 134 acquires, from the configuration information storage unit 123, the configuration information "prefecture | city/ward/county | designated-city ward/town/village | town name | chome" (都道府県｜市区郡｜政令区町村｜町名字｜丁目字) stored in association with the address ID "5". The rewriting unit 134 then determines whether the acquired configuration information matches the configuration information that makes up the processing target address data. Here, the rewriting unit 134 determines that the two match.

The rewriting unit 134 then stores log information in the log storage unit 124. For example, it stores the difference between the processing target address data "神奈川県横川崎市中原区下小田中１丁目" and the correction candidate address data "神奈川県川崎市中原区下小田中１丁目". The rewriting unit 134 also identifies the component in which the difference occurs and stores the identified component in the log storage unit 124.
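
The description does not say how the difference is computed. As one possible illustration, the sketch below uses Python's standard `difflib.SequenceMatcher` to list the edits that turn the processing target address into the correction candidate. The function name `address_diff` and the tuple layout of a log entry are hypothetical.

```python
from difflib import SequenceMatcher

def address_diff(target, candidate):
    """List (operation, target_text, candidate_text) edits that turn target into candidate."""
    matcher = SequenceMatcher(None, target, candidate, autojunk=False)
    return [
        (tag, target[i1:i2], candidate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the parts that actually differ
    ]

log_entry = address_diff(
    "神奈川県横川崎市中原区下小田中１丁目",  # processing target address data
    "神奈川県川崎市中原区下小田中１丁目",    # correction candidate address data
)
print(log_entry)
```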

In this way, even when an extra character string has been added to the processing target address data, the data processing apparatus 100 can specify the correction candidate address data with one character string matching process and one arithmetic operation. That is, the data processing apparatus 100 can perform the data correction process quickly and with low load even when the processing target address data contains an extra character string.

Next, with reference to FIG. 13, the data correction process performed by the data processing apparatus 100 when the processing target address data is "神奈川県川崎市中原区小田中１丁目" will be described. The correction candidate address data for correcting this processing target address data is assumed to be "神奈川県川崎市中原区下小田中１丁目". That is, compared with the address data in unified notation, the processing target address data is missing the character "下" immediately before "小田中".

In this case, as a result of the character string matching process using the automaton, the extraction unit 132a of the data processing apparatus 100 extracts the address IDs "1-499", "1-100", "1-20", and "1:5:101:...:500:..." from the address element storage unit 122.

Subsequently, the primary specifying unit 132b computes the logical product of the extracted address IDs and thereby specifies the address IDs "1" and "5" from among them. This is because "1" and "5" are the address IDs that occur most often among the address IDs extracted by the extraction unit 132a.

Subsequently, the dividing unit 133a divides the processing target address data "神奈川県川崎市中原区小田中１丁目" into two-character units that overlap by one character. Specifically, the dividing unit 133a divides the processing target address data into "神奈", "奈川", "川県", "県川", "川崎", "崎市", "市中", "中原", "原区", "区小", "小田", "田中", "中１", "１丁", and "丁目".
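
The overlapping two-character division can be sketched in a few lines of Python. The function name `split_ngrams` and its parameter `n` are hypothetical; only the two-character case appears in the description.

```python
def split_ngrams(text, n=2):
    """Split text into n-character units, each overlapping the previous one by n-1 characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

bigrams = split_ngrams("神奈川県川崎市中原区小田中１丁目")
print(bigrams)
```

For this 16-character address the sketch produces 15 units, the same units listed in the text.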

Subsequently, the secondary specifying unit 133b acquires, from the address storage unit 121, the address data "神奈川県川崎市中原区上小田中１丁目" stored in association with the address ID "1" and the address data "神奈川県川崎市中原区下小田中１丁目" stored in association with the address ID "5". The secondary specifying unit 133b then uses an automaton formed from the character strings produced by the dividing unit 133a to perform character string matching between the address data acquired from the address storage unit 121 and the divided character strings.

Here, as shown in FIG. 13, the secondary specifying unit 133b detects that 14 of the character strings produced by the dividing unit 133a match character strings contained in the address data "神奈川県川崎市中原区上小田中１丁目" of the address ID "1". Similarly, it detects that 14 of the divided character strings match character strings contained in the address data "神奈川県川崎市中原区下小田中１丁目" of the address ID "5".

That is, in the example shown in FIG. 13, the address data of the address ID "1" and the address data of the address ID "5" match the same number of character strings produced by the dividing unit 133a. Therefore, the rewriting unit 134 does not rewrite the processing target address data. Instead, it stores in the log storage unit 124 information such as the date and time at which the address data rewriting process was executed, the processing target address data, the address data of the address ID "1", and the address data of the address ID "5".
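
The secondary specification and the tie in FIG. 13 can be illustrated as follows. This sketch replaces the automaton with plain substring tests, which is a simplification and not the matching method the description names. `match_count`, `candidates`, and the `rewrite` flag are hypothetical names.

```python
def split_ngrams(text, n=2):
    """Split text into overlapping n-character units."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def match_count(target, candidate, n=2):
    """Count the n-character units of target that also occur in candidate."""
    return sum(1 for gram in split_ngrams(target, n) if gram in candidate)

target = "神奈川県川崎市中原区小田中１丁目"
candidates = {
    1: "神奈川県川崎市中原区上小田中１丁目",
    5: "神奈川県川崎市中原区下小田中１丁目",
}

scores = {aid: match_count(target, addr) for aid, addr in candidates.items()}
best = max(scores.values())
winners = [aid for aid, score in scores.items() if score == best]

# Rewrite only when exactly one candidate has the highest score;
# otherwise all tied candidates would be logged for an administrator.
rewrite = len(winners) == 1
print(scores, winners, rewrite)
```

Only the unit "区小" fails to match either candidate, so both score 14 and no rewrite takes place.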

In this way, even when the correction candidate address data cannot be uniquely specified, the data processing apparatus 100 stores the plurality of specified correction candidate address data in the log storage unit 124. Based on this log information, a system administrator or the like can obtain the candidates needed to decide which correction candidate address data the processing target address data should be rewritten to.

When components such as the city/ward/county and the designated-city ward/town/village are transposed in the processing target address data, the data processing apparatus 100 can detect the transposition based on the information stored in the configuration information storage unit 123. Take as an example the case where the processing target address data is "神奈川県中原区川崎市下小田中１丁目". In this case, the data processing apparatus 100 specifies "5" as the address ID of the correction candidate address data for correcting the processing target address data.

At this time, the data processing apparatus 100 acquires, from the configuration information storage unit 123, the configuration information "prefecture | city/ward/county | designated-city ward/town/village | town name | chome" (都道府県｜市区郡｜政令区町村｜町名字｜丁目字) stored in association with the address ID "5". The data processing apparatus 100 then compares the acquired configuration information with the configuration information of the processing target address data, "prefecture | designated-city ward/town/village | city/ward/county | town name | chome" (都道府県｜政令区町村｜市区郡｜町名字｜丁目字). The data processing apparatus 100 can thereby detect that the city/ward/county and the designated-city ward/town/village are transposed in the processing target address data. The data processing apparatus 100 may store such transposition information in the log storage unit 124.
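
The comparison of component orders can be sketched as follows. The sketch assumes that the component order of the processing target address data has already been derived, which the description does not detail here; the function name `find_transposed` and the use of the "｜" separator as a split key are illustrative choices.

```python
def find_transposed(stored, observed, sep="｜"):
    """Return (stored_type, observed_type) pairs at positions where the order differs.

    Returns None when the two sides do not contain the same component types,
    because that case is not a pure reordering.
    """
    stored_parts = stored.split(sep)
    observed_parts = observed.split(sep)
    if sorted(stored_parts) != sorted(observed_parts):
        return None
    return [(s, o) for s, o in zip(stored_parts, observed_parts) if s != o]

swaps = find_transposed(
    "都道府県｜市区郡｜政令区町村｜町名字｜丁目字",  # stored for address ID "5"
    "都道府県｜政令区町村｜市区郡｜町名字｜丁目字",  # derived from the target data
)
print(swaps)
```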

As described above, the data processing apparatus 100 according to the first embodiment uses an automaton formed from the address elements stored in the address element storage unit 122 to perform character string matching between those address elements and the address elements contained in the processing target address data. The data processing apparatus 100 then specifies an address ID by computing the logical product of the address IDs acquired from the address element storage unit 122. The data processing apparatus 100 can thus specify the correction candidate address data with one character string matching process and one arithmetic operation. As a result, it can specify the correction candidate address data, and perform the data correction process, quickly and with low load.

Further, when the logical product operation yields a plurality of address IDs, the data processing apparatus 100 specifies the correction target address data using the N-gram method. The data processing apparatus 100 can thus specify the correction target address data with two character string matching processes and one arithmetic operation. As a result, it can specify the correction target address data, and perform the data correction process, quickly and with low load.

Further, when the character string matching process using the N-gram method yields a plurality of address data, the data processing apparatus 100 stores the specified plurality of address data in the log storage unit 124. Based on this log information, a system administrator or the like can obtain the candidates needed to decide which address data the processing target address data should be rewritten to.

In the first embodiment above, the address element storage unit 122 stores address elements in unified notation (the address elements contained in the address data stored in the address storage unit 121). However, the address element storage unit 122 may also store address elements whose notation is not unified. For example, by storing in the address element storage unit 122 address elements whose notation tends to vary, the data processing apparatus 100 can easily correct address data containing such elements to address data in unified notation.

This is described concretely below with reference to FIGS. 14 and 15. FIG. 14 is a diagram illustrating an example of the address element storage unit 122 that stores address elements whose notation is not unified. In the example shown in FIG. 14, a "+" in the type field indicates an address element whose notation is not unified. A record with "+" in its type is added, for example, when there is an address element whose notation tends to vary.

Specifically, in the address element storage unit 122 shown in FIG. 14, a variant-character form of "麹町" (row 12 in FIG. 14) has been added as a character string in which the unified address element "麹町" (row 11 in FIG. 14) tends to be written inconsistently. Similarly, "こうじ町" (row 13 in FIG. 14), which writes "麹町" in hiragana, has been added.

Next, an example of the data correction process performed by the data processing apparatus 100 using the address element storage unit 122 that stores address elements whose notation is not unified will be described. FIG. 15 is a diagram for explaining an example of the data correction process in that case.

In the example shown in FIG. 15, the processing target address data is assumed to be "東京都千代田区こうじ町１丁目". The correction candidate address data for correcting this processing target address data is "東京都千代田区麹町１丁目". That is, compared with the address data in unified notation, the processing target address data writes the character "麹" in hiragana.

In this case, as shown in FIG. 15, as a result of the character string matching process using the automaton, the extraction unit 132a of the data processing apparatus 100 extracts the address IDs "500-1500", "500-599", "500-510", and "1:5:101:...:500:..." from the address element storage unit 122.

Subsequently, the primary specifying unit 132b computes the logical product of the extracted address IDs and thereby specifies the address ID "500" from among them. The rewriting unit 134 then acquires, from the address storage unit 121, the address data "東京都千代田区麹町１丁目" stored in association with the address ID "500", and rewrites the processing target address data "東京都千代田区こうじ町１丁目" to "東京都千代田区麹町１丁目".

The rewriting unit 134 then stores log information in the log storage unit 124. At this time, because the type corresponding to the address element "こうじ町" acquired from the address element storage unit 122 contains "+", the rewriting unit 134 stores in the log storage unit 124 log information indicating that the component "designated-city ward/town/village" (政令区町村) was rewritten.

In this way, by storing address elements whose notation is not unified in the address element storage unit 122, the data processing apparatus 100 can easily correct address data containing address elements whose notation tends to vary to address data in unified notation.

In the first embodiment above, the primary specifying unit 132b of the data processing apparatus 100 specifies an address ID by computing the logical product of the sets of address IDs extracted by the extraction unit 132a. In other words, in the first embodiment, the primary specifying unit 132b computes the logical product to specify the address ID that occurs most often among the sets of address IDs extracted by the extraction unit 132a.

However, the data processing apparatus may assign an importance to each address element stored in the address element storage unit. When the processing target address data contains an address element of high importance, the data processing apparatus may specify the address data containing that address element as the correction candidate address data. The data processing apparatus may also exclude address elements of low importance from the logical product operation. The second embodiment therefore describes an example in which an importance is assigned to the address elements stored in the address element storage unit.

First, the configuration of the data processing apparatus 200 according to the second embodiment will be described. FIG. 16 is a diagram illustrating the configuration of the data processing apparatus 200 according to the second embodiment. As shown in FIG. 16, compared with the data processing apparatus 100 shown in FIG. 2, the data processing apparatus 200 has an address element storage unit 222 in place of the address element storage unit 122 and a primary specifying unit 232b in place of the primary specifying unit 132b.

FIG. 17 shows an example of the address element storage unit 222. As shown in FIG. 17, compared with the address element storage unit 122 shown in FIG. 4, the address element storage unit 222 additionally has the item "importance". The item "importance" stores the importance of the information stored in the corresponding "address element".

In the example shown in FIG. 17, the address element storage unit 222 stores "0" as the importance of the address elements "神奈川県" and "東京都", whose type is "prefecture". Here, the importance "0" indicates low importance. It also indicates that the element may be excluded from the logical product operation executed by the primary specifying unit 232b described later.

The importance "0" is assigned to address elements that are not needed to specify an address. For example, as in the example shown in FIG. 17, the importance "0" is assigned to the type "prefecture". This is because, in general, an address can be specified even when the prefecture is unknown, as long as the city/ward/county, the designated-city ward/town/village, the town name, and the chome are clear.

Also in the example shown in FIG. 17, the address element storage unit 222 stores "1" as the importance of the address element "○○病院", whose type is "specific name". Here, the importance "1" indicates high importance. It also indicates that, when the corresponding address element ("○○病院" in the example of FIG. 17) is contained in the processing target address data, the primary specifying unit 232b described later may specify the corresponding address ID ("101" in the example of FIG. 17).

The importance "1" is assigned to address elements that can uniquely specify an address (for example, a building name or a facility name). For example, as in the example shown in FIG. 17, the importance "1" is assigned to "○○病院". This is because the address can be uniquely specified once "○○病院" is known.

The primary specifying unit 232b specifies an address ID by computing logical products and logical sums over the sets of address IDs extracted by the extraction unit 132a. In doing so, it excludes from the logical product those extracted address IDs that are stored in the address element storage unit 222 in association with the importance "0". It also computes the logical product of the address IDs for which no importance is set, and then computes the logical sum of that result and the address IDs whose importance is "1".

An example of the data correction process performed by the data processing apparatus 200 will now be described with reference to FIGS. 18 and 19, each of which illustrates one example of that process according to the second embodiment. In the two examples described below, the information stored in the address storage unit 121 and the address element storage unit 222 is assumed to be in the states shown in FIG. 3 and FIG. 17, respectively.

First, with reference to FIG. 18, the data correction process performed by the data processing apparatus 200 when the processing target address data is "東京都川崎市中原区下小田中１丁目" will be described. The correction candidate address data for correcting this processing target address data is "神奈川県川崎市中原区下小田中１丁目". That is, compared with the address data in unified notation, the processing target address data has "東京都" where "神奈川県" should appear.

In this case, as shown in FIG. 18, as a result of the character string matching process using the automaton, the extraction unit 132a of the data processing apparatus 200 extracts the address IDs "500-1500", "1-100", "1-20", "5-9", and "1:5:101:...:500:..." from the address element storage unit 222.

Subsequently, the primary specifying unit 232b computes the logical product of the extracted address IDs. In doing so, it excludes the address IDs "500-1500", whose importance is "0". That is, the primary specifying unit 232b computes the logical product of the address IDs "1-100", "1-20", "5-9", and "1:5:101:...:500:...", and thereby specifies the address ID "5". The rewriting unit 134 then acquires, from the address storage unit 121, the address data "神奈川県川崎市中原区下小田中１丁目" stored in association with the address ID "5", and rewrites the processing target address data "東京都川崎市中原区下小田中１丁目" to "神奈川県川崎市中原区下小田中１丁目".
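
The exclusion of importance "0" entries before the logical product can be sketched with ordinary set intersection. The pair layout `(id_set, importance)`, the use of `None` for "no importance set", and the function name `specify_excluding_low` are assumptions for illustration. The elided list "1:5:101:...:500:..." is again reduced to the four IDs the text shows.

```python
def specify_excluding_low(extracted):
    """Intersect the extracted ID sets, skipping entries whose importance is 0."""
    kept = [ids for ids, importance in extracted if importance != 0]
    if not kept:
        return set()
    return set.intersection(*kept)

# Entries extracted for "東京都川崎市中原区下小田中１丁目" (hypothetical encoding).
extracted = [
    (set(range(500, 1501)), 0),  # "500-1500" for 東京都, importance 0
    (set(range(1, 101)), None),  # "1-100"
    (set(range(1, 21)), None),   # "1-20"
    (set(range(5, 10)), None),   # "5-9"
    ({1, 5, 101, 500}, None),    # "1:5:101:...:500:..." with the elided IDs omitted
]
result = specify_excluding_low(extracted)
print(result)
```

With the wrong prefecture's range left in, a strict intersection would be empty, since "500-1500" and "1-100" share no ID; dropping it first is what lets the remaining entries agree on a single ID.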

In this way, because the data processing apparatus 200 computes the logical product after excluding the address IDs that correspond to address elements of low importance, it can execute the operation faster. That is, the data processing apparatus 200 can specify the correction target address data faster and with lower load.

Next, with reference to FIG. 19, the data correction process performed by the data processing apparatus 200 when the processing target address data is "神奈川県横浜市○○病院" will be described. The correction candidate address data for correcting this processing target address data is "神奈川県横浜市港北区新横浜１丁目○○病院". That is, compared with the address data in unified notation, the processing target address data is missing "港北区新横浜１丁目".

In this case, as shown in FIG. 19, as a result of the character string matching process using the automaton, the extraction unit 132a of the data processing apparatus 200 extracts the address IDs "1-499", "101-200", and "101" from the address element storage unit 222.

Subsequently, among the extracted address IDs, the primary specifying unit 232b computes the logical product of the address IDs for which no importance is set and computes the logical sum with the address IDs for which an importance is set. In the example shown in FIG. 19, the primary specifying unit 232b computes ("1-499" AND "101-200") OR "101". As a result of this operation, the primary specifying unit 232b specifies the address ID "101" as the address ID of the correction candidate address data.
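
Taken as plain set operations, ("1-499" AND "101-200") OR "101" would give the whole range 101-200, whereas the stated result is the single ID "101". The sketch below therefore assumes a priority rule: when an element of importance "1" is present, its address ID is selected outright, and the intersection of the remaining entries is used only otherwise. This rule, the function name `specify_with_importance`, and the `(id_set, importance)` layout are assumptions, not the patent's stated procedure.

```python
def specify_with_importance(extracted):
    """extracted: list of (id_set, importance) pairs; importance is None, 0 or 1."""
    high = [ids for ids, importance in extracted if importance == 1]
    if high:
        # Assumed priority rule: IDs tied to importance-1 elements win outright.
        return set().union(*high)
    plain = [ids for ids, importance in extracted if importance is None]
    return set.intersection(*plain) if plain else set()

# Entries extracted for "神奈川県横浜市○○病院", following the expression in the text.
extracted = [
    (set(range(1, 500)), None),    # "1-499"
    (set(range(101, 201)), None),  # "101-200"
    ({101}, 1),                    # "101" for ○○病院, importance 1
]
chosen = specify_with_importance(extracted)
print(chosen)
```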

Subsequently, the rewriting unit 134 acquires, from the address storage unit 121, the address data "神奈川県横浜市港北区新横浜１丁目○○病院" stored in association with the address ID "101". The rewriting unit 134 then rewrites the processing target address data "神奈川県横浜市○○病院" to "神奈川県横浜市港北区新横浜１丁目○○病院".

In this way, the data processing device 200 preferentially specifies the address ID corresponding to an address element of high importance as the correction candidate address data. As a result, the data processing device 200 can specify the correction candidate address data with high accuracy when the processing target address data contains an address element of high importance.
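The priority given to important address elements can be illustrated with a minimal Python sketch. It is not the implementation disclosed here: the function names `id_range` and `primary_specify` are hypothetical, and the sketch assumes that, when an element with importance "1" matches, its address IDs take precedence over the logical product of the remaining sets. That assumption is made because a plain set union of the two operands would retain the whole range 101-200, whereas the description states that address ID "101" is specified.

```python
from typing import List, Set, Tuple


def id_range(first: int, last: int) -> Set[int]:
    """Expand an inclusive address-ID range such as "101-200" into a set."""
    return set(range(first, last + 1))


def primary_specify(matches: List[Tuple[Set[int], int]]) -> Set[int]:
    """Pick candidate address IDs from (address-ID set, importance) pairs.

    Assumption (not stated in the source): IDs attached to an element whose
    importance is 1 take precedence; otherwise the sets of the remaining
    elements are combined with a logical product (intersection).
    """
    important = [ids for ids, importance in matches if importance == 1]
    ordinary = [ids for ids, importance in matches if importance == 0]
    if important:
        return set().union(*important)
    if ordinary:
        return set.intersection(*ordinary)
    return set()


# ID sets from the FIG. 19 example; only "101" carries importance 1.
fig19 = [(id_range(1, 499), 0), (id_range(101, 200), 0), ({101}, 1)]
print(primary_specify(fig19))  # {101}
```

Without the important element, the same function falls back to the logical product and returns the range 101-200, which would then need further narrowing.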

In the second embodiment above, two importance levels ("0" or "1") were described, but the data processing device 200 may assign three or more importance levels in the address element storage unit 222. In that case, the higher the importance of an address element included in the processing target address data, the more preferentially the primary specifying unit 232b specifies the address ID corresponding to that address element as the correction candidate address data. For example, the primary specifying unit 232b excludes address IDs whose importance is lower than a predetermined threshold and performs a logical sum operation on address IDs whose importance is higher than the predetermined threshold.
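The threshold variant can be sketched in the same way. The function name `specify_with_threshold`, the sample ID sets, and the importance values below are all hypothetical. The text does not say how an importance equal to the threshold is handled, so this sketch assumes such sets are kept.

```python
from typing import List, Set, Tuple


def specify_with_threshold(matches: List[Tuple[Set[int], int]],
                           threshold: int) -> Set[int]:
    """Drop ID sets below the threshold, then take the logical sum of the rest.

    Assumption: a set whose importance equals the threshold is kept.
    """
    kept = [ids for ids, importance in matches if importance >= threshold]
    return set().union(*kept)


# Hypothetical matches with three importance levels (0, 1, 2).
matches = [({1, 2, 3}, 0), ({2, 3}, 1), ({3, 7}, 2)]
print(specify_with_threshold(matches, threshold=1))  # {2, 3, 7}
```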

The configurations of the data processing devices 100 and 200 shown in FIGS. 2 and 16 can be modified in various ways without departing from the gist of the invention. For example, functions equivalent to those of the data processing device 100 can be realized by implementing the functions of the control unit 130 of the data processing device 100 as software and executing that software on a computer. The following describes an example of a computer that executes a data processing program 1071 in which the functions of the various processing units of the data processing device 100 are implemented as software.

FIG. 20 is a diagram showing a computer 1000 that executes the data processing program 1071. The computer 1000 is configured by connecting, via a bus 1080, a CPU (Central Processing Unit) 1010 that executes various arithmetic processes, an input device 1020 that accepts data input from a user, a monitor 1030 that displays various information, a medium reading device 1040 that reads programs and the like from a recording medium, a network interface device 1050 that exchanges data with other computers via a network, a RAM (Random Access Memory) 1060 that temporarily stores various information, and a hard disk device 1070.

The hard disk device 1070 stores the data processing program 1071, which has the same functions as the control unit 130 shown in FIG. 2. The hard disk device 1070 also stores data processing related data 1072 corresponding to the various data stored in the storage unit 120 shown in FIG. 2 (the address storage unit 121, the address element storage unit 122, and the configuration information storage unit 123), and a log file 1073 corresponding to the log storage unit 124 shown in FIG. 2. The data processing related data 1072 or the log file 1073 may also be distributed as appropriate and stored on other computers connected via a network.

When the CPU 1010 reads the data processing program 1071 from the hard disk device 1070 and loads it into the RAM 1060, the data processing program 1071 comes to function as a data processing process 1061. The data processing process 1061 loads information read from the data processing related data 1072 into the area of the RAM 1060 allocated to it as appropriate, and executes various data processing based on the loaded data. The data processing process 1061 then outputs predetermined information to the log file 1073.

The data processing program 1071 does not necessarily have to be stored in the hard disk device 1070. The computer 1000 may instead read and execute the program from a storage medium such as a CD-ROM. Alternatively, the program may be stored on another computer (or server) connected to the computer 1000 via a public line, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), or the like, and the computer 1000 may read the program from there and execute it.

The following supplementary notes are further disclosed with respect to the embodiments, including the examples described above.

(Supplementary Note 1) A data processing device comprising:
character string storage means for storing a predetermined character string and a character string identifier for identifying the predetermined character string;
character string element storage means for storing a character string element, which is a character string included in the predetermined character string, and a set of character string identifiers indicating the predetermined character strings in which the character string element is included;
extraction means for extracting, from the character string element storage means, the sets of character string identifiers stored in association with the character string elements included in a character string to be processed; and
first specifying means for specifying the character string identifier that appears most often in the sets of character string identifiers extracted by the extraction means as the character string identifier indicating a corresponding character string, which is a character string that matches or is similar to the character string to be processed.
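The first specifying means of Supplementary Note 1 can be read as a vote over the extracted identifier sets. The following Python sketch illustrates that reading only. The function name `most_frequent_ids` and the sample sets are hypothetical, and returning every tied identifier is an assumption made so that ties can be passed to a later narrowing step.

```python
from collections import Counter
from typing import Iterable, Set


def most_frequent_ids(id_sets: Iterable[Set[int]]) -> Set[int]:
    """Return the identifier(s) contained in the largest number of sets."""
    counts: Counter = Counter()
    for ids in id_sets:
        counts.update(ids)  # each set contributes one vote per identifier
    if not counts:
        return set()
    top = max(counts.values())
    return {identifier for identifier, n in counts.items() if n == top}


# Hypothetical identifier sets extracted for three matched elements.
print(most_frequent_ids([{1, 2, 3}, {2, 3}, {3, 4}]))  # {3}
```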

(Supplementary Note 2) The data processing device according to Supplementary Note 1, further comprising:
dividing means for dividing the character string to be processed into pieces of a predetermined size when the first specifying means has specified a plurality of character string identifiers; and
second specifying means for specifying, as the corresponding character string, the character string that contains the most divided character strings, which are the character strings produced by the dividing means, from among the plurality of character strings indicated by the character string identifiers specified by the first specifying means.
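The dividing means and second specifying means of Supplementary Note 2 can be sketched as follows. The names `split_fixed` and `secondary_specify`, the piece size of 2, and the sample address strings are hypothetical. The sketch also assumes non-overlapping pieces, which the note itself does not specify.

```python
from typing import List


def split_fixed(text: str, size: int) -> List[str]:
    """Divide text into consecutive, non-overlapping pieces of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def secondary_specify(target: str, candidates: List[str], size: int = 2) -> str:
    """Pick the candidate that contains the most pieces of the target string."""
    pieces = split_fixed(target, size)

    def score(candidate: str) -> int:
        return sum(1 for piece in pieces if piece in candidate)

    # max() returns the first candidate when several share the top score.
    return max(candidates, key=score)


# Hypothetical data: the first candidate contains all four pieces of the
# target ("横浜", "市港", "北区", "病院"); the second contains only two.
target = "横浜市港北区病院"
candidates = ["横浜市港北区新横浜病院", "横浜市西区病院"]
print(secondary_specify(target, candidates))  # 横浜市港北区新横浜病院
```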

(Supplementary Note 3) The data processing device according to Supplementary Note 1 or 2, wherein the extraction means uses an automaton formed from the character string elements stored in the character string element storage means to match the character string elements included in the character string to be processed against the character string elements stored in the character string element storage means, and thereby extracts, from the character string element storage means, the sets of character string identifiers stored in association with the character string elements included in the character string to be processed.
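The note does not name a particular automaton. As one possible illustration, the Python sketch below uses an Aho-Corasick automaton, which finds every stored element in a single pass over the input. The choice of Aho-Corasick, the function names `build_automaton` and `find_elements`, and the element-to-ID table `element_ids` are all assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[List[str]]]


def build_automaton(elements: List[str]) -> Automaton:
    """Build goto, failure and output tables from the stored elements."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[str]] = [[]]
    for element in elements:
        node = 0
        for ch in element:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(element)
    queue = deque(goto[0].values())  # depth-1 nodes keep failure link 0
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_elements(text: str, automaton: Automaton) -> List[str]:
    """Return every stored element that occurs in text, in order of its end position."""
    goto, fail, out = automaton
    node, found = 0, []
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found.extend(out[node])
    return found


# Hypothetical element table; the ID sets follow the FIG. 19 example.
element_ids = {
    "神奈川県": set(range(1, 500)),
    "横浜市": set(range(101, 201)),
    "○○病院": {101},
}
automaton = build_automaton(list(element_ids))
found = find_elements("神奈川県横浜市○○病院", automaton)
extracted = [element_ids[element] for element in found]
print(found)  # ['神奈川県', '横浜市', '○○病院']
```

The list `extracted` then holds one identifier set per matched element, which is the input the first specifying means works on.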

(Supplementary Note 4) The data processing device according to any one of Supplementary Notes 1 to 3, wherein the first specifying means specifies the character string identifier indicating the corresponding character string by computing the logical product of the sets of character string identifiers extracted by the extraction means.

(Supplementary Note 5) The data processing device according to any one of Supplementary Notes 1 to 4, wherein
the character string element storage means further stores an importance of the character string element in association with the character string element and the set of character string identifiers, and
the first specifying means specifies, from among the sets of character string identifiers extracted by the extraction means, a character string identifier more preferentially as the character string identifier indicating the corresponding character string the higher the importance stored for it in the character string element storage means.

(Supplementary Note 6) The data processing device according to Supplementary Note 5, wherein the first specifying means excludes the sets of character string identifiers whose importance is lower than a predetermined threshold from the sets of character string identifiers extracted by the extraction means, and specifies the character string identifier that appears most often in the remaining sets as the character string identifier indicating the corresponding character string.

(Supplementary Note 7) The data processing device according to Supplementary Note 5 or 6, wherein the first specifying means computes, among the character string identifiers extracted by the extraction means, the logical product of the sets of character string identifiers whose importance is lower than a predetermined threshold, and specifies the character string identifier indicating the corresponding character string by computing the logical sum of the result of the logical product operation and the sets of character string identifiers whose importance is higher than the predetermined threshold.

(Supplementary Note 8) The data processing device according to any one of Supplementary Notes 1 to 7, further comprising rewriting means for rewriting the character string to be processed to the character string stored in the character string storage means in association with the character string identifier when the first specifying means has uniquely specified a character string identifier, and for rewriting the character string to be processed to the corresponding character string specified by the second specifying means when the first specifying means has specified a plurality of character string identifiers.

(Supplementary Note 9) The data processing device according to Supplementary Note 8, wherein the rewriting means rewrites the character string to be processed to the corresponding character string when the character string to be processed differs from the corresponding character string.

(Supplementary Note 10) The data processing device according to any one of Supplementary Notes 1 to 9, further comprising log output means for outputting the corresponding character string, the character string to be processed, and the difference between the corresponding character string and the character string to be processed to a predetermined display unit or a predetermined storage unit.
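A log record of the kind described in Supplementary Note 10 could be assembled as in the following Python sketch. The record layout and the function names `describe_difference` and `log_record` are hypothetical, and the use of `difflib.SequenceMatcher` from the standard library to compute the difference is an assumption; the note does not say how the difference is obtained.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def describe_difference(target: str, corrected: str) -> List[Tuple[str, str, str]]:
    """List the non-matching spans as (operation, text in target, text in corrected)."""
    matcher = SequenceMatcher(None, target, corrected, autojunk=False)
    return [
        (tag, target[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def log_record(target: str, corrected: str) -> Dict[str, object]:
    """Bundle the three items that the log output means is said to output."""
    return {
        "target": target,
        "corrected": corrected,
        "difference": describe_difference(target, corrected),
    }


# Address strings from the FIG. 19 example.
record = log_record("神奈川県横浜市○○病院", "神奈川県横浜市港北区新横浜１丁目○○病院")
print(record["difference"])  # [('insert', '', '港北区新横浜１丁目')]
```

Such a record could be written to a log file or shown on a display so that an operator can review each rewrite.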

(Supplementary Note 11) A data processing program that causes a computer to execute:
an extraction procedure of extracting the sets of character string identifiers stored in association with the character string elements included in a character string to be processed, from character string element storage means that stores a character string element, which is a character string included in a predetermined character string, and a set of character string identifiers for identifying the predetermined character strings in which the character string element is included; and
a first specifying procedure of specifying the character string identifier that appears most often in the sets of character string identifiers extracted by the extraction procedure as the character string identifier indicating a corresponding character string, which is a character string that matches or is similar to the character string to be processed.

(Supplementary Note 12) The data processing program according to Supplementary Note 11, further causing the computer to execute:
a dividing procedure of dividing the character string to be processed into pieces of a predetermined size when the first specifying procedure has specified a plurality of character string identifiers; and
a second specifying procedure of specifying, as the corresponding character string, the character string that contains the most divided character strings, which are the character strings produced by the dividing procedure, from among the plurality of character strings indicated by the character string identifiers specified by the first specifying procedure.

(Supplementary Note 13) The data processing program according to Supplementary Note 11 or 12, wherein
the character string element storage means further stores an importance of the character string element in association with the character string element and the set of character string identifiers, and
the first specifying procedure specifies, from among the sets of character string identifiers extracted by the extraction procedure, a character string identifier more preferentially as the character string identifier indicating the corresponding character string the higher the importance stored for it in the character string element storage means.

(Supplementary Note 14) The data processing program according to any one of Supplementary Notes 11 to 13, further causing the computer to execute a rewriting procedure of, when the first specifying procedure has uniquely specified a character string identifier, acquiring the character string stored in association with the specified character string identifier from character string storage means that stores the predetermined character string and the character string identifier of the predetermined character string, and rewriting the character string to be processed to the acquired character string, and of, when the first specifying procedure has specified a plurality of character string identifiers, rewriting the character string to be processed to the corresponding character string specified by the second specifying procedure.

(Supplementary Note 15) The data processing program according to any one of Supplementary Notes 11 to 14, further causing the computer to execute a log output procedure of outputting the corresponding character string, the character string to be processed, and the difference between the corresponding character string and the character string to be processed to a predetermined display unit or a predetermined storage unit.

(Supplementary Note 16) A data processing method in which a data processing device performs:
an extraction step of extracting the sets of character string identifiers stored in association with the character string elements included in a character string to be processed, from character string element storage means that stores a character string element, which is a character string included in a predetermined character string, and a set of character string identifiers for identifying the predetermined character strings in which the character string element is included; and
a first specifying step of specifying the character string identifier that appears most often in the sets of character string identifiers extracted in the extraction step as the character string identifier indicating a corresponding character string, which is a character string that matches or is similar to the character string to be processed.

(Supplementary Note 17) The data processing method according to Supplementary Note 16, further comprising:
a dividing step of dividing the character string to be processed into pieces of a predetermined size when the first specifying step has specified a plurality of character string identifiers; and
a second specifying step of specifying, as the corresponding character string, the character string that contains the most divided character strings, which are the character strings produced in the dividing step, from among the plurality of character strings indicated by the character string identifiers specified in the first specifying step.

(Supplementary Note 18) The data processing method according to Supplementary Note 16 or 17, wherein
the character string element storage means further stores an importance of the character string element in association with the character string element and the set of character string identifiers, and
the first specifying step specifies, from among the sets of character string identifiers extracted in the extraction step, a character string identifier more preferentially as the character string identifier indicating the corresponding character string the higher the importance stored for it in the character string element storage means.

(Supplementary Note 19) The data processing method according to any one of Supplementary Notes 16 to 18, further comprising a rewriting step of, when the first specifying step has uniquely specified a character string identifier, acquiring the character string stored in association with the specified character string identifier from character string storage means that stores the predetermined character string and the character string identifier of the predetermined character string, and rewriting the character string to be processed to the acquired character string, and of, when the first specifying step has specified a plurality of character string identifiers, rewriting the character string to be processed to the corresponding character string specified in the second specifying step.

(Supplementary Note 20) The data processing method according to any one of Supplementary Notes 16 to 19, further comprising a log output step of outputting the corresponding character string, the character string to be processed, and the difference between the corresponding character string and the character string to be processed to a predetermined display unit or a predetermined storage unit.

FIG. 1 is a diagram for explaining an outline of data correction processing by the data processing device according to the first embodiment.
FIG. 2 is a diagram showing the configuration of the data processing device according to the first embodiment.
FIG. 3 is a diagram showing an example of the address storage unit.
FIG. 4 is a diagram showing an example of the address element storage unit.
FIG. 5 is a diagram showing an example of the configuration information storage unit.
FIG. 6 is a diagram for explaining a method of generating the address element storage unit.
FIG. 7 is a diagram for explaining address ID specifying processing by the primary specifying unit.
FIG. 8 is a diagram for explaining the processing by which the dividing unit divides the processing target address data.
FIG. 9 is a diagram for explaining address data specifying processing by the secondary specifying unit.
FIG. 10 is a flowchart showing the data correction processing procedure of the data processing device according to the first embodiment.
FIG. 11 is a diagram for explaining an example of data correction processing by the data processing device according to the first embodiment.
FIG. 12 is a diagram for explaining an example of data correction processing by the data processing device according to the first embodiment.
FIG. 13 is a diagram for explaining an example of data correction processing by the data processing device according to the first embodiment.
FIG. 14 is a diagram showing an example of an address element storage unit that stores address elements whose notation is not unified.
FIG. 15 is a diagram for explaining an example of data correction processing when an address element storage unit that stores address elements whose notation is not unified is used.
FIG. 16 is a diagram showing the configuration of the data processing device according to the second embodiment.
FIG. 17 is a diagram showing an example of the address element storage unit.
FIG. 18 is a diagram for explaining an example of data correction processing by the data processing device according to the second embodiment.
FIG. 19 is a diagram for explaining an example of data correction processing by the data processing device according to the second embodiment.
FIG. 20 is a diagram showing a computer that executes the data processing program.

Explanation of symbols

100 Data processing device
110 Target data input/output unit
120 Storage unit
121 Address storage unit
122 Address element storage unit
123 Configuration information storage unit
124 Log storage unit
130 Control unit
131 Reception unit
132 Primary search unit
132a Extraction unit
132b Primary specifying unit
133 Secondary search unit
133a Dividing unit
133b Secondary specifying unit
134 Rewriting unit
200 Data processing device
222 Address element storage unit
232b Primary specifying unit
1000 Computer
1010 CPU
1020 Input device
1030 Monitor
1040 Medium reading device
1050 Network interface device
1060 RAM
1061 Data processing process
1070 Hard disk device
1071 Data processing program
1072 Data processing related data
1073 Log file
1080 Bus

Claims

A data processing device comprising:
character string storage means for storing a plurality of character strings, each composed of a plurality of character string elements, in association with the character string elements that are character strings included in each of the plurality of character strings;
extraction means for identifying, from an input character string, a plurality of character string elements included in the input character string, and for extracting, from the character string storage means, a plurality of character strings each including at least one of the identified character string elements; and
first specifying means for specifying, from among the extracted character strings, a corresponding character string that corresponds to the input character string, based on the correspondence between the identified character string elements and the character string elements included in the extracted character strings,
wherein the extraction means extracts the plurality of character strings by a single character string matching process between the character string elements included in the input character string and the character string elements stored in the character string storage means, and
the first specifying means specifies the corresponding character string by a single specifying process among the extracted character strings.

The data processing device according to claim 1, further comprising:
dividing means for dividing the input character string into pieces of a predetermined size when the first specifying means has specified a plurality of corresponding character strings; and
second specifying means for specifying, as the corresponding character string, the character string that contains the most divided character strings, which are the character strings produced by the dividing means, from among the plurality of corresponding character strings.

The data processing device according to claim 1 or 2, wherein
the character string storage means further stores an importance of each character string element in association with the character string element, and
the first specifying means specifies, as the corresponding character string, a character string that includes a character string element of higher importance from among the plurality of character strings extracted by the extraction means.

The data processing device according to any one of claims 1 to 3, wherein the specifying process is a logical product operation.

The data processing device according to any one of claims 1 to 4, further comprising log output means for outputting the corresponding character string, the input character string, and the difference between the corresponding character string and the input character string to a predetermined display unit or a predetermined storage unit.

A data processing program that causes a computer to execute:
an extraction procedure of identifying, from an input character string, a plurality of character string elements included in the input character string, and extracting a plurality of character strings each including at least one of the identified character string elements from a character string storage unit that stores a plurality of character strings, each composed of a plurality of character string elements, in association with the character string elements that are character strings included in each of the plurality of character strings; and
a first specifying procedure of specifying, from among the extracted character strings, a corresponding character string that corresponds to the input character string, based on the correspondence between the identified character string elements and the character string elements included in the extracted character strings,
wherein, in the extraction procedure, the plurality of character strings are extracted by a single character string matching process between the character string elements included in the input character string and the character string elements stored in the character string storage unit, and
in the first specifying procedure, the corresponding character string is specified by a single specifying process among the extracted character strings.

The data processing program according to claim 6, further causing the computer to execute:
a dividing procedure of dividing the input character string into pieces of a predetermined size when the first specifying procedure has specified a plurality of corresponding character strings; and
a second specifying procedure of specifying, as the corresponding character string, the character string that contains the most divided character strings, which are the character strings produced by the dividing procedure, from among the plurality of corresponding character strings.

A data processing method comprising:
an extraction step in which extraction means provided in a computer identifies, from an input character string, a plurality of character string elements included in the input character string, and extracts a plurality of character strings each including at least one of the identified character string elements from character string storage means that stores a plurality of character strings, each composed of a plurality of character string elements, in association with the character string elements that are character strings included in each of the plurality of character strings; and
a specifying step in which specifying means provided in the computer specifies, from among the extracted character strings, a corresponding character string that corresponds to the input character string, based on the correspondence between the identified character string elements and the character string elements included in the extracted character strings,
wherein, in the extraction step, the plurality of character strings are extracted by a single character string matching process between the character string elements included in the input character string and the character string elements stored in the character string storage means, and
in the specifying step, the corresponding character string is specified by a single specifying process among the extracted character strings.