JP4185399B2 - Customer data management apparatus, customer data management method, customer data management program, and recording medium storing customer data management program

Info

Publication number: JP4185399B2
Application number: JP2003145473A
Authority: JP
Inventors: 成人岩瀬
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-22
Filing date: 2003-05-22
Publication date: 2008-11-26
Anticipated expiration: 2023-05-22
Also published as: JP2004348489A

Abstract

PROBLEM TO BE SOLVED: To provide a customer management system that performs name identification on large-scale customer data quickly and precisely, and a customer data managing device used in that system.

SOLUTION: A computer 10 serving as the customer data managing device is connected to an input DB 11 and a name-identifying DB 18. An address cleansing section 14 analyzes the address information in the customer information, and a name cleansing section 15 analyzes the name type and the name data used to verify customers. A filtering section 16 then searches the customer information stored in the name-identifying DB 18, based on the customer data output from the address cleansing section 14 and the name cleansing section 15, to narrow down the verification data. A matching section 20 compares the customer data with the narrowed verification data to determine the degree of agreement, and when that degree indicates a new customer, the customer data are newly registered in the name-identifying DB 18.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、顧客データ管理装置、顧客データ管理方法および顧客データ管理用プログラムならびに顧客データ管理用プログラムを格納した記録媒体に関する。より詳細には、本発明は、大規模顧客データを高速・高精度に名寄せすることを可能とする顧客データ管理装置、顧客データ管理方法および顧客データ管理用プログラムならびに顧客データ管理用プログラムを格納した記録媒体に関する。
【０００２】
【従来の技術】
従来の名寄せシステムにおいては、大規模な顧客データの照合を行なう場合、例えば特許文献１に示すように顧客情報の各項目ごとに一致不一致を判断し、ＲＤＢ（Relational Data Base）の完全一致検索機能を利用した照合を行なっている。また、取扱う顧客データがそれほど大規模でなく、かつ、高い精度での照合が必要な場合には、対象である顧客データを一度データ・ファイルにダウンロードしたうえで、文字列一致率や単語一致率などにより文字列の一部一致を点数付けして照合を行っている。
【０００３】
【特許文献１】
特開昭６３−２８２８３８号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、このような従来の名寄せ方法には、以下のような問題があった。すなわち、ＲＤＢを用いて文字列一致率や単語一致率などによる照合には一般に長時間を必要とし、大規模な顧客データについて迅速な照合を行なうことは困難であった。また、顧客情報項目としてある「名義」および「住所」の双方の照合度を考慮して最終的な一致度を求めるようなことはＲＤＢを用いた照合方法では不可能である。
【０００５】
一方、顧客データをファイル化することにより高度な照合が可能となるが、データのダウンロードそのものに時間を要することに加え、このような顧客データも最終的にはＲＤＢ化されるものであり顧客データのＲＤＢ化にも長時間が必要とされる。また、取扱うべき顧客データが大規模な場合には、照合やソートにも時間がかかることとなり迅速な照合を求められている現実的なニーズに応えることは難しい。
【０００６】
さらに、顧客データをファイル化した場合には、照合ルールの変更に柔軟に対応することが容易ではない。例えば、「名義」の種別が法人ならば「住所」を番地まで絞り込むこととするルールがある場合に、「住所」が町名まで解析できずその結果番地までの絞込みが不可能なケースも発生し得る。このような場合、「住所」と「名義」とで絞り込むといったようなルールに変更すると、ファイル化した顧客データを利用することが困難となってしまう。
【０００７】
本発明はこのような問題に鑑みてなされたもので、その目的とするところは、大規模顧客データを高速・高精度に名寄せすることを可能とする顧客データ管理装置、顧客データ管理方法および顧客データ管理用プログラムならびに顧客データ管理用プログラムを格納した記録媒体を提供することにある。
【０００８】
【課題を解決するための手段】
本発明は、このような目的を達成するために、請求項１に記載の発明は、顧客の名義および住所を含む顧客情報を格納する入力データベースおよび登録済みの顧客情報を格納する名寄せデータベースの双方に接続され、前記入力データベースに格納された顧客情報中の住所情報を解析して、都道府県から番地・号までを住所コードに変換し、住所情報中にビル名および部屋番号がある場合には、前記住所コードとともに該ビル名および該部屋番号を付加した顧客照合用住所データを出力する住所クレンジング手段と、顧客情報中の名義情報を解析し、名義の種別が個人の場合には、名義を姓と名とに分割し、異体字を１つの文字に統一し、濁音を清音化して顧客照合用名義データを生成し、名義の種別が法人の場合には、名義の小文字を大文字に変換して解析し、変換後の名義と、該変換後の名義中の主要名義および固有名義とを抽出して顧客照合用名義データを生成し、前記名義の種別とともに前記顧客照合用名義データを出力する名義クレンジング手段と、前記住所クレンジング手段および前記名義クレンジング手段から出力される顧客データに基づいて前記名寄せデータベースに格納されている顧客情報を検索して照合データを絞り込むためのフィルタリング手段と、前記顧客データと前記絞り込まれた照合データとの間で、住所情報および名義それぞれの一致度を算出し、予め定められている照合判定ルールにより前記それぞれの一致度から前記顧客データの一致度を判断するマッチング手段と、前記一致度に応じて新規顧客の顧客データと判断された場合に当該顧客データを前記名寄せデータベースに新規登録する名寄せデータ更新手段とを有する顧客データ管理装置において、前記フィルタリング手段は、検索条件対応テーブルを格納する検索条件対応テーブル格納部と、前記名義クレンジング手段により出力された前記名義の種別と、前記住所クレンジング手段により出力された前記顧客照合用住所データの精度とをキーとして、前記検索条件対応テーブルを検索し、検索条件を決定して、該検索条件で前記名寄せデータベースから抽出した照合対象を出力するフィルタリング部とを有し、前記検索条件テーブルは、名義の種別と、顧客データの住所の精度と、顧客データの検索条件とを対応付けて有しており、前記名義の種別が個人の場合、前記顧客データの住所の精度として、（Ａ）「字丁目」までの記載または「字丁目」より詳しい記載があるときには、該顧客データの字丁目までの住所コードを検索条件とし、該顧客データの住所の精度として、（Ｂ）「町大字」までの記載があるときには、該顧客データの町大字までの住所コードと姓とを検索条件とし、該顧客データの住所の精度が、前記（Ａ）でなく、かつ前記（Ｂ）でないときには、該顧客データの姓と名とを検索条件とし、前記名義の種別が法人の場合、前記顧客データの住所の精度として、（Ｃ）「町大字」までの記載または「町大字」より詳しい記載があるときには、該顧客データの町大字までの住所コードを検索条件とし、該顧客データの住所の精度として、（Ｄ）「市区」までの記載があるときには、該顧客データの市区までの住所コードと固有名義の先頭１文字とを検索条件とし、該顧客データの住所の精度が、前記（Ｃ）でなく、かつ前記（Ｄ）でないときには、該顧客データの固有名義を検索条件とすることを定義したテーブルであることを特徴とする。
【０００９】
請求項２に記載の発明は、入力データベースに格納される顧客の名義および住所を含む顧客情報と、名寄せデータベースに格納される登録済みの顧客情報とに基づいて顧客の名寄せを行う顧客データ管理方法であって、住所クレンジング手段が、前記入力データベースに格納された顧客情報中の住所情報を解析して、都道府県から番地・号までを住所コードに変換し、住所情報中にビル名および部屋番号がある場合には、前記住所コードとともに該ビル名および該部屋番号を付加した顧客照合用住所データを出力する住所クレンジングステップと、名義クレンジング手段が、顧客情報中の名義情報を解析し、名義の種別が個人の場合には、名義を姓と名とに分割し、異体字を１つの文字に統一し、濁音を清音化して顧客照合用名義データを生成し、名義の種別が法人の場合には、名義の小文字を大文字に変換して解析し、変換後の名義と、該変換後の名義中の主要名義および固有名義とを抽出して顧客照合用名義データを生成し、前記名義の種別とともに前記顧客照合用名義データを出力する名義クレンジングステップと、フィルタリング手段が、前記住所クレンジングステップおよび前記名義クレンジングステップの結果として出力される顧客データに基づいて前記名寄せデータベースに格納されている顧客情報を検索して照合データを絞り込むためのフィルタリングステップと、マッチング手段が、前記顧客データと前記絞り込まれた照合データとの間で、住所情報および名義それぞれの一致度を算出し、予め定められている照合判定ルールにより前記それぞれの一致度から前記顧客データの一致度を判断するマッチングステップと、名寄せデータ更新手段が、前記一致度に応じて新規顧客の顧客データと判断された場合に当該顧客データを前記名寄せデータベースに新規登録する名寄せデータ更新ステップとを有し、前記フィルタリングステップでは、前記フィルタリング手段が、前記名義クレンジングステップで出力された前記名義の種別と、前記住所クレンジングステップで出力された前記顧客照合用住所データの精度とをキーとして、検索条件対応テーブル格納部に格納された検索条件対応テーブルを検索し、検索条件を決定して、該検索条件で前記名寄せデータベースから抽出した照合対象を出力し、前記検索条件テーブルは、名義の種別と、顧客データの住所の精度と、顧客データの検索条件とを対応付けて有しており、前記名義の種別が個人の場合、前記顧客データの住所の精度として、（Ａ）「字丁目」までの記載または「字丁目」より詳しい記載があるときには、該顧客データの字丁目までの住所コードを検索条件とし、該顧客データの住所の精度として、（Ｂ）「町大字」までの記載があるときには、該顧客データの町大字までの住所コードと姓とを検索条件とし、該顧客データの住所の精度が、前記（Ａ）でなく、かつ前記（Ｂ）でないときには、該顧客データの姓と名とを検索条件とし、前記名義の種別が法人の場合、前記顧客データの住所の精度として、（Ｃ）「町大字」までの記載または「町大字」より詳しい記載があるときには、該顧客データの町大字までの住所コードを検索条件とし、該顧客データの住所の精度として、（Ｄ）「市区」までの記載があるときには、該顧客データの市区までの住所コードと固有名義の先頭１文字とを検索条件とし、該顧客データの住所の精度が、前記（Ｃ）でなく、かつ前記（Ｄ）でないときには、該顧客データの固有名義を検索条件とすることを定義したテーブルであることを特徴とする。
【００１０】
請求項３に記載の発明は、顧客データ管理用プログラムであって、当該顧客データ管理用プログラムにより、請求項１に記載の顧客データ管理装置としてコンピュータを機能させることを特徴とする。
【００１１】
請求項４に記載の発明は、コンピュータ読み取り可能な記録媒体であって、請求項３に記載の顧客データ管理用プログラムを格納していることを特徴とする。
【００２１】
【発明の実施の形態】
以下に、図面を参照して本発明の実施の形態について説明する。
【００２２】
図１は本発明の顧客管理システムを説明するための図であり、図２は図１のシステムで用いる本発明の顧客データ管理方法を説明するための図である。図１において、１０は本発明の顧客データ管理方法を実行するコンピュータであり、本発明を実行するためのプログラムが格納されている。１１は入力された顧客データを格納する入力ＤＢ、１２は入力ＤＢに格納されている顧客データの中から処理されるべきデータを抽出するためのデータ抽出部、１３は「住所」および「名義」以外の顧客データ（例えば、「日付」や「電話番号」など）を変換して正規化するためのデータ変換部である。また、１４は住所照合に必要な各データを作成する住所クレンジング部、１５は名義照合に必要な各データを作成する名義クレンジング部、１６は住所クレンジング部１４および名義クレンジング部１５によるクレンジング結果を基にして照合すべき顧客データの検索条件を決定して既に登録されている顧客のデータを格納している名寄せＤＢ１８を検索するフィルタリング部である。なお、入力ＤＢ１１および名寄せＤＢ１８は、コンピュータ１０とインターネットなどの通信網で接続されたものであってもよいことはいうまでもない。
【００２３】
１７は予め定められた種々の検索条件がテーブル化されて格納されている検索条件対応テーブル格納部であり、１８は登録済み顧客の顧客データを格納する名寄せＤＢである。また、１９は照合ルール対応テーブル格納部であり、顧客判定ルールや住所照合ルールあるいは名義照合ルールなどの様々な照合ルールがテーブルとして格納されている。２０はマッチング部で、照合ルール対応テーブル格納部１９に格納されている照合ルールにしたがって、入力ＤＢ１１に顧客データが登録された顧客と名寄せＤＢ１８に顧客データが登録されている顧客とを照合し、新たに入力された顧客データが既に登録済みの顧客か否かを判断する。さらに、２１は名寄せＤＢ更新部であり、この名寄せＤＢ更新部２１は、入力ＤＢ１１に入力されて照合対象とされた顧客データが既に登録済みの顧客のものではないとマッチング部１９により判断された場合にその顧客データを名寄せＤＢ１８に格納して新規に顧客登録する。
【００２４】
この顧客管理システムを用いて顧客データ管理を実行する際の処理フローは以下のとおりである。まず、外部から入力（登録）された顧客データを格納した入力ＤＢ１１から顧客データが抽出される（Ｓ２１）。この顧客データとは、例えば、「種別」は個人、「名義」は木村太郎、「住所」は（東京都）足立区千住１−２−３ハイツ北千住５０８、「生年月日」は昭和１２年１月２３日、「電話番号」は０３−ＡＢＣ−ＤＥＦＧ、「日付」は平成１５年５月１２日、などの所定の項目を備えている。必要により、この抽出データのなかの「住所」および「名義」以外の顧客データをデータ変換して正規化を行なう（Ｓ２２）。
【００２５】
次に、顧客データのうちの「住所」および「名義」についてのデータのクレンジングを実行する（Ｓ２３およびＳ２４）。住所クレンジング（Ｓ２３）は住所の照合のために行なうもので、都道府県から番地・号までの住所コード、ならびに、ビル名および部屋番号が出力される。例えば、住所が「（東京都）足立区千住１−２−３ハイツ北千住５０８」である場合には、都道府県から番地・号までの「（東京都）足立区千住１−２−３」に対応する住所コード「１３／１２１／０４０／００１／００２／００３」と、ビル名に対応する「ハイツ北千住」と、部屋番号に対応する「５０８」とが出力される。
【００２６】
名義クレンジング（Ｓ２４）は、顧客が個人か法人かの判断を行うと同時に名義照合のための各種データを作成するための処理である。個人の場合には、例えば、姓と名を分割し、異体字や清音化などの揺らぎを正規化した姓および名を作成する。例えば、「木村太郎」は「木村」という姓と「太郎」という名に分割される。また、「澤田一郎」の場合は「澤田」という姓と「一郎」という名に分割されるが、姓の「澤田」は「沢田」にクレンジングする如くである。また、法人の場合には、名義を会社名・支店名・部門名等に分割したうえで、名義の照合用データである揺らぎを削除するクレンジングを行なって、揺らぎ削除名義、主要名義、固有名義を定める。例えば、入力名義が「焼肉屋ジャンジャン」である場合には、名義の文字列中の小文字を大文字に変換することで揺らぎを削除して揺らぎ削除名義を「焼肉屋ジヤンジヤン」とし、主要名義を「ジヤンジヤン」、固有名義も「ジヤンジヤン」とするといった具合である。
【００２７】
図３は、このような名義クレンジングと住所クレンジングを実行した結果の例を説明するための図で、既に説明した名義のほか、例えば入力名義が「レストラン若菜」のケースでは、名義クレンジングの結果として、「種別」が法人、「会社名」がレストラン若菜、「主要名義」が若菜、「固有名義」が若菜とされ、住所クレンジングの結果として、都道府県から番地・号までの「（東京都）荒川区南千住１−１０−１」に対応する住所コード「１３／１１８／００７／００１／０１０／０００１」と、ビル名に対応する「小林ビル１Ｆ」とが出力される。
【００２８】
次に、住所クレンジング（Ｓ２３）及び名義クレンジング（Ｓ２４）の結果を基に、検索条件対応テーブル１７を参照して名寄せＤＢ１８に格納されている顧客データを検索するための検索条件を作成する（Ｓ２５）。図４は、検索条件対応テーブルの項目内容を説明するための一例であり、この例では「名義の種類」、「住所の必要精度」および「検索条件」の項目に分類されている。「名義の種類」としては「法人」か「個人」かが分類され、「住所の必要精度」としては「町大字」、「市区」、「字丁目」、「町大字」などと分類される。例えば法人名義で「住所の必要精度」が「市区」の場合には、顧客データの検索条件は「住所（市区）」と「固有名義の先頭１文字」との和とされる。なお、「住所の必要精度」を定義せず、法人名義の固有名義のみ若しくは個人名義の姓および名のみを検索条件とすることも可能である。これらの検索条件の設定に際しては、あらかじめ、検索対象である顧客データの登録状況を調べておき、データベース中の顧客データをフィルタリングして抽出される登録顧客（の顧客データ）が、漏れなく且つ絞り込み充分となる条件にしておく。
【００２９】
Ｓ２６はフィルタリングの工程であり、住所クレンジング部１４および名義クレンジング部１５によるクレンジング結果を基にして照合すべき顧客データの検索条件を決定して既に登録されている顧客のデータを格納している名寄せＤＢ１８を検索する。
【００３０】
たとえば、全国の法人名義の顧客データの検索を行なう場合には、その法人の住所が「町大字」まで知られていれば「固有名義」を検索条件に含めなくても充分な絞込みが可能であるが、「市区」までの住所しか判らないときには固有名義も検索条件に繰り込まない限り充分な絞込みができない。そのような場合には「住所（市区）＋固有名義の先頭１文字」を検索条件とすることとなる。また、通常の名寄せではなく、企業単位で名寄せするような場合には、大企業は全国に支店を有しているために「住所」を検索条件として用いることはできず、その法人の固有名義そのものを用いて全国検索する。同様に、個人名義の顧客データの検索を行う場合には、住所の情報として「字丁目」まであれば充分な絞り込みが可能であるが、「町大字」までの住所しか判らない場合には姓などの条件が必要となり、「住所（町大字）＋姓」を検索条件とすることになる。
【００３１】
この検索条件の決定について図３に示したクレンジング後の顧客データを例にとって説明すると、名義「木村太郎」の場合は個人名義の顧客データなので、「足立区千住１−２−３」の個人データを検索する。また、名義「レストラン若菜」の場合は法人名義のデータなので、「荒川区南千住」の法人データを検索する。名義「須田総合家具センター」の場合は法人名義のデータであるが、住所が東京都の中央区なのか大阪府の中央区なのかが確定されないので、検索条件として住所は使わずに固有名義「須田」のみを検索することとなる。
【００３２】
次に、名義及び住所のクレンジング結果から照合ルールを定め（Ｓ２７）、マッチングを実行する（Ｓ２８）。
【００３３】
図５は一般的な照合ルールの例を説明するための図で、この図中で「確定一致」とあるのは入力ＤＢ１１に格納されている照合対象とされた顧客データを、既に登録済みの名寄せＤＢ１８に格納された顧客データと照合して新規な顧客か既に登録済みの顧客かをチェック（ユーザチェック）する必要がないほどの精度で一致していることを意味しており、この場合には対象顧客データは既に登録済みの顧客のものであると判断されることとなる。「曖昧一致」とあるのは一応は「一致データ」の候補ではあるが安全のためにユーザチェックが必要であると判断されたことを意味している。
【００３４】
例えば、「住所照合ルール」として、「号」までが一致している場合には住所一致度を９５、「番地」までが一致している場合には住所一致度を９０、「丁目」までが一致している場合には一致度を８０としたり、「名義照合ルール」として、「文字列一致」の場合には一致した文字の割合に１００を乗じて名義一致度とし、「単語一致」の場合には一致した単語の割合に１００を乗じて名義一致度とする。そして、このようなルールに基づいて「一致度」を求め、「名義」の一致度が９０以上でかつ「住所」は「番地」まで一致しておりその一致度が９０以上ならば「確定一致」とし、「住所」が「番地」まで一致しており一致度は９０以上であっても「名義」の一致度が８０以上９０未満の場合には「曖昧一致」とする。逆に、「名義」一致度が９０以上であっても「住所」が「字丁目」レベルの一致しかなく一致度が８０以上９０未満の場合にも「曖昧一致」とみなす。
【００３５】
最後に、照合対象である入力データが名寄せＤＢ１８に格納されている照合データと一致しない場合、すなわち新規の顧客データである場合には、その入力データを新規顧客のデータとして名寄せＤＢに登録する（Ｓ２９）。一方、「確定一致」の場合にはその入力データは既に登録済みの顧客のデータであるため削除される。なお、「曖昧一致」と判断された入力データは、一旦は名寄せＤＢ１８に登録されるが、その後に人手による詳細なチェックを受け、新規顧客と判断されればそのまま名寄せＤＢ１８に格納され、既登録顧客と判断されれば削除される。
【００３６】
上述した照合ルールは、必要に応じて変更することも可能である。図６は、個人名義の照合ルールと法人名義の照合ルールとを名寄せ種類別にテーブル化した例である。名義のマッチングを判断するに際しては、名義が法人名ならば主要語や固有名での照合も考えられるが、名義が個人名である場合は姓と名の照合のみである。また、照合するべき住所の範囲も、法人ならば個人よりも件数が少ないので住所の照合度を緩めてもかまわない。入力データ全体が個人または法人の顧客データであれば、本発明の顧客管理システムの起動時に照合ルールを選択すればよいが、個人データと法人データとが混在して入力されてこれらの混在データが照合対象とされる場合には、システムを稼動させた状態で動的に照合ルールを変更する必要が生じる。
【００３７】
そこで、図７に示すように、検索条件に対応付けられた照合ルールを検索条件対応テーブルの項目として追加し、名義の種別や住所の精度に応じてどの照合ルールを用いるかを判断してデータ照合を行うこととすることができる。なお、この図の照合ルール項目中の「法人寄せ２」および「個人寄せ２」の照合ルールにおいては、フィルタリングの結果として「住所」が「市区」または「丁目」まで一致していることが明らかなので、図６中に示した具体的な照合ルールでは住所に関する照合条件を設ける必要がない。また、「住所」データは「市区」または「丁目」のレベルまでしかないので、照合結果は「曖昧一致」として、後で人手などによる住所データの修正を行うようにしている。
【００３８】
なお、本システムのＯＳをマルチプロセス可能なＯＳとした場合には、このような照合ルールの変更だけではなく照合作業のさらなる高速化が実現できる。具体例としては図８に示すように、名寄せＤＢ１８を個人ＤＢ１８ａと法人ＤＢ１８ｂとで構成するようにし、個人ＤＢ１８ａには個人名義の顧客データを格納し、法人ＤＢ１８ｂには法人名義の顧客データを格納するようにする。この場合、マッチング部２０を個人マッチング部２０ａと法人マッチング部２０ｂとに分割し、これらの個人マッチング部２０ａと法人マッチング部２０ｂの各々に、個人寄せルールテーブル２０ａ´と法人寄せルールテーブル２０ｂ´を備えてそれぞれの名寄せルールに則ってマッチングさせるように構成し、これらを個人または法人専用とする。そして、住所・名義クレンジング等がなされた入力データが個人名義のものであるか法人名義のものであるかに応じて何れかの処理ルートを選択して照合処理の実行を行なうようにすればよい。
【００３９】
また、マッチングの手順を以下のように構成することとしてもよい。すなわち、マッチング部２０を、例えば図９に示すように、確定一致マッチング部２０ｃと第１の曖昧一致マッチング部２０ｄおよび第２の曖昧一致マッチング部２０ｅを設け、確定一致マッチング部２０ｃには確定一致ルールテーブル２０ｆを、第１の曖昧一致マッチング部２０ｄおよび第２の曖昧一致マッチング部２０ｅには曖昧一致ルールテーブル２０ｇを備えるように構成する。なお、図９では曖昧マッチング部を２つ備える構成としたが、曖昧一致マッチング部は１つとしてもよく或いは３つ以上としてもよい。
【００４０】
マッチング部２０をこのように構成するメリットは以下のようなものである。例えば、顧客数がある程度飽和して、入力される顧客データの８０％程度は既存の顧客になったような場合には、確定一致とみなせる照合対象である入力データは５０％以上あることが期待できる。また、確定一致レベルにある入力データの照合は照合すべきデータ数も少ないため、曖昧一致レベルにある入力データの照合よりも格段に早く終了する。そこで、確定一致のプロセスと曖昧一致のプロセスを分離し、確定一致とされた入力データの処理が早く終了するようにすることにより、全体の処理が高速化される。
【００４１】
具体的には図１０に示すフローチャートのように、確定一致マッチング部２０ｃにより確定一致ルールに基づいた確定一致マッチングを行ない（Ｓ１０１）、マッチングしたか否かを判断する（Ｓ１０２）。その結果、確定一致したと判定（Ｓ１０２：Ｙｅｓ）された入力データは既に登録されている顧客のデータであるから既存データとして処理される（Ｓ１０３）。一方、確定一致しない（Ｓ１０２：Ｎｏ）と判定された入力データは取り敢えず新規データとして一旦名寄せＤＢ１８に格納・登録され（Ｓ１０４）る。確定一致しない状態で名寄せＤＢ１８に格納された入力データは曖昧一致マッチング部により、曖昧一致マッチングルールに則ってその一致度が判断される（Ｓ１０５）。曖昧一致マッチングの結果、入力データは曖昧一致であるものとの判断がなされると（Ｓ１０６：Ｙｅｓ）、その入力データは「曖昧一致データ」として取扱われる。一方、曖昧マッチングの結果、入力データは曖昧一致ではないとの判断がなされると（Ｓ１０６：Ｎｏ）、その入力データは「新規データ」とされて名寄せＤＢ１８へ登録すべきデータとして取扱われる（Ｓ１０８）。
【００４２】
すなわち、この処理においては取扱う入力データは確定一致でないことが判明した時点で名寄せＤＢ１８に格納・登録され、１または複数の曖昧一致マッチングプロセスを実行することが可能となる。曖昧一致マッチングに要する処理時間が確定一致マッチングの処理時間の１０倍を要したり、全体の入力データの半分が確定一致する顧客データであるような場合には、極めて迅速に照合作業を実行することが可能となる。
【００４３】
【発明の効果】
以上説明したように、本発明によれば、顧客データ管理装置として動作するコンピュータを入力ＤＢおよび名寄せＤＢの双方に接続し、顧客の名義と住所とを含む顧客情報のうちの住所情報を住所クレンジング部により解析するとともに、名義の種類および顧客照合用の名義データを名義クレンジング部により解析する。そして、住所クレンジング部および名義クレンジング部から出力される顧客データに基づいて名寄せＤＢに格納されている顧客情報をフィルタリング部により検索して照合データを絞り込む。顧客データと絞り込まれた照合データとはマッチング部により比較されて一致度が判断され、その一致度に応じて新規顧客の顧客データと判断された場合にはその顧客データを名寄せＤＢに新規登録することとした。このような構成とすると、顧客データのフィルタリングにより照合すべきデータ数を減らすことができ、金融機関等で取扱われる大規模顧客データを高速・高精度に名寄せすることを可能とする顧客管理システムおよびそれに用いられる顧客データ管理装置を提供することが可能となる。
【図面の簡単な説明】
【図１】本発明の顧客管理システムを説明するための図である。
【図２】本発明の顧客データ管理方法を説明するための図である。
【図３】入力名義の名義クレンジングおよび住所クレンジング後の名義例を説明するための図である。
【図４】検索条件対応テーブルの項目内容を説明するための一例を説明するための図である。
【図５】一般的な照合ルールの例を説明するための図である。
【図６】個人名義の照合ルールと法人名義の照合ルールとを名寄せ種類別にテーブル化した例を説明するための図である。
【図７】照合ルールを追加した検索条件対応テーブルを説明するための図である。
【図８】個人顧客と法人顧客の照合を別のプロセスとする場合のシステム構成図である。
【図９】確定一致マッチングと曖昧一致マッチングとを分離したマッチング部の構成を説明するための図である。
【図１０】確定一致マッチングと曖昧一致マッチングとを分離したマッチング工程を説明するための図である。
【符号の説明】
１０コンピュータ
１１入力ＤＢ
１２データ抽出部
１３データ変換部
１４住所クレンジング部
１５名義クレンジング部
１６フィルタリング部
１７検索条件対応テーブル格納部
１８名寄せＤＢ
１９照合ルール対応テーブル格納部
２０マッチング部
２１名寄せＤＢ更新部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a customer data management apparatus, a customer data management method, a customer data management program, and a recording medium storing the customer data management program. More specifically, the present invention relates to a customer data management apparatus, a customer data management method, a customer data management program, and a recording medium storing the customer data management program that enable name identification of large-scale customer data at high speed and with high accuracy.
[0002]
[Prior art]
In conventional name identification systems, large-scale customer data is collated by judging, for each item of customer information, whether the values match, using the exact-match search function of an RDB (Relational Data Base), as shown for example in Patent Document 1. When the customer data to be handled is not very large and high-accuracy collation is required, the target customer data is first downloaded to a data file, and collation is then performed by scoring partial matches of character strings using measures such as the character string match rate or the word match rate.
[0003]
[Patent Document 1]
JP-A 63-282838
[0004]
[Problems to be solved by the invention]
However, such conventional name identification methods have the following problems. Collation based on the character string match rate or the word match rate using an RDB generally takes a long time, so it has been difficult to collate large-scale customer data quickly. Moreover, a collation method using an RDB cannot obtain a final degree of match that takes into account the degree of collation of both the “name” and the “address” items of the customer information.
[0005]
On the other hand, converting customer data into files enables more sophisticated collation, but downloading the data itself takes time, and because such customer data is eventually loaded back into an RDB, that conversion also takes a long time. In addition, when the customer data to be handled is large, collation and sorting also take time, making it difficult to meet the practical need for quick collation.
[0006]
Furthermore, when customer data is converted into files, it is not easy to respond flexibly to changes in the collation rules. For example, under a rule that narrows the “address” down to the lot number (banchi) when the “name” type is corporate, cases can arise in which the “address” cannot be analyzed even down to the town name, so narrowing down to the lot number is impossible. If the rule is then changed to one that narrows down by “address” and “name”, it becomes difficult to use the customer data that has been converted into files.
[0007]
The present invention has been made in view of these problems, and its object is to provide a customer data management apparatus, a customer data management method, a customer data management program, and a recording medium storing the customer data management program that enable name identification of large-scale customer data at high speed and with high accuracy.
[0008]
[Means for Solving the Problems]
To achieve this object, the invention described in claim 1 is a customer data management apparatus that is connected to both an input database storing customer information including customer names and addresses and a name identification database storing registered customer information, the apparatus comprising: address cleansing means for analyzing the address information in the customer information stored in the input database, converting the portion from the prefecture down to the lot number and sub-number (banchi/go) into an address code, and outputting customer collation address data in which, when the address information contains a building name and a room number, the building name and the room number are added to the address code; name cleansing means for analyzing the name information in the customer information, generating customer collation name data, when the name type is individual, by dividing the name into a surname and a given name, unifying variant characters into a single character, and converting voiced sounds to unvoiced sounds, generating customer collation name data, when the name type is corporate, by converting the small characters in the name to full-size characters, analyzing the result, and extracting the converted name together with the main name and the unique name within the converted name, and outputting the customer collation name data together with the name type; filtering means for searching the customer information stored in the name identification database, based on the customer data output from the address cleansing means and the name cleansing means, to narrow down the collation data; matching means for calculating the degree of match of the address information and of the name between the customer data and the narrowed-down collation data, and judging the degree of match of the customer data from those respective degrees of match according to a predetermined collation judgment rule; and name identification data updating means for newly registering the customer data in the name identification database when, according to the degree of match, the customer data is judged to be that of a new customer.
The filtering means has a search condition correspondence table storage unit that stores a search condition correspondence table, and a filtering unit that searches the search condition correspondence table using as keys the name type output by the name cleansing means and the precision of the customer collation address data output by the address cleansing means, determines a search condition, and outputs the collation targets extracted from the name identification database under that search condition. The search condition correspondence table associates the name type, the precision of the address of the customer data, and the search condition for the customer data, and is characterized in that it defines the following.
When the name type is individual: (A) if the address of the customer data is given down to “aza/chome” or in more detail, the address code of the customer data down to aza/chome is the search condition; (B) if the address is given down to “town/oaza”, the address code down to town/oaza and the surname are the search condition; and if the address precision is neither (A) nor (B), the surname and the given name of the customer data are the search condition. When the name type is corporate: (C) if the address of the customer data is given down to “town/oaza” or in more detail, the address code down to town/oaza is the search condition; (D) if the address is given down to “city/ward”, the address code down to city/ward and the first character of the unique name are the search condition; and if the address precision is neither (C) nor (D), the unique name of the customer data is the search condition.
[0009]
The invention described in claim 2 is a customer data management method for performing name identification of customers based on customer information, including customer names and addresses, stored in an input database and on registered customer information stored in a name identification database, the method comprising: an address cleansing step in which address cleansing means analyzes the address information in the customer information stored in the input database, converts the portion from the prefecture down to the lot number and sub-number (banchi/go) into an address code, and outputs customer collation address data in which, when the address information contains a building name and a room number, the building name and the room number are added to the address code; a name cleansing step in which name cleansing means analyzes the name information in the customer information, generates customer collation name data, when the name type is individual, by dividing the name into a surname and a given name, unifying variant characters into a single character, and converting voiced sounds to unvoiced sounds, generates customer collation name data, when the name type is corporate, by converting the small characters in the name to full-size characters, analyzing the result, and extracting the converted name together with the main name and the unique name within the converted name, and outputs the customer collation name data together with the name type; a filtering step in which filtering means searches the customer information stored in the name identification database, based on the customer data output as a result of the address cleansing step and the name cleansing step, to narrow down the collation data; a matching step in which matching means calculates the degree of match of the address information and of the name between the customer data and the narrowed-down collation data, and judges the degree of match of the customer data from those respective degrees of match according to a predetermined collation judgment rule; and a name identification data updating step in which name identification data updating means newly registers the customer data in the name identification database when, according to the degree of match, the customer data is judged to be that of a new customer.
In the filtering step, the filtering means searches the search condition correspondence table stored in a search condition correspondence table storage unit, using as keys the name type output in the name cleansing step and the precision of the customer collation address data output in the address cleansing step, determines a search condition, and outputs the collation targets extracted from the name identification database under that search condition. The search condition correspondence table associates the name type, the precision of the address of the customer data, and the search condition for the customer data, and is characterized in that it defines the following.
When the name type is individual: (A) if the address of the customer data is given down to “aza/chome” or in more detail, the address code of the customer data down to aza/chome is the search condition; (B) if the address is given down to “town/oaza”, the address code down to town/oaza and the surname are the search condition; and if the address precision is neither (A) nor (B), the surname and the given name of the customer data are the search condition. When the name type is corporate: (C) if the address of the customer data is given down to “town/oaza” or in more detail, the address code down to town/oaza is the search condition; (D) if the address is given down to “city/ward”, the address code down to city/ward and the first character of the unique name are the search condition; and if the address precision is neither (C) nor (D), the unique name of the customer data is the search condition.
[0010]
The invention described in claim 3 is a customer data management program that causes a computer to function as the customer data management apparatus according to claim 1.
[0011]
The invention described in claim 4 is a computer-readable recording medium that stores the customer data management program according to claim 3.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0022]
FIG. 1 is a diagram for explaining the customer management system of the present invention, and FIG. 2 is a diagram for explaining the customer data management method of the present invention used in the system of FIG. 1. In FIG. 1, reference numeral 10 denotes a computer that executes the customer data management method of the present invention and stores a program for carrying out the invention. Reference numeral 11 denotes an input DB that stores input customer data, 12 a data extraction unit that extracts the data to be processed from the customer data stored in the input DB, and 13 a data conversion unit that converts and normalizes customer data other than the “address” and “name” (for example, “date” and “telephone number”). Reference numeral 14 denotes an address cleansing unit that creates the data needed for address collation, 15 a name cleansing unit that creates the data needed for name collation, and 16 a filtering unit that determines the search condition for the customer data to be collated, based on the cleansing results from the address cleansing unit 14 and the name cleansing unit 15, and searches the name identification DB 18, which stores the data of already registered customers. Needless to say, the input DB 11 and the name identification DB 18 may be connected to the computer 10 via a communication network such as the Internet.
[0023]
Reference numeral 17 denotes a search condition correspondence table storage unit that stores various predetermined search conditions in table form, and 18 denotes the name identification DB that stores the customer data of registered customers. Reference numeral 19 denotes a collation rule correspondence table storage unit, which stores various collation rules, such as customer judgment rules, address collation rules, and name collation rules, as tables. Reference numeral 20 denotes a matching unit, which, in accordance with the collation rules stored in the collation rule correspondence table storage unit 19, collates the customers whose data is registered in the input DB 11 against the customers whose data is registered in the name identification DB 18 and judges whether newly input customer data belongs to an already registered customer. Reference numeral 21 denotes a name identification DB update unit; when the matching unit 20 judges that customer data input to the input DB 11 and subjected to collation does not belong to an already registered customer, the name identification DB update unit 21 stores that customer data in the name identification DB 18 and registers the customer as new.
[0024]
The processing flow for customer data management using this customer management system is as follows. First, customer data is extracted from the input DB 11, which stores customer data input (registered) from outside (S21). This customer data has predetermined items; for example, the “type” is individual, the “name” is Taro Kimura, the “address” is (Tokyo) Adachi-ku Senju 1-2-3 Heights Kitasenju 508, the “date of birth” is January 23, Showa 12 (1937), the “telephone number” is 03-ABC-DEFG, and the “date” is May 12, Heisei 15 (2003). Where necessary, the customer data other than the “address” and “name” in the extracted data is converted and normalized (S22).
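The source does not say what the normalization in S22 consists of. As one illustration only, the sketch below assumes that a date field arrives in Japanese era notation, as in the sample record, and converts it to a Gregorian date. The function name `normalize_era_date` and the choice to support only the Showa and Heisei eras are assumptions of this sketch, not part of the described system.

```python
import re
from datetime import date

# Offset from era year to Gregorian year (Showa 1 = 1926, Heisei 1 = 1989).
ERA_OFFSETS = {"昭和": 1925, "平成": 1988}

# The sample record uses full-width numerals, so map them to ASCII digits first.
FULLWIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")


def normalize_era_date(text: str) -> date:
    """Convert a date such as '昭和１２年１月２３日' to a datetime.date (hypothetical helper)."""
    ascii_text = text.translate(FULLWIDTH_DIGITS)
    m = re.fullmatch(r"(昭和|平成)(\d+)年(\d+)月(\d+)日", ascii_text)
    if m is None:
        raise ValueError(f"unsupported date format: {text!r}")
    era = m.group(1)
    year, month, day = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return date(ERA_OFFSETS[era] + year, month, day)
```

Normalizing every non-name, non-address field to a single canonical form in this way would let later steps compare values directly, whatever notation the input used.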
[0025]
Next, the “address” and “name” data in the customer data are cleansed (S23 and S24). Address cleansing (S23) is performed for address collation and outputs an address code covering the prefecture down to the lot number and sub-number (banchi/go), together with the building name and room number. For example, when the address is “(Tokyo) Adachi-ku Senju 1-2-3 Heights Kitasenju 508”, the output is the address code “13/121/040/001/002/003”, corresponding to the portion from the prefecture to the lot number and sub-number, “(Tokyo) Adachi-ku Senju 1-2-3”; “Heights Kitasenju”, corresponding to the building name; and “508”, corresponding to the room number.
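The output of address cleansing can be pictured as a small record holding the slash-separated code plus the building name and room number. The following is a minimal sketch under stated assumptions: the level names in `LEVELS`, the class `CleansedAddress`, and the convention that unresolved finer levels are simply absent from the code are all hypothetical, since the source gives only the example code strings. Converting a free-text address into a code would also need an address master dictionary, which is outside this sketch.

```python
from dataclasses import dataclass

# Assumed meaning of each code segment, coarsest first (labels are this sketch's own).
LEVELS = ("prefecture", "city_ward", "town_oaza", "aza_chome", "banchi", "go")


@dataclass(frozen=True)
class CleansedAddress:
    code: tuple          # one string per resolved level, coarsest first
    building: str = ""
    room: str = ""

    @property
    def precision(self) -> str:
        """Name of the finest level that was resolved, or 'none'."""
        return LEVELS[len(self.code) - 1] if self.code else "none"

    def prefix(self, level: str) -> str:
        """Slash-joined code truncated at the given level."""
        depth = LEVELS.index(level) + 1
        if depth > len(self.code):
            raise ValueError(f"address not resolved down to {level}")
        return "/".join(self.code[:depth])


def parse_address_code(code: str, building: str = "", room: str = "") -> CleansedAddress:
    """Split an address code such as '13/121/040/001/002/003' into its levels."""
    parts = tuple(p for p in code.split("/") if p)
    if len(parts) > len(LEVELS):
        raise ValueError("address code has more levels than expected")
    return CleansedAddress(parts, building, room)
```

With the example from the text, `parse_address_code("13/121/040/001/002/003", "ハイツ北千住", "508")` yields a record whose `precision` is `"go"` and whose `prefix("town_oaza")` is `"13/121/040"`. Keeping the code hierarchical means a coarser search key is just a prefix of a finer one.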
[0026]
Name cleansing (S24) determines whether the customer is an individual or a corporation and, at the same time, creates the various data used for name collation. For an individual, for example, the name is divided into a surname and a given name, and normalized forms of the surname and given name are created by removing variations such as variant characters and voiced sounds (which are converted to unvoiced sounds). For example, “木村太郎” (Taro Kimura) is divided into the surname “木村” and the given name “太郎”. Likewise, “澤田一郎” (Ichiro Sawada) is divided into the surname “澤田” and the given name “一郎”, and the surname “澤田” is cleansed to “沢田”. For a corporation, the name is divided into company name, branch name, department name, and so on, and cleansing that removes variations is then applied to produce the name collation data: the variation-removed name, the main name, and the unique name. For example, when the input name is “焼肉屋ジャンジャン”, the variation is removed by converting the small characters in the name string to full-size characters, giving the variation-removed name “焼肉屋ジヤンジヤン”, the main name “ジヤンジヤン”, and the unique name “ジヤンジヤン”.
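The character-level normalizations described for name cleansing can be sketched as follows. This is an illustration, not the patented implementation: the variant table holds only the single pair taken from the example, voiced sounds are removed here by Unicode decomposition (a technique chosen for this sketch), and splitting a name into surname and given name, or extracting the main and unique names, would need dictionaries that the source does not describe. All function names are hypothetical.

```python
import unicodedata

# Illustrative variant-character table; a real table would be far larger.
VARIANTS = str.maketrans({"澤": "沢"})

# Small katakana and their full-size counterparts.
SMALL_TO_LARGE = str.maketrans("ァィゥェォッャュョヮ", "アイウエオツヤユヨワ")

# Combining voiced sound mark (dakuten) produced by canonical decomposition.
DAKUTEN = "\u3099"


def unify_variants(text: str) -> str:
    """Replace variant characters with their unified form."""
    return text.translate(VARIANTS)


def enlarge_small_kana(text: str) -> str:
    """Convert small katakana to full-size katakana."""
    return text.translate(SMALL_TO_LARGE)


def devoice(text: str) -> str:
    """Convert voiced kana to unvoiced kana by stripping the dakuten."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = decomposed.replace(DAKUTEN, "")
    return unicodedata.normalize("NFC", stripped)


def cleanse_individual(surname: str, given_name: str) -> tuple:
    """Normalize an already split individual name (the split itself is assumed done)."""
    return (devoice(unify_variants(surname)), devoice(unify_variants(given_name)))


def cleanse_corporate(name: str) -> str:
    """Return the variation-removed form of a corporate name."""
    return enlarge_small_kana(name)
```

Applied to the examples in the text, `cleanse_individual("澤田", "一郎")` gives `("沢田", "一郎")` and `cleanse_corporate("焼肉屋ジャンジャン")` gives `"焼肉屋ジヤンジヤン"`. Devoicing is applied only on the individual path, matching the description, which is why the corporate example keeps “ジ”.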
[0027]
FIG. 3 is a diagram for explaining example results of such name cleansing and address cleansing. In addition to the names already described, when the input name is “Restaurant Wakana” (レストラン若菜), for example, the name cleansing result gives “type” as corporate, “company name” as Restaurant Wakana, “main name” as Wakana, and “unique name” as Wakana, and the address cleansing result outputs the address code “13/118/007/001/010/0001”, corresponding to the portion from the prefecture to the lot number and sub-number, “(Tokyo) Arakawa-ku Minamisenju 1-10-1”, together with “Kobayashi Building 1F”, corresponding to the building name.
[0028]
Next, based on the results of address cleansing (S23) and name cleansing (S24), a search condition for searching the customer data stored in the name identification DB 18 is created by referring to the search condition correspondence table 17 (S25). FIG. 4 shows an example of the items in the search condition correspondence table; in this example they are “name type”, “required address precision”, and “search condition”. The “name type” is classified as “corporate” or “individual”, and the “required address precision” is classified as “town/oaza”, “city/ward”, “aza/chome”, “town/oaza”, and so on. For example, for a corporate name whose “required address precision” is “city/ward”, the search condition for the customer data is the combination of “address (city/ward)” and “first character of the unique name”. It is also possible to leave the “required address precision” undefined and use only the unique name for a corporate name, or only the surname and given name for an individual name, as the search condition. When setting these search conditions, the registration status of the customer data to be searched is examined in advance so that the registered customers (their customer data) extracted by filtering the database are obtained without omission and are narrowed down sufficiently.
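The lookup in S25 can be sketched as a small table keyed by name type and address precision. The rows below follow the six cases spelled out in claim 1; the level labels, the field labels such as `"address:town_oaza"`, and the function name `decide_search_condition` are assumptions made for this sketch rather than names from the source.

```python
# Address precision ranks, coarsest to finest (labels are this sketch's own).
RANK = {"none": 0, "prefecture": 1, "city_ward": 2, "town_oaza": 3,
        "aza_chome": 4, "banchi": 5, "go": 6}

# (name type, minimum address precision, search condition fields), in priority order.
SEARCH_CONDITION_TABLE = [
    ("individual", "aza_chome", ("address:aza_chome",)),
    ("individual", "town_oaza", ("address:town_oaza", "surname")),
    ("individual", "none",      ("surname", "given_name")),
    ("corporate",  "town_oaza", ("address:town_oaza",)),
    ("corporate",  "city_ward", ("address:city_ward", "unique_name_first_char")),
    ("corporate",  "none",      ("unique_name",)),
]


def decide_search_condition(name_type: str, precision: str) -> tuple:
    """Return the search fields for the first table row that the inputs satisfy."""
    for row_type, min_precision, fields in SEARCH_CONDITION_TABLE:
        if row_type == name_type and RANK[precision] >= RANK[min_precision]:
            return fields
    raise ValueError(f"unknown name type: {name_type!r}")
```

For instance, `decide_search_condition("corporate", "city_ward")` returns `("address:city_ward", "unique_name_first_char")`. Keeping the rules as data rather than code is what would allow a search condition to be changed without touching the filtering logic, which is the flexibility the description is aiming for.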
[0029]
S26 is a filtering step, in which a search condition for the customer data to be collated is determined based on the cleansing results from the address cleansing unit 14 and the name cleansing unit 15, and the name identification DB 18, which stores already registered customer data, is searched.
[0030]
For example, when searching for customer data in the name of a corporation in Japan, if the address of the corporation is known down to “town/oaza”, the candidates can be narrowed down sufficiently without including the “unique name” in the search condition. When the address is known only down to “city”, however, the candidates cannot be narrowed down sufficiently unless the unique name is also included in the search condition. In such a case, “address (city) + first character of unique name” is used as the search condition. In addition, when name identification is performed per company rather than in the normal manner, a large company has branches all over the country, so “address” cannot be used as a search condition, and a nationwide search is performed using the unique name itself. Similarly, when searching for customer data in the name of an individual, the candidates can be narrowed down sufficiently if the address information extends down to “aza/chome”, but if the address is known only down to “town/oaza”, an additional condition such as the surname is necessary, and “address (town/oaza) + surname” is used as the search condition.
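The selection rules described above can be sketched as a small lookup function. This is a minimal sketch, not the table of FIG. 4 itself: the six-segment address code layout (prefecture/city/town-oaza/aza-chome/block/number), the function name, and the dictionary keys are assumptions made for illustration, and the fallback to names alone follows the rules stated in the claims.

```python
def search_condition(name_type, address_code, *, surname="", first_name="",
                     unique_name=""):
    """Choose a search condition from the name type and address accuracy.

    Assumes the address code is slash-separated with the segments
    prefecture/city/town-oaza/aza-chome/block/number.
    """
    parts = [p for p in address_code.split("/") if p]
    depth = len(parts)

    def prefix(n):
        return "/".join(parts[:n])

    if name_type == "individual":
        if depth >= 4:  # known down to aza/chome or finer
            return {"address_prefix": prefix(4)}
        if depth == 3:  # known only down to town/oaza
            return {"address_prefix": prefix(3), "surname": surname}
        return {"surname": surname, "first_name": first_name}

    if name_type == "corporation":
        if depth >= 3:  # known down to town/oaza or finer
            return {"address_prefix": prefix(3)}
        if depth == 2:  # known only down to city
            return {"address_prefix": prefix(2),
                    "unique_name_initial": unique_name[:1]}
        return {"unique_name": unique_name}

    raise ValueError(f"unknown name type: {name_type!r}")
```

Returning a dictionary of field constraints keeps the sketch independent of any particular database; each key could be translated into a prefix or equality predicate by whatever store holds the registered customer data.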
[0031]
The determination of the search condition will be described taking the cleansed customer data shown in FIG. 3 as an example. In the case of the name “Taro Kimura”, the data is in the name of an individual, so the individual data for “Custom Senju 1-2-3” in Adachi-ku is searched. In the case of the name “Restaurant Wakana”, the data is in the name of a corporation, so the corporate data for “Arakawa-ku Minamisenju” is searched. In the case of the name “Suda General Furniture Center”, the data is also in the name of a corporation, but because it cannot be determined whether the address is Chuo Ward in Tokyo or Chuo Ward in Osaka Prefecture, only the unique name “Suda” is used for the search.
[0032]
Next, a matching rule is determined from the name and address cleansing results (S27), and matching is executed (S28).
[0033]
FIG. 5 is a diagram for explaining an example of a general collation rule. In this figure, “confirmed match” (also called “definite match” below) means that, by comparison with the customer data stored in the name identification DB 18, the target customer data from the input DB 11 is determined to be that of an already registered customer, with no need for a check (user check) of whether it is a new customer or an already registered customer. “Ambiguous match” (also called “fuzzy match” below) means that the data is a candidate for matching data, but a user check is judged necessary to be safe.
[0034]
For example, as the “address matching rule”, the address matching degree is set to 95 when the addresses match down to the house number (“No.”), 90 when they match down to the block number (“address”), and 80 when they match down to “chome”. As the “name matching rule”, in the case of “character string matching”, the ratio of matched characters multiplied by 100 is the name matching degree, and in the case of “word matching”, the ratio of matched words multiplied by 100 is the name matching degree. The “matching degree” is then obtained based on these rules. If the matching degree of the “name” is 90 or more and the “address” matches down to the block number so that its matching degree is 90 or more, the result is a “definite match”. If the “address” matches down to the block number with a matching degree of 90 or more but the matching degree of the “name” is 80 or more and less than 90, the result is an “ambiguous match”. Conversely, even if the matching degree of the “name” is 90 or more, the result is a “fuzzy match” if the “address” matches only down to the “chome” level, so that its matching degree is 80 or more and less than 90.
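The example rule above can be sketched as follows. The thresholds 95/90/80 and the 90 and 80 cut-offs come from the text; everything else is an assumption made for illustration: the six-segment address code layout, a degree of 0 for coarser address matches, and the use of `difflib.SequenceMatcher` matching blocks as one possible definition of “ratio of matched characters”.

```python
from difflib import SequenceMatcher

# Matched segment count -> address matching degree (6 = number, 5 = block,
# 4 = chome). Coarser matches are given 0 here, which is an assumption.
ADDRESS_DEGREE_BY_DEPTH = {6: 95, 5: 90, 4: 80}


def address_degree(code_a: str, code_b: str) -> int:
    """Score two address codes by the number of leading segments that agree."""
    depth = 0
    for x, y in zip(code_a.split("/"), code_b.split("/")):
        if x != y:
            break
        depth += 1
    return ADDRESS_DEGREE_BY_DEPTH.get(depth, 0)


def name_degree(a: str, b: str) -> float:
    """Matched characters as a percentage of the longer name (assumed metric)."""
    if not a or not b:
        return 0.0
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return 100.0 * matched / max(len(a), len(b))


def classify(name_deg: float, addr_deg: float) -> str:
    """Apply the example collation rule to the two matching degrees."""
    if name_deg >= 90 and addr_deg >= 90:
        return "definite"
    if addr_deg >= 90 and 80 <= name_deg < 90:
        return "fuzzy"
    if name_deg >= 90 and 80 <= addr_deg < 90:
        return "fuzzy"
    return "none"
```

A record whose best candidate classifies as `"none"` would be treated as a new customer in this sketch.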
[0035]
Finally, if the input data to be collated does not match any of the collation data stored in the name identification DB 18, that is, if it is new customer data, the input data is registered in the name identification DB 18 as new customer data (S29). In the case of a “confirmed match”, on the other hand, the input data is deleted because it is already registered customer data. Input data determined to be a “fuzzy match” is provisionally registered in the name identification DB 18 and is then subjected to a detailed manual check; if it is judged to be a new customer, it is kept in the name identification DB 18 as it is, and if it is judged to be a registered customer, it is deleted.
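The handling of the three outcomes can be sketched as below. The result labels, the in-memory lists standing in for the name identification DB 18 and the manual review queue, and both function names are hypothetical and serve only to illustrate the flow.

```python
def dispose(match_result, record, name_db, review_queue):
    """Apply the registration policy for one collated input record."""
    if match_result == "definite":
        return "discarded"          # already registered: drop the input
    if match_result not in ("fuzzy", "new"):
        raise ValueError(f"unknown match result: {match_result!r}")
    name_db.append(record)          # new or provisional registration
    if match_result == "fuzzy":
        review_queue.append(record)  # held for the manual check
        return "provisional"
    return "registered"


def resolve_review(record, is_new_customer, name_db, review_queue):
    """Finish the manual check of a provisionally registered record."""
    review_queue.remove(record)
    if not is_new_customer:
        name_db.remove(record)      # judged to be an existing customer
```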
[0036]
The collation rules described above can be changed as necessary. FIG. 6 is an example in which a collation rule for personal names and a collation rule for corporate names are tabulated for each type of name identification. In judging a name match, collation by main name or unique name is possible if the name is a corporate name, but if the name is an individual name, only matching of the surname and first name is possible. Further, since a corporation has fewer addresses to be verified than an individual, the address matching requirement may be relaxed. If all of the input data is individual customer data, or all of it is corporate customer data, the collation rule may be selected when the customer management system of the present invention is started. When personal data and corporate data are input together and the mixed data is to be verified, however, the collation rule must be changed dynamically while the system is in operation.
[0037]
Therefore, as shown in FIG. 7, a collation rule associated with each search condition is added as an item of the search condition correspondence table, so that the collation rule to be used is determined according to the type of name and the accuracy of the address, and collation is performed accordingly. For the collation rules “corporate group 2” and “individual group 2” in the collation rule item of this figure, it is clear from the filtering result that the “address” matches down to “city” or “chome”, so the specific collation rules shown in FIG. 6 do not require a matching condition for the address. Further, since the “address” data is available only at the “city” or “chome” level, the collation result is a “fuzzy match”, and the address data is corrected manually afterwards.
[0038]
When the OS of this system supports multiple processes, not only can the collation rule be changed in this way, but the matching work can also be made faster. As a specific example, as shown in FIG. 8, the name identification DB 18 is composed of a personal DB 18a, which stores personal customer data, and a corporate DB 18b, which stores corporate customer data. In this case, the matching unit 20 is divided into a personal matching unit 20a and a corporate matching unit 20b, which are provided with a personal matching rule table 20a' and a corporate matching rule table 20b', respectively, so that each performs matching according to its own name identification rules, dedicated to individuals or to corporations. Then, depending on whether the input data that has undergone address and name cleansing is in the name of an individual or of a corporation, one of the two processing routes is selected and the matching process is executed.
[0039]
The matching procedure may also be configured as follows. For example, as shown in FIG. 9, the matching unit 20 includes a confirmed match matching unit 20c, a first fuzzy match matching unit 20d, and a second fuzzy match matching unit 20e. The confirmed match matching unit 20c includes a confirmed match rule table 20f, and the first fuzzy match matching unit 20d and the second fuzzy match matching unit 20e include a fuzzy match rule table 20g. Although FIG. 9 shows a configuration with two fuzzy match matching units, one, or three or more, fuzzy match matching units may be provided.
[0040]
The advantages of configuring the matching unit 20 in this way are as follows. For example, if the number of customers is saturated to some extent and about 80% of the input customer data belongs to existing customers, 50% or more of the input data to be matched can be expected to be regarded as a definite match. Further, collation at the definite match level completes much earlier than collation at the ambiguous match level because the number of data to be collated is small. Therefore, separating the definite match process from the ambiguous match process, so that processing of the input data determined to be a definite match completes earlier, speeds up the process as a whole.
[0041]
Specifically, as shown in the flowchart of FIG. 10, the confirmed match matching unit 20c performs confirmed match matching based on the confirmed match rule (S101) and determines whether or not there is a match (S102). Input data determined to be a confirmed match (S102: Yes) is already registered customer data, so it is processed as existing data (S103). Input data determined not to be a confirmed match (S102: No), on the other hand, is provisionally stored and registered in the name identification DB 18 as new data (S104). The degree of coincidence of input data stored in the name identification DB 18 without a confirmed match is then determined by the fuzzy match matching unit according to the fuzzy match rule (S105). If the fuzzy match matching determines that the input data is a fuzzy match (S106: Yes), the input data is handled as “fuzzy match data”. If it determines that the input data is not a fuzzy match (S106: No), the input data is regarded as “new data” and handled as data to be registered in the name identification DB 18 (S108).
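The flow of FIG. 10 can be sketched as a two-stage function. This is a simplified illustration with a single fuzzy stage; the predicate arguments `is_definite` and `is_fuzzy`, the list standing in for the name identification DB 18, and the returned labels are all hypothetical.

```python
def two_stage_match(record, candidates, is_definite, is_fuzzy, name_db):
    """Run confirmed match matching first, then fuzzy match matching."""
    # S101-S102: confirmed match matching against the filtered candidates.
    if any(is_definite(record, c) for c in candidates):
        return "existing"           # S103: already registered customer

    # S104: provisional registration before the slower fuzzy stage.
    name_db.append(record)

    # S105-S106: fuzzy match matching against the same candidates.
    if any(is_fuzzy(record, c) for c in candidates):
        return "fuzzy"              # handled as fuzzy match data

    return "new"                    # S108: stays registered as new data
```

Because the candidate list is passed in separately from `name_db`, the provisionally registered record is never compared with itself in the fuzzy stage.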
[0042]
That is, in this process, when the input data being handled is found not to be a definite match, it is stored and registered in the name identification DB 18, and one or more fuzzy match matching processes can then be executed on it. When the processing time required for fuzzy matching is 10 times that of definite matching, or when half of all the input data is customer data that is a definite match, the matching work can be executed very quickly.
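To get a feel for these figures, the following sketch compares the expected per-record cost of the two-stage flow with a baseline in which every record goes through both stages. The baseline and the cost model are assumptions made purely for illustration; the description gives only the 10-times ratio and the one-half proportion.

```python
def expected_cost(p_definite, t_definite=1.0, fuzzy_ratio=10.0):
    """Return (two_stage, single_pass) expected cost per input record.

    p_definite  : fraction of input records that are definite matches
    t_definite  : cost of one definite-match check (arbitrary units)
    fuzzy_ratio : cost of a fuzzy check relative to a definite check
    """
    t_fuzzy = fuzzy_ratio * t_definite
    # Two-stage: every record pays the definite check; only the rest
    # go on to the fuzzy check.
    two_stage = t_definite + (1.0 - p_definite) * t_fuzzy
    # Assumed baseline: every record pays both checks.
    single_pass = t_definite + t_fuzzy
    return two_stage, single_pass
```

With half the records being definite matches and fuzzy matching costing 10 units, this model gives 6 units per record for the two-stage flow against 11 for the assumed baseline.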
[0043]
[Effect of the invention]
As described above, according to the present invention, a computer operating as a customer data management apparatus is connected to both the input DB and the name identification DB. The address cleansing unit analyzes the address information in the customer information, which includes the name and address of the customer, and the name cleansing unit analyzes the type of name and the name data for customer verification. Then, based on the customer data output from the address cleansing unit and the name cleansing unit, the filtering unit searches the customer information stored in the name identification DB and narrows down the collation data. The matching unit compares the customer data with the narrowed-down collation data and determines the degree of coincidence, and when the customer data is determined, according to the degree of coincidence, to be that of a new customer, it is newly registered in the name identification DB. With such a configuration, filtering the customer data reduces the number of data to be collated, which makes it possible to provide a customer management system that enables high-speed, high-precision name identification of the large-scale customer data handled by financial institutions and the like, and a customer data management device used in that system.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining a customer management system of the present invention.
FIG. 2 is a diagram for explaining a customer data management method of the present invention.
FIG. 3 is a diagram for explaining an example of the result of executing name cleansing and address cleansing.
FIG. 4 is a diagram for explaining an example of the item contents of a search condition correspondence table.
FIG. 5 is a diagram for explaining an example of a general matching rule.
FIG. 6 is a diagram for explaining an example in which a collation rule for personal names and a collation rule for corporate names are tabulated for each type of name identification.
FIG. 7 is a diagram for explaining a search condition correspondence table to which a matching rule is added.
FIG. 8 is a system configuration diagram in a case where verification of individual customers and corporate customers is a separate process.
FIG. 9 is a diagram for describing a configuration of a matching unit that separates definite match matching and fuzzy match matching.
FIG. 10 is a diagram for explaining a matching process in which confirmed match matching and fuzzy match matching are separated.
[Explanation of symbols]
10 Computer
11 Input DB
12 Data extraction unit
13 Data conversion unit
14 Address cleansing unit
15 Name cleansing unit
16 Filtering unit
17 Search condition correspondence table storage unit
18 Name identification DB
19 Matching rule correspondence table storage unit
20 Matching unit
21 Name identification DB update unit

Claims

Connected to both an input database that stores customer information, including customer name and address, and a name identification database that stores registered customer information,
Address cleansing means for analyzing the address information in the customer information stored in the input database, converting the part from the prefecture down to the block and house number into an address code, and, if the address information contains a building name and room number, outputting customer verification address data in which the building name and the room number are added to the address code;
Name cleansing means for analyzing the name information in the customer information; if the name type is an individual, dividing the name into surname and first name, unifying variant characters into one character, converting voiced sounds to unvoiced sounds, and generating customer verification name data; if the name type is a corporation, converting the lowercase characters of the name to uppercase, analyzing the converted name, extracting the converted name and the main name and unique name within the converted name, and generating customer verification name data; and outputting the customer verification name data together with the name type;
Filtering means for searching customer information stored in the name identification database based on customer data output from the address cleansing means and the name cleansing means, and narrowing down collation data;
Matching means for calculating the degree of coincidence of the address information and of the name between the customer data and the narrowed-down collation data, and determining the match level of the customer data from those degrees of coincidence according to a predetermined collation judgment rule; and
In a customer data management device having name identification data update means for newly registering the customer data in the name identification database when it is determined as customer data of a new customer according to the degree of coincidence,
The filtering means includes
A search condition correspondence table storage unit for storing a search condition correspondence table;
A filtering unit that searches the search condition correspondence table using, as keys, the name type output by the name cleansing means and the accuracy of the customer verification address data output by the address cleansing means, determines the search condition, and outputs the collation targets extracted from the name identification database under that search condition,
The search condition correspondence table is
a table in which the name type, the address accuracy of the customer data, and the search condition for the customer data are associated with one another, and which defines that:
when the name type is an individual, (A) if the address of the customer data is described down to “aza/chome” or in more detail, the address code of the customer data down to aza/chome is used as the search condition; (B) if the address of the customer data is described down to “town/oaza”, the address code of the customer data down to town/oaza and the surname are used as the search condition; and if the address accuracy of the customer data is neither (A) nor (B), the surname and first name of the customer data are used as the search condition; and
when the name type is a corporation, (C) if the address of the customer data is described down to “town/oaza” or in more detail, the address code of the customer data down to town/oaza is used as the search condition; (D) if the address of the customer data is described down to the municipality (municipal district), the address code of the customer data down to the municipality and the first character of the unique name are used as the search condition; and if the address accuracy of the customer data is neither (C) nor (D), the unique name of the customer data is used as the search condition; the customer data management device being characterized by the above.

A customer data management method for performing customer name identification based on customer information including a customer name and address stored in an input database and registered customer information stored in a name identification database,
An address cleansing step in which the address cleansing means analyzes the address information in the customer information stored in the input database, converts the part from the prefecture down to the block and house number into an address code, and, if the address information contains a building name and room number, outputs customer verification address data in which the building name and the room number are added to the address code;
A name cleansing step in which the name cleansing means analyzes the name information in the customer information; if the name type is an individual, divides the name into surname and first name, unifies variant characters into one character, and generates customer verification name data; if the name type is a corporation, converts the lowercase characters of the name to uppercase, analyzes the converted name, extracts the converted name and the main name and unique name within the converted name, and generates customer verification name data; and outputs the customer verification name data together with the name type;
A filtering step for narrowing down collation data by searching customer information stored in the name identification database based on customer data output as a result of the address cleansing step and the name cleansing step;
A matching step in which the matching means calculates the degree of coincidence of the address information and of the name between the customer data and the narrowed-down collation data, and determines the match level of the customer data from those degrees of coincidence according to a predetermined collation judgment rule; and
A name identification data updating step in which name identification data updating means newly registers the customer data in the name identification database when the customer data is determined, according to the degree of coincidence, to be that of a new customer, and
In the filtering step,
The filtering means searches the search condition correspondence table stored in the search condition correspondence table storage unit using, as keys, the name type output in the name cleansing step and the accuracy of the customer verification address data output in the address cleansing step, determines the search condition, and outputs the collation targets extracted from the name identification database under that search condition,
The search condition correspondence table is
a table in which the name type, the address accuracy of the customer data, and the search condition for the customer data are associated with one another, and which defines that:
when the name type is an individual, (A) if the address of the customer data is described down to “aza/chome” or in more detail, the address code of the customer data down to aza/chome is used as the search condition; (B) if the address of the customer data is described down to “town/oaza”, the address code of the customer data down to town/oaza and the surname are used as the search condition; and if the address accuracy of the customer data is neither (A) nor (B), the surname and first name of the customer data are used as the search condition; and
when the name type is a corporation, (C) if the address of the customer data is described down to “town/oaza” or in more detail, the address code of the customer data down to town/oaza is used as the search condition; (D) if the address of the customer data is described down to the municipality (municipal district), the address code of the customer data down to the municipality and the first character of the unique name are used as the search condition; and if the address accuracy of the customer data is neither (C) nor (D), the unique name of the customer data is used as the search condition; the customer data management method being characterized by the above.

A customer data management program for causing a computer to function as the customer data management apparatus according to claim 1.

A computer-readable recording medium storing the customer data management program according to claim 3.