JP5220582B2

JP5220582B2 - Apparatus, method and program for supporting evaluation of new customer candidates

Info

Publication number: JP5220582B2
Application number: JP2008328913A
Authority: JP
Inventors: 弘揮 ▲柳▼澤; 久嗣鹿島; 玲田島
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-12-25
Filing date: 2008-12-25
Publication date: 2013-06-26
Anticipated expiration: 2028-12-25
Also published as: JP2010152568A

Description

The present invention relates to a technique for supporting the evaluation of new customer candidates, and more particularly to a technique for evaluating each target among the new customer candidates based on known customer information.

For a company, finding new customers is a top management priority. Companies carry out various sales activities to develop new customers, such as direct mail, media advertising and promotion, and events such as seminars and exhibitions. To find customers more efficiently, they collect and analyze the responses to such sales activities and narrow down the prospective customers. In recent years, since many companies have their own websites, lists of prospective customers are also created by analyzing access to those sites.

There is also a technique that improves on this access analysis by analyzing SFA (Sales Force Automation) data and Web access log data in a cross-cutting and comprehensive manner to extract target information related to sales activities.

For example, Patent Document 1 discloses a customer information analysis system comprising data integration processing means and organization gap detection means. The data integration processing means integrates an aggregation result of SFA data and an aggregation result of Web access log data, which share customer names and product categories and each contain different information on the degree of customer contact. The organization gap detection means extracts, from the integrated data, the target customer names for a specific product category or the target products for a specific customer name, based on the gap between the degree-of-contact information in the SFA data and that in the Web access log data for the specific product category or the specific customer name.
Patent Document 1: JP 2004-348682

However, conventional narrowing down of prospective customers has been based on some kind of response from the other party, such as inquiries and consultations in response to sales activities, actual business negotiations and sales history, or access to the company's website. A party for which no such information is available has therefore never been identified as a prospective customer, even if it is quite likely to become a customer.

Moreover, conventional new customer development has not used current customer information, past customer information, or both to extract and evaluate customer candidates, such as individuals, companies, and organizations, that could become new customers.

The present invention has been made to solve the above problems. One object is to provide a technique that can evaluate whether an individual, company, organization, or the like can become a new customer even if there has been no past contact and no response has been obtained. Another object is to provide a technique that can evaluate suitability as a new customer based on current customer information, past customer information, or both.

The present invention that achieves the above objects is realized by the following apparatus for supporting the evaluation of new customer candidates. The support apparatus comprises: a feature vector storage unit that stores feature vectors obtained for each of a plurality of first targets, each labeled as either a customer or a non-customer, and for each of a plurality of second targets listed in a customer candidate list, each feature vector having as its elements attribute information on a plurality of preselected attributes; discriminant plane calculation means that calculates a discriminant plane of a support vector machine using the feature vectors of the first targets read from the feature vector storage unit as training data; distance calculation means that calculates the distance between a second target and a first target as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane; and determination means that, for each second target, extracts according to the calculated distances the first targets that are located in the vicinity of the second target and have the customer label, and determines them as evaluation information for evaluating the second target.
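The distance used by the distance calculation means can be sketched as follows. This is a minimal illustration, assuming that "the distance between feature vectors" is taken as the difference vector of the two feature vectors and that `w` is the normal vector of the discriminant plane; the function name `projected_distance` is hypothetical and does not come from the patent.

```python
import math


def projected_distance(x_a, x_b, w):
    """Length of the difference (x_a - x_b) projected onto the normal vector w.

    Assumption: the projection is |w . (x_a - x_b)| / ||w||.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("normal vector w must be non-zero")
    dot = sum(wi * (a - b) for wi, a, b in zip(w, x_a, x_b))
    return abs(dot) / norm_w
```

Under this reading, two targets that differ only along directions parallel to the discriminant plane have distance 0, so only differences along the direction that separates customers from non-customers contribute.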

Here, non-customer information means information on targets such as individuals, companies, and organizations that are not customers at present, or that are not customers at present and were not in the past. That is, a non-customer need only be a target that is not a customer at present (or at present and in the past); it need not be a target that sales activities have clearly shown will never become a customer. As one example, the non-customer information may be the customer information of a service provider, which provides a service user with a support service for evaluating new customer candidates, minus the customer information of the service user. This rests on the assumption that, if a target such as an individual, company, or organization is likely to become a customer, that target has most likely already been introduced to the service user.

Preferably, the discriminant plane is obtained by applying a biased support vector machine.

Preferably, the discriminant plane is obtained by applying the soft margin method in a biased support vector machine.

Preferably, the attribute information on the plurality of attributes that forms the elements of each target's feature vector includes any plurality of the following: attribute information on third-party evaluation of the target, attribute information on the size and sales of the target, and attribute information on the industry and region of the target.

Preferably, the feature vector of each target is a normalized feature vector. The normalization may correct the feature vectors so that the mean of each element is 0. The normalization may further correct them so that the variance of each element takes the same value.

Preferably, the vicinity of a second target is the range of distances that falls within the top K when the first targets are arranged in order of increasing distance from the second target. Here, K is a positive integer.

Preferably, the support apparatus further includes an evaluation unit that evaluates each second target by the number of first targets having the customer label that were determined as the evaluation information for that second target.

Preferably, the support apparatus further includes an evaluation unit that evaluates each second target by the sum of the values of a specific attribute of the first targets having the customer label that were determined as the evaluation information for that second target.

Preferably, the specific attribute is sales amount information of the first target.

Preferably, the support apparatus further includes an evaluation unit that evaluates each second target based on one or more first targets that fall within the top M, in order of increasing distance from the second target, among the first targets having the customer label that were determined as the evaluation information for that second target. Here, M is a positive integer.
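The neighbor extraction and the evaluation variants described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the distances from one second target to every first target are assumed to be precomputed, labels are encoded as +1 (customer) and -1 (non-customer), and all function names are hypothetical.

```python
def k_nearest_customers(distances, labels, k):
    """Indices of customer-labeled first targets among the K nearest first targets.

    distances[i] and labels[i] refer to first target i; the result is ordered
    from nearest to farthest.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return [i for i in order[:k] if labels[i] == 1]


def score_by_count(neighbors):
    """Evaluate a second target by the number of nearby customers."""
    return len(neighbors)


def score_by_sales(neighbors, sales):
    """Evaluate a second target by the summed sales amount of nearby customers."""
    return sum(sales[i] for i in neighbors)


def top_m(neighbors, m):
    """Keep only the M nearest of the extracted customers."""
    return neighbors[:m]
```

For example, with distances `[0.5, 0.1, 0.9, 0.3]`, labels `[1, -1, 1, 1]`, and K = 3, the three nearest first targets are 1, 3, and 0, of which 3 and 0 carry the customer label.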

The present invention has been described above as an apparatus for supporting the evaluation of new customer candidates. However, the present invention can also be understood as a method or a program, executed in such a support apparatus, for supporting the evaluation of new customer candidates.

According to the present invention, it is possible to evaluate whether an individual, company, organization, or the like can become a new customer even if there has been no past contact and no response has been obtained. It is also possible to evaluate the suitability of a new candidate as a new customer based on current customer information, past customer information, or both. Other effects of the present invention will be understood from the description of each embodiment.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. The following embodiments do not limit the invention according to the claims, and not all combinations of features described in the embodiments are essential to the solution provided by the invention. The same elements are given the same reference numbers throughout the description of the embodiments.

FIG. 1 shows an example of the hardware configuration of an information processing apparatus suitable for realizing a support apparatus 200 that supports the evaluation of new customer candidates according to an embodiment of the present invention. The information processing apparatus includes a CPU (central processing unit) 1 and a main memory 4 connected to a bus 2. Hard disk devices 13 and 30 and removable storage (external storage systems with exchangeable recording media) such as CD-ROM devices 26 and 29, a flexible disk device 20, an MO device 28, and a DVD device 31 are connected to the bus 2 via a floppy disk controller 19, an IDE controller 25, a SCSI controller 27, and the like.

Storage media such as flexible disks, MOs, CD-ROMs, and DVD-ROMs are inserted into the removable storage. These storage media, the hard disk devices 13 and 30, and a ROM 14 can store the code of a computer program that, in cooperation with the operating system, gives instructions to the CPU and the like to carry out the present invention.

That is, the various storage devices described above of the information processing apparatus serving as the support apparatus 200 can store a program for supporting the evaluation of new customer candidates, together with the data that the program uses: customer information, non-customer information, customer candidate information including a list of customer candidates, and feature vectors. The computer program is executed by being loaded into the main memory 4. The computer program can also be compressed, or divided into multiple parts and recorded on multiple media.

The information processing apparatus receives input from input devices such as a keyboard 6 and a mouse 7 via a keyboard/mouse controller 5. It receives input from a microphone 24 and outputs sound from a speaker 23 via an audio controller 21. It is connected via a graphics controller 10 to a display device 11 for presenting visual data to the user. It can connect to a network via a network adapter 18 (an Ethernet (R) card or a token ring card) or the like and communicate with other computers.

From the above description, it will be readily understood that an information processing apparatus suitable for realizing the support apparatus 200 according to the embodiment of the present invention can be realized by an ordinary information processing apparatus such as a personal computer, workstation, or mainframe, or by a combination of these. The components described above are examples, and not all of them are essential components of the present invention.

FIG. 2 shows an example of the functional configuration of the support apparatus 200 that supports the evaluation of new customer candidates according to an embodiment of the present invention. The support apparatus 200 includes a non-customer information storage unit 205, a customer information storage unit 210, a customer candidate information storage unit 215, a feature vector creation unit 220, a feature vector storage unit 225, a discriminant plane calculation unit 230, a discriminant plane information storage unit 235, a distance calculation unit 240, a determination unit 245, an evaluation unit 250, and a score storage unit 255. The following description assumes that a service provider A provides a service user B with a support service for evaluating new customer candidates.

The non-customer information storage unit 205 stores a list of non-customers for the service user B and information on each target in the list. As described above, a non-customer in the present invention means a target such as an individual, company, or organization that is not a customer of the service user B at present, or at present and in the past. In this embodiment, the non-customer information is the customer information of those customers of the service provider A that remain after the customers of the service user B are excluded. This rests on the assumption that, if a target such as an individual, company, or organization is likely to become a customer, that target has most likely already been introduced to the service user.

The customer information storage unit 210 stores a list of customers of the service user B and information on each target in the list. The customer candidate information storage unit 215 stores a list of customer candidates for the service user B and information on each target in the list. FIG. 3 shows the relationship among the non-customer, customer, and customer candidate information used in this embodiment. As shown in FIG. 3, customer candidate information C315 in this embodiment is information on targets such as individuals, companies, and organizations that are customers of neither the service provider A nor the service user B.

The information on each target held in the storage units 205 to 215 may relate to the attributes listed below.
(a) Third-party score of the non-customer
(b) The non-customer's capital, number of employees, number of factories, number of business establishments, sales, profit, dividend amount, and declared income
(c) Industry and region
However, the information on each target is not limited to these, and may further include other information that can characterize customer, non-customer, and customer candidate targets. Each target in the non-customer information storage unit 205 has the non-customer label, and each target in the customer information storage unit 210 has the customer label.

The feature vector creation unit 220 creates, for each non-customer, customer, and customer candidate target, a feature vector whose elements are attribute information on a plurality of preselected attributes. This attribute information may include any plurality of the following: attribute information on third-party evaluation of the target, attribute information on the size and sales of the target, and attribute information on the industry and region of the target. The feature vector creation unit 220 reads this information from the storage units 205 to 215 and creates a feature vector for each non-customer, customer, and customer candidate target. It stores the created feature vectors in the feature vector storage unit 225.

The specific method by which the feature vector creation unit 220 creates a feature vector is as follows. In this embodiment, the feature vector x is basically defined as x = (score, capital, number of employees, ...). However, attribute information classified under (b) above, such as capital and number of employees, is log-transformed. That is, the feature vector is defined as x = (score, log(capital), log(number of employees), ...). The industry and region attribute information classified under (c) is expressed as a 0/1 vector. For example, if the feature vector is defined as x = (..., agriculture, forestry, fishery, ...), the feature vector x of a company specializing in fishery is expressed as (..., 0, 0, 1, ...). Regional attribute information can be expressed as a vector in the same way as industry, at the prefecture level.
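The construction described above can be sketched as follows. This is a minimal illustration, assuming a natural logarithm (the text only says "log"), a three-entry industry list taken from the example in the text, and hypothetical names `INDUSTRIES` and `build_feature_vector`; a real feature vector would carry more attributes.

```python
import math

# Hypothetical category list, taken from the example in the text.
INDUSTRIES = ["agriculture", "forestry", "fishery"]


def build_feature_vector(score, capital, employees, industry):
    """Return (score, log(capital), log(employees), one-hot industry...)."""
    # 0/1 encoding: exactly one position is 1 for a known industry.
    one_hot = [1.0 if industry == name else 0.0 for name in INDUSTRIES]
    # Size-type attributes in group (b) are log-transformed.
    return [float(score), math.log(capital), math.log(employees)] + one_hot
```

The log transform compresses attributes such as capital, which can span several orders of magnitude, so that very large companies do not dominate distances between feature vectors.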

Preferably, the feature vector creation unit 220 normalizes the feature vector x of each target. As one example, it corrects the feature vectors so that the mean of each element of x is 0. That is, if x = (score, capital, number of employees, ...), the normalized feature vector is x' = (score − mean score, capital − mean capital, number of employees − mean number of employees, ...). The feature vector creation unit 220 may further correct the feature vectors so that the variance of each element of x takes the same value. In that case, the normalized feature vector is x'' = ((score − mean score) / standard deviation of score, (capital − mean capital) / standard deviation of capital, (number of employees − mean number of employees) / standard deviation of number of employees, ...).
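The two normalization steps can be sketched as follows. This is an illustration only, assuming the population standard deviation (the text does not say which estimator is used) and assuming that an element with zero variance is mapped to 0; the function name `standardize` is hypothetical.

```python
import math


def standardize(vectors):
    """Shift each element to mean 0 and scale it to unit variance."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    stds = [
        math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n)
        for j in range(dim)
    ]
    return [
        # Assumption: a constant element carries no information, so map it to 0.
        [(v[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(dim)]
        for v in vectors
    ]
```

Without this step, an element with a large numeric range would dominate both the SVM training and any distance computed between feature vectors.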

The discriminant plane calculation unit 230 calculates a discriminant plane of a support vector machine (hereinafter, SVM) using as training data the feature vectors, read from the feature vector storage unit 225, of the targets labeled as either a customer or a non-customer (hereinafter, first targets). As detailed later, the present invention uses the SVM discriminant plane to define the distance between feature vectors. The discriminant plane calculation unit 230 stores the information on the calculated discriminant plane (the parameter w described later) in the discriminant plane information storage unit 235.

SVM is described here. SVM is a classification method that uses training data, each belonging to one of two classes, to determine which class unknown data belongs to. It is applied in various fields, including pattern recognition fields such as speech, character, and figure recognition, and medical diagnosis. SVM uniquely determines the discriminant plane used for classification by the clear criterion of margin maximization.

For example, suppose the training data set is given as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x_i$ is an input vector and $y_i \in \{1, -1\}$ is a class label. For this training data, SVM has the linear discriminant function $f(x) = \mathrm{sign}(w^T x + b)$. Here $w$ and $b$ are the parameters that determine the discriminant plane $g(x)$. The function $\mathrm{sign}(t)$ is the sign function that takes the value $1$ when $t > 0$ and $-1$ when $t \le 0$, and $w^T$ denotes the transpose of the vector $w$.
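The linear discriminant function can be written directly as code. This is a minimal sketch in which `w` and `b` are assumed to be already trained; the function names `sign` and `classify` are illustrative.

```python
def sign(t):
    """Sign function as defined in the text: 1 if t > 0, otherwise -1."""
    return 1 if t > 0 else -1


def classify(x, w, b):
    """Evaluate f(x) = sign(w^T x + b) for one input vector x."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Note that a point lying exactly on the plane (where the inner product plus `b` is 0) is assigned to class -1 under this definition.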

If the training data are linearly separable, the problem of finding the parameters $w$ and $b$ that maximize the margin is equivalent to finding the parameters that maximize $1/\|w\|$ under the constraints $y_i(w^T x_i + b) \ge 1$ $(i = 1, 2, \ldots, n)$ (optimization problem P1). In real problems the training data are rarely linearly separable; in that case the soft margin method, which relaxes the constraints to allow some classification errors, can be used.

The soft margin method maximizes the margin $1/\|w\|$ while allowing some training data to fall on the wrong side. The problem of finding the optimal discriminant plane in the soft margin method is equivalent to finding the parameters that maximize $1/\|w\| - C_1 \sum_{i=1}^{n} \xi_i$ $(C_1 \ge 0)$ under the constraints $y_i(w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ $(i = 1, 2, \ldots, n)$ (optimization problem P2). Here $\xi_i$ is a parameter indicating the overshoot, that is, the amount by which training data $i$ violates the margin, and $C_1$ is a weight parameter that balances the margin size in the first term against the overshoot in the second term.
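For a fixed plane, the smallest overshoot that satisfies the two constraints is $\max(0, 1 - y_i(w^T x_i + b))$. The sketch below computes it for a single training point; the function name `slack` is hypothetical, and `w` and `b` are assumed to be given rather than optimized here.

```python
def slack(x, y, w, b):
    """Smallest xi >= 0 satisfying y * (w^T x + b) >= 1 - xi."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)
```

A point outside the margin on its own side has overshoot 0, a point inside the margin has an overshoot between 0 and 1, and a misclassified point has an overshoot greater than 1.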

Optimization problems P1 and P2 are known as quadratic programming problems in the field of mathematical programming. Various numerical methods have been proposed for them and their solution is well known, so the description is omitted here. Problems P1 and P2 can be solved with a solver such as the open-source SVM-QP.

If every target with the non-customer label is one that has been shown, as a result of sales activities or some investigation, to be clearly unable to become a customer in the future, the discriminant plane of the SVM or soft margin method described above can be used. However, when the non-customer information is the customer information of the service provider's customers excluding the service user's customers, as in this embodiment, that is, when the non-customer targets are inferred or assumed, the discriminant plane calculation unit 230 calculates the discriminant plane using a biased SVM. Alternatively, the discriminant plane calculation unit 230 may calculate the discriminant plane using the soft margin method in a biased SVM.

The biased SVM is disclosed in B. Liu, Y. Dai, X. Li, W. S. Lee, P. S. Yu, "Building Text Classifiers Using Positive and Unlabeled Examples", ICDM-03, pp. 19-22, 2003. The biased SVM is described below.

First, suppose the training data set is given as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x_i$ is an input vector and $y_i \in \{1, -1\}$ is a class label. Assume that the $(k-1)$ training data from $i = 1$ to $i = k-1$ have the class label $+1$, while the remaining $(n-k+1)$ training data from $i = k$ to $i = n$ are unlabeled training data that are assigned the label $-1$. Assume also that the training data set is sufficiently large. Then an appropriate classifier can be obtained by minimizing the number of unlabeled training data classified as $+1$ while correctly classifying the training data with the class label $+1$.

If the training data with the label $+1$ (that is, all training data other than the unlabeled ones) can be classified correctly, the problem of finding the optimal discriminant plane in the biased SVM is equivalent to finding the parameters that maximize $1/\|w\| - C_2 \sum_{i=k}^{n} d_i$ $(C_2 \ge 0)$ under the constraints $w^T x_i + b \ge 1$ $(i = 1, 2, \ldots, k-1)$, $w^T x_i + b \le -1 + d_i$ $(i = k, k+1, \ldots, n)$, and $d_i \ge 0$ $(i = k, k+1, \ldots, n)$ (optimization problem P3). The method of determining the optimal discriminant plane in the biased SVM is described with reference to FIG. 4, using the feature vectors of the first targets in this embodiment as training data.

In FIG. 4, triangles denote training data (feature vectors) with the non-customer label (corresponding to the label -1), and circles denote training data (feature vectors) with the customer label (corresponding to the label +1). The discriminant plane that separates the training data into two classes is determined so as to maximize the margin while minimizing the total amount by which non-customer-labeled training data protrude past the discriminant plane onto the opposite side (the arrows in the figure).

When classification errors are also tolerated for the training data labeled +1, the soft margin method in the biased SVM is used. In this case, the problem of finding the optimal discriminant plane is equivalent to finding the parameters that maximize $1/\|w\| - C_+ \sum_{i=1}^{k-1} d_i - C_- \sum_{i=k}^{n} d_i$ (with $C_+, C_- \ge 0$) subject to the constraints $y_i (w^T x_i + b) \ge 1 - d_i$ and $d_i \ge 0$ for $i = 1, 2, \ldots, n$ (optimization problem P4). Here $C_+$ is the weight parameter for the protrusion of training data labeled +1, and $C_-$ is the weight parameter for the protrusion of training data labeled -1.

Rewriting the above formulation of optimization problem P3 using, for example, the Manhattan distance $L_1$ gives the following.
minimize: $\|w\|_1 + C_2 \sum_{i=k}^{n} d_i$, with $C_2 \ge 0$
subject to: $w^T x_i + b \ge 1$, $(i = 1, 2, \ldots, k-1)$
$w^T x_i + b \le -1 + d_i$, $d_i \ge 0$, $(i = k, \ldots, n)$

Furthermore, the above formulation can be converted into a linear programming problem by rewriting it as follows, with $w_j = w^+_j - w^-_j$ and $w^+_j, w^-_j \ge 0$, so an optimal solution can be obtained.
minimize: $\sum_j (w^+_j + w^-_j) + C_2 \sum_{i=k}^{n} d_i$, with $C_2 \ge 0$, where the first sum runs over the components $j$ of $w$
subject to: $\sum_j (w^+_j - w^-_j) x_{ij} + b \ge 1$, $(i = 1, 2, \ldots, k-1)$
$\sum_j (w^+_j - w^-_j) x_{ij} + b \le -1 + d_i$, $d_i \ge 0$, $(i = k, \ldots, n)$
The linear programming problem can be solved using, for example, a commercial solver such as ILOG's CPLEX or an open-source solver. Optimization problem P4 can be solved in the same way.
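As an illustration of the linear programming formulation of problem P3, the following is a minimal sketch that assumes the open-source solver SciPy (`scipy.optimize.linprog`) and the usual split of each weight into two non-negative parts, w_j = w_plus_j - w_minus_j. The function name `fit_l1_biased_svm`, the variable layout, and the toy data are assumptions made for this example and do not appear in the specification.

```python
import numpy as np
from scipy.optimize import linprog


def fit_l1_biased_svm(X_pos, X_unl, C2=1.0):
    """Solve the L1-norm biased SVM (problem P3) as a linear program.

    X_pos: feature vectors labeled +1 (hard constraints).
    X_unl: unlabeled feature vectors treated as -1 (with slack d_i).
    LP variable layout (an assumption of this sketch):
        [w_plus (m), w_minus (m), b (1), d (u)]
    """
    X_pos = np.asarray(X_pos, dtype=float)
    X_unl = np.asarray(X_unl, dtype=float)
    p, m = X_pos.shape
    u = X_unl.shape[0]

    # Objective: sum_j (w_plus_j + w_minus_j) + C2 * sum_i d_i
    c = np.concatenate([np.ones(2 * m), [0.0], C2 * np.ones(u)])

    # +1 rows:  w^T x_i + b >= 1        ->  -w^T x_i - b <= -1
    A_pos = np.hstack([-X_pos, X_pos, -np.ones((p, 1)), np.zeros((p, u))])
    # -1 rows:  w^T x_i + b <= -1 + d_i ->   w^T x_i + b - d_i <= -1
    A_unl = np.hstack([X_unl, -X_unl, np.ones((u, 1)), -np.eye(u)])

    A_ub = np.vstack([A_pos, A_unl])
    b_ub = -np.ones(p + u)
    # w_plus, w_minus, d are non-negative; b is free.
    bounds = [(0, None)] * (2 * m) + [(None, None)] + [(0, None)] * u

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:m] - res.x[m:2 * m]
    b = res.x[2 * m]
    return w, b


# Toy data (made up): two customers and four unlabeled objects.
X_pos = [[2.0, 2.0], [3.0, 2.5]]
X_unl = [[0.0, 0.0], [-1.0, 0.5], [0.5, -1.0], [2.5, 2.0]]
w, b = fit_l1_biased_svm(X_pos, X_unl, C2=1.0)
print("w =", w, "b =", b)
```

The problem is always feasible (w = 0, b = 1 satisfies the +1 constraints, and the slack variables absorb the rest), so the solver is expected to return an optimum. Problem P4 could be handled in the same way by adding slack variables for the +1 rows and weighting them with C_+.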

The distance calculation unit 240 calculates the distance between a first object and a second object, which is a customer candidate, as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane. That is, the distance calculation unit 240 reads the feature vectors $x_i$ and $x_j$ of the first object and the second object from the feature vector storage unit 225 and, using the parameter $w$ read from the discriminant plane storage unit 235, obtains the distance between the first object and the second object as $|w^T (x_i - x_j)|$ (see FIG. 5). In this way, the present invention defines a new metric space using the discriminant plane.
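The distance defined above can be sketched in a few lines of plain Python. The function name `projected_distance` and the example vectors are hypothetical. The sketch only illustrates that differences parallel to the discriminant plane do not contribute to the distance.

```python
def projected_distance(w, x_i, x_j):
    """Return |w^T (x_i - x_j)|, the distance used in the text."""
    return abs(sum(w_k * (a - b) for w_k, a, b in zip(w, x_i, x_j)))


# Hypothetical normal vector: only the first feature matters.
w = [1.0, 0.0]

d_across = projected_distance(w, [3.0, 5.0], [1.0, -2.0])  # differs along w
d_along = projected_distance(w, [1.0, 5.0], [1.0, -2.0])   # differs only along the plane
print(d_across, d_along)
```

Two objects that differ only in directions orthogonal to w are at distance zero in this space, which is why customers and non-customers separated by the discriminant plane end up far apart while objects on the same side can be close.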

For each second object, the determination unit 245 extracts, according to the distances obtained by the distance calculation unit 240, the first objects that are located in the vicinity of the second object and have the customer label, and determines them as evaluation information for evaluating the second object. The determination unit 245 then stores the identification information of each extracted object in the customer candidate information storage unit 215 in association with the corresponding second object in the customer candidate list.

Preferably, the vicinity of a second object is the range of distances that fall within the top K when the first objects are arranged in order of increasing distance from the second object. Here K is a positive integer, for example 100. FIG. 6 shows an example with K = 5 for simplicity. In FIG. 6, squares denote customer candidates, triangles denote objects with the non-customer label, and circles denote objects with the customer label. Because K = 5, the determination unit 245 extracts the three customer-labeled first objects inside the circle drawn with a broken line. Note that each object in FIG. 6 is shown in the new metric space described above.
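The extraction step for K = 5 can be sketched as follows. The data are made up so that, as in the FIG. 6 example, three of the five nearest labeled objects are customers. The function name `customer_neighbors`, the tuple layout, and the vector w are assumptions of this sketch.

```python
def customer_neighbors(w, candidate, labeled, k):
    """Return ids of customer-labeled objects among the k nearest labeled objects.

    labeled: list of (object_id, feature_vector, label), label +1 = customer,
             -1 = non-customer. Distance is |w^T (x - y)|.
    """
    def dist(x, y):
        return abs(sum(wk * (a - b) for wk, a, b in zip(w, x, y)))

    ranked = sorted(labeled, key=lambda item: dist(candidate, item[1]))
    return [oid for oid, _, label in ranked[:k] if label == 1]


w = [1.0, 0.0]             # hypothetical normal vector of the discriminant plane
candidate = [0.0, 0.0]     # one customer candidate (second object)
labeled = [                # first objects: (id, feature vector, label)
    ("A", [0.1, 9.0], 1),
    ("B", [-0.2, 0.0], -1),
    ("C", [0.3, 1.0], 1),
    ("D", [0.4, 0.0], -1),
    ("E", [-0.5, 2.0], 1),
    ("F", [0.9, 0.0], 1),
    ("G", [-1.5, 0.0], -1),
]

neighbors = customer_neighbors(w, candidate, labeled, k=5)
print(neighbors)
```

Customer F is excluded because it ranks sixth by distance, even though it carries the customer label.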

The present invention uses the k-nearest neighbor method to evaluate each customer candidate based on the objects that are customers and the objects that are non-customers. The k-nearest neighbor method is a statistical classification technique based on the closest training data in the feature space. Conventional k-nearest neighbor methods have used the Manhattan distance $L_1$, the Euclidean distance $L_2$, or the Mahalanobis distance as the distance between samples. Consequently, when determining neighborhoods, the conventional k-nearest neighbor method treats all samples equally and does not take the labels of the samples into account.

However, when the aim is, as in the present invention, to evaluate each customer candidate based on the objects that are customers and the objects that are non-customers, the distance between two samples is preferably determined with the labels of the samples taken into account. That is, given customers E and F and a non-customer G, it is desirable to apply the k-nearest neighbor method in a metric space in which the distance between customer E and customer F is short and the distance between customer E and non-customer G is long. The present invention therefore defines a new metric space using the discriminant plane as described above and applies the k-nearest neighbor method in that metric space.

The evaluation unit 250 evaluates each second object by the number of customer-labeled first objects determined as the evaluation information for that second object. Instead of or in addition to this, the evaluation unit 250 may evaluate each second object by the sum of the attribute values of a specific attribute of the customer-labeled first objects determined as the evaluation information for that second object. For example, the sales information of the first objects may be used as the specific attribute.

Furthermore, the evaluation unit 250 may evaluate each second object based on the one or more first objects that fall within the top M, in order of increasing distance from the second object, among the customer-labeled first objects determined as the evaluation information for that second object. Here M is a positive integer. When M is 1, the evaluation unit 250 evaluates each second object by the first object closest to it among the customer-labeled first objects determined as its evaluation information, that is, by the nearest customer-labeled first object.
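The evaluation options described so far (number of neighboring customers, sum of a specific attribute such as sales, and the M nearest customers) might be sketched as below. The record layout with the keys `id`, `distance`, and `sales`, the function names, and the figures are all hypothetical.

```python
def score_by_count(neighbors):
    """Score a candidate by the number of customer-labeled neighbors."""
    return len(neighbors)


def score_by_attribute_sum(neighbors, attribute="sales"):
    """Score a candidate by the summed attribute value of its customer neighbors."""
    return sum(n[attribute] for n in neighbors)


def nearest_m(neighbors, m):
    """Return the m customer-labeled neighbors closest to the candidate."""
    return sorted(neighbors, key=lambda n: n["distance"])[:m]


# Customer-labeled neighbors of one candidate (made-up values).
neighbors = [
    {"id": "C", "distance": 0.3, "sales": 80},
    {"id": "A", "distance": 0.1, "sales": 120},
    {"id": "E", "distance": 0.5, "sales": 200},
]

count = score_by_count(neighbors)
sales_sum = score_by_attribute_sum(neighbors)
closest = nearest_m(neighbors, 1)[0]["id"]   # M = 1: nearest customer only
print(count, sales_sum, closest)
```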

Also, for example, the evaluation unit 250 may rate more highly a second object for which the one or more first objects obtained in this way include a good customer of the service user. The evaluation unit 250 may further treat the information on the one or more first objects obtained in this way as the evaluation result itself. One example of using such information directly is to assign, as the sales representative for a customer candidate that is a second object, the sales representative of one of the customers that are the one or more first objects obtained for that second object.

The evaluation unit 250 also scores the second objects based on the evaluation results and stores a list of the second objects arranged in score order in the score storage unit 255. Alternatively, the evaluation unit 250 may store the evaluation results as they are in the score storage unit 255. FIG. 7 shows an example of a score table stored in the score storage unit 255. In the example shown in FIG. 7, the companies that are customer candidates are ranked by the number of customer companies in their vicinity. When the number of nearby customer companies is the same, the candidate companies are ranked by the average sales of the nearby customer companies. The effectiveness of the present invention when customer candidates are ranked in this way was verified by a comparative experiment using cross-validation.
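The ranking rule of the FIG. 7 example (more nearby customer companies first, ties broken by higher average sales of those customers) can be sketched as follows. The function name `rank_candidates`, the input layout, and the company names and sales figures are invented for illustration.

```python
def rank_candidates(nearby_sales):
    """Rank candidates by neighbor count, then by average neighbor sales.

    nearby_sales: dict mapping candidate name -> list of sales figures of the
                  customer companies found in its vicinity.
    """
    def sort_key(item):
        _, sales = item
        average = sum(sales) / len(sales) if sales else 0.0
        # Negate both values so that larger counts and averages come first.
        return (-len(sales), -average)

    return [name for name, _ in sorted(nearby_sales.items(), key=sort_key)]


nearby_sales = {
    "P": [100, 300],        # 2 neighbors, average 200
    "Q": [500],             # 1 neighbor
    "R": [250, 250, 10],    # 3 neighbors
    "S": [400, 200],        # 2 neighbors, average 300
}

ranking = rank_candidates(nearby_sales)
print(ranking)
```

Candidates P and S both have two nearby customers, so S ranks higher because its neighbors have the larger average sales.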

FIG. 8 shows the results of the comparative experiment using cross-validation. The data having customer or non-customer labels were divided into two parts, referred to here as data 1 and data 2. FIG. 8(a) shows the experimental results when data 1 is used as labeled training data and data 2 as unlabeled unknown data. FIG. 8(b) shows the reverse case, in which data 1 is used as unlabeled unknown data and data 2 as labeled training data. In both figures, the lower data series, labeled kNN, shows the results of the k-nearest neighbor method using the L1 distance, and the upper data series, labeled proposedAlgorithm, shows the results obtained with the present invention.

In FIG. 8, the horizontal axis shows the number of top-ranked companies, and the vertical axis shows the number of actual customer companies. For example, the vertical-axis value at a horizontal-axis value of 200 is the number of actual customer companies among the top 200 companies selected by the method in question. Comparing at around 400 on the horizontal axis, the present invention gives 70 or more correct answers whereas the conventional k-nearest neighbor method gives only about 60, which shows that the present invention gives about 15 to 20% more correct answers.
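The quantity plotted in FIG. 8 and the quoted improvement can be illustrated with a short sketch. The function names and the toy ranking are hypothetical, and the values 70 and 60 are the approximate figures quoted in the text rather than exact measurements.

```python
def customers_in_top(ranking, actual_customers, top_n):
    """Count actual customers among the top_n entries of a ranked candidate list."""
    return sum(1 for name in ranking[:top_n] if name in actual_customers)


def relative_gain(proposed_hits, baseline_hits):
    """Fraction by which the proposed method exceeds the baseline."""
    return (proposed_hits - baseline_hits) / baseline_hits


# Toy ranking: which of the top-ranked candidates turned out to be customers?
ranking = ["R", "S", "P", "Q"]
actual = {"R", "P"}
hits_at_2 = customers_in_top(ranking, actual, 2)
hits_at_3 = customers_in_top(ranking, actual, 3)

# Approximate values quoted for a horizontal-axis value of about 400.
gain = relative_gain(70, 60)
print(hits_at_2, hits_at_3, round(gain * 100, 1))
```

With the rounded figures, the gain comes to roughly 17%, which lies within the 15 to 20% range stated in the text.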

Next, an example of the flow of processing for supporting the evaluation of new customer candidates according to an embodiment of the present invention is described with reference to FIG. 9. In FIG. 9, the processing executed on the computer starts at step 1000, in which the computer creates the feature vectors described above for each object listed in the customer, non-customer, and customer candidate lists. The computer then calculates the discriminant plane by using the feature vectors of the plurality of objects having a customer or non-customer label as training data in any one of the SVM, the soft margin method, the biased SVM, and the soft margin method in the biased SVM (step 1005).

Next, the computer calculates the distance between each object listed in the customer candidate list and each of the plurality of objects having either the customer or non-customer label as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane (step 1010).

Then, for each object in the customer candidate list, the computer extracts, according to the calculated distances, the objects that are located in the vicinity of that object and have the customer label, and records the identification information of each extracted object in association with the corresponding object in the customer candidate list (step 1015).

Finally, the computer evaluates each object in the customer candidate list based on the one or more customer-labeled objects associated with it (step 1020). The processing then ends.

Thus, according to the present invention, the degree to which an individual, company, organization, or the like could become a new customer can be obtained even when there has been no past contact and no response of any kind has been obtained. It also becomes possible to evaluate the suitability of a new candidate as a new customer based on current customer information, past customer information, or both.

The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in the above embodiment. It is apparent to those skilled in the art that various changes or improvements can be made to the above embodiment. Embodiments to which such changes or improvements have been made are therefore naturally included in the technical scope of the present invention.

FIG. 1 is a diagram showing an example of the hardware configuration of an information processing apparatus suitable for implementing the support apparatus 200, which supports the evaluation of new customer candidates, according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the functional configuration of the support apparatus 200 according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the interrelationship among the non-customer, customer, and customer candidate information.
FIG. 4 is a diagram showing an example of a discriminant plane in the biased SVM.
FIG. 5 is a diagram showing an example of the distance between feature vectors in the metric space defined by the present invention.
FIG. 6 is a diagram showing an example of the use of the k-nearest neighbor method in the metric space defined by the present invention.
FIG. 7 is a diagram showing an example of a score table obtained by applying the present invention.
FIG. 8(a) shows an example of the results of the comparative experiment using cross-validation. FIG. 8(b) shows another example of the results of the comparative experiment using cross-validation.
FIG. 9 is a flowchart showing an example of the flow of processing for supporting the evaluation of new customer candidates according to an embodiment of the present invention.

Claims

An apparatus for supporting the evaluation of new customer candidates, comprising:
a feature vector storage unit that stores feature vectors whose elements are attribute information on a plurality of preselected attributes, obtained for each of a plurality of first objects having either a customer or non-customer label and a plurality of second objects listed in a customer candidate list;
discriminant plane calculating means for calculating a discriminant plane in a support vector machine using the feature vectors of the plurality of first objects read from the feature vector storage unit as training data;
distance calculating means for calculating the distance between each second object and each first object as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane; and
determining means for extracting, for each of the second objects, according to the calculated distances, the first objects that are located in the vicinity of the second object and have the customer label, and determining them as evaluation information for the second object.

The apparatus according to claim 1, wherein the discriminant plane is a discriminant plane obtained by applying a biased support vector machine.

The apparatus according to claim 1, wherein the discriminant plane is a discriminant plane obtained by applying a soft margin method in a biased support vector machine.

The apparatus according to claim 1, wherein the attribute information on the plurality of attributes that are the elements of each object's feature vector includes any plurality of: attribute information on third-party evaluations of the object, attribute information on the scale and sales of the object, and attribute information on the business type and region of the object.

The apparatus of claim 1, wherein the feature vector of each object is a normalized feature vector.

The apparatus according to claim 1, wherein the vicinity of the second object is the range of distances that fall within the top K when the first objects are arranged in order of increasing distance from the second object.

The apparatus according to claim 1, further comprising: an evaluation unit that evaluates each of the second objects according to the number of the first objects having the customer label determined as the evaluation information of the second object.

The apparatus according to claim 1, further comprising an evaluation unit that evaluates each of the second objects by the sum of the attribute values of a specific attribute of the first objects having the customer label determined as the evaluation information of the second object.

The apparatus according to claim 8, wherein the specific attribute is sales information of the first target.

The apparatus according to claim 1, further comprising an evaluation unit that evaluates each of the second objects based on one or more of the first objects that fall within the top M, in order of increasing distance from the second object, among the first objects having the customer label determined as the evaluation information of the second object.

A method, performed on a computer, for supporting the evaluation of new customer candidates, comprising:
calculating, by computation on the computer, a discriminant plane in a support vector machine using, as training data, a set of feature vectors whose elements are attribute information on a plurality of preselected attributes, obtained for each of a plurality of objects having either a customer or non-customer label;
calculating, by computation on the computer, the distance between each object listed in a customer candidate list and each of the plurality of objects having either the customer or non-customer label as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane; and
extracting, for each object in the customer candidate list, according to the calculated distances, the objects that are located in the vicinity of that object and have the customer label, and storing the identification information of each extracted object in a storage unit in association with the corresponding object in the customer candidate list.

A program for supporting the evaluation of new customer candidates using a computer, the program causing the computer to execute the steps of:
calculating a discriminant plane in a support vector machine using, as training data, a set of feature vectors whose elements are attribute information on a plurality of preselected attributes, obtained for each of a plurality of objects having either a customer or non-customer label;
calculating the distance between each object listed in a customer candidate list and each of the plurality of objects having either the customer or non-customer label as the length obtained by projecting the distance between their feature vectors onto the normal vector of the discriminant plane; and
extracting, for each object in the customer candidate list, according to the calculated distances, the objects that are located in the vicinity of that object and have the customer label, and storing the identification information of each extracted object in a storage unit in association with the corresponding object in the customer candidate list.