JP2011113104A

JP2011113104A - Bidirectional cluster division device, method, and program

Info

Publication number: JP2011113104A
Application number: JP2009265928A
Authority: JP
Inventors: Yuki Kosaka; 勇気小阪; Toshiaki Hirose; 俊亮広瀬
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-11-24
Filing date: 2009-11-24
Publication date: 2011-06-09

Abstract

PROBLEM TO BE SOLVED: To provide a bidirectional cluster division device that simultaneously divides multivariate data and sequence data corresponding to the multivariate data into clusters having common features among the variables and among the sequence data.

SOLUTION: An input means 11 inputs multivariate data and sequence data corresponding to the multivariate data. A bidirectional clustering means 12 performs bidirectional clustering on the multivariate data and the sequence data. The bidirectional clustering means 12 divides the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a bidirectional cluster dividing device, method, and program, and more particularly to a bidirectional cluster dividing device, method, and program that divide a set of multivariate data into clusters whose members share common features.

Clustering is a technique for dividing a data set into clusters that share common features. Multivariate data is data in which each data point consists of a plurality of variables. A technique that clusters multivariate data one variable at a time is called one-way clustering. In contrast, a technique that clusters a plurality of variables simultaneously is called bidirectional clustering (co-clustering). Non-Patent Documents 1 and 2 describe bidirectional clustering.

Bidirectional clustering has been developed in particular as a natural language processing technique. In natural language processing, it is used to cluster sentences and words simultaneously. Given multivariate data consisting of sentences and words, bidirectional clustering uses the co-occurrence information between sentences and words to divide the data into clusters in which a subset of sentences and a subset of words co-occur.

In natural language processing, clustering sentences and words without bidirectional clustering requires clustering them separately. Sentence clustering uses the frequency of the words contained in each sentence as a feature, and divides the sentences so that sentences with the same feature belong to the same cluster. Word clustering uses, as a feature, which sentences contain each word, and divides the words so that words with the same feature belong to the same cluster.

When one-way clustering is used for natural language processing, sentences are clustered using word features and words are clustered using sentence features, as described above. The clustering process is therefore redundant. Combining the result of clustering the documents with the result of clustering the words does yield a clustering of both documents and words. However, because one-way clustering clusters sentences and words separately, it is difficult to incorporate the correlation and co-occurrence relationships between sentences and words into the clusters appropriately. Bidirectional clustering, in contrast, can incorporate the correlation and co-occurrence relationships between documents and words into the clusters.

Patent Document 1 describes a purchase information processing apparatus that extracts clusters from per-customer purchase history data for products. The apparatus consists of purchase information generation means and purchase information processing means. The purchase information generation means assigns the customers and the products in the purchase history data to the rows and the columns of a matrix, one to each. It then generates a matrix table by giving different predetermined index values (0 or 1) to the matrix elements for products that the customer has a history of purchasing and to the matrix elements for products that the customer has no history of purchasing.

The purchase information processing means rearranges the rows of the matrix table based on the sum of the index values in each row, and rearranges the columns based on the sum of the index values in each column. The sums are sorted in ascending or descending order. After the rearrangement, the purchase information processing means extracts the clusters defined by the distribution of the index values on the matrix table. According to Patent Document 1, clustering in this way reduces the amount of computation and the processing time required to extract clusters from customer information.
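The row and column rearrangement described for Patent Document 1 can be illustrated with a minimal sketch. The function names, the sample purchase history, and the choice of descending order are assumptions made for illustration; they are not taken from Patent Document 1, and the cluster extraction step itself is not shown.

```python
def build_purchase_matrix(history, customers, products):
    """Return a 0/1 matrix: rows are customers, columns are products.

    A cell is 1 when the (customer, product) pair appears in the history.
    """
    bought = set(history)
    return [[1 if (c, p) in bought else 0 for p in products] for c in customers]


def sort_by_sums(matrix, customers, products, descending=True):
    """Reorder rows by row sum and columns by column sum."""
    row_order = sorted(range(len(customers)),
                       key=lambda i: sum(matrix[i]), reverse=descending)
    col_order = sorted(range(len(products)),
                       key=lambda j: sum(row[j] for row in matrix),
                       reverse=descending)
    sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    return (sorted_matrix,
            [customers[i] for i in row_order],
            [products[j] for j in col_order])


# Hypothetical purchase history: (customer, product) pairs.
history = [("c1", "p2"), ("c2", "p1"), ("c2", "p2"),
           ("c2", "p3"), ("c3", "p2"), ("c3", "p3")]
customers = ["c1", "c2", "c3"]
products = ["p1", "p2", "p3"]

matrix = build_purchase_matrix(history, customers, products)
sorted_matrix, row_labels, col_labels = sort_by_sums(matrix, customers, products)
for label, row in zip(row_labels, sorted_matrix):
    print(label, row)
```

After sorting, the 1 entries gather toward one corner of the table, which is what makes the distribution of index values usable for picking out clusters.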

Patent Document 2 describes a time-series data processing apparatus that clusters time-series data. The time-series data includes at least time information such as the processing date and time, customer identification information, and product identification information. Using the time-series data, the apparatus clusters products into a plurality of groups whose purchasing customers are similar. For any two products in a cluster (products A and B), the apparatus counts the number of cases in which the two products were purchased at the same time, the number of cases in which B was purchased after A, and the number of cases in which A was purchased after B. It determines the order relationship between the two products from the counted numbers of cases.
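The counting step described for Patent Document 2 might look like the following sketch. Several details are assumptions not stated in the text: a "case" is taken to be one customer, the comparison uses each customer's first purchase time of A and of B, and the order relationship is decided by simple majority. The names `count_order_cases` and `decide_order` are hypothetical.

```python
from collections import Counter


def count_order_cases(transactions, a, b):
    """Count, per customer, whether a and b were bought together or in which order.

    transactions: iterable of (customer, time, product).
    Assumption: only the first purchase time of each product is compared.
    """
    first = {}
    for customer, time, product in transactions:
        if product in (a, b):
            key = (customer, product)
            if key not in first or time < first[key]:
                first[key] = time

    counts = Counter()
    for customer in {c for c, _ in first}:
        ta = first.get((customer, a))
        tb = first.get((customer, b))
        if ta is None or tb is None:
            continue  # this customer did not buy both products
        if ta == tb:
            counts["same_time"] += 1
        elif ta < tb:
            counts["a_then_b"] += 1
        else:
            counts["b_then_a"] += 1
    return counts


def decide_order(counts):
    """Pick the most frequent case (assumed majority rule)."""
    return max(("a_then_b", "b_then_a", "same_time"), key=lambda k: counts[k])


# Hypothetical time-series data: (customer, time, product).
transactions = [
    ("u1", 1, "A"), ("u1", 2, "B"),
    ("u2", 1, "A"), ("u2", 3, "B"),
    ("u3", 5, "B"), ("u3", 6, "A"),
    ("u4", 2, "A"), ("u4", 2, "B"),
    ("u5", 1, "A"),
]
counts = count_order_cases(transactions, "A", "B")
order = decide_order(counts)
print(dict(counts), order)
```

Note that the ordering is computed only after the products have already been grouped, which is why this approach does not let the sequence information influence the clusters themselves.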

Patent Document 1: JP 2003-248750 A
Patent Document 2: JP H09-305571 A

Non-Patent Document 1: A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha, "A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation," KDD 2004.
Non-Patent Document 2: Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, and Christos Faloutsos, "Fully Automatic Cross-associations," KDD 2004.
Non-Patent Document 3: Padhraic Smyth, "Probabilistic Model-Based Clustering of Multivariate and Sequential Data," in Proceedings of Artificial Intelligence and Statistics, 1999.

As the information society advances, the volume of accumulated data has become enormous. In retail, for example, large amounts of multivariate data known as POS (Point of Sales) data are accumulated. POS data records which customer purchased what, when, and where. The accumulated data is not limited to multivariate data: sequence data, in which order information is attached to each data point, is also accumulated in large volumes. Sequence data is data associated with data points, in which information related to two or more keys (attributes) of the multivariate data is arranged in time series. When sequence data exists for the data points of multivariate data, it is considered preferable to take the sequence data into account when clustering.

However, Non-Patent Documents 1 and 2 only perform bidirectional clustering on multivariate data and cannot cluster multivariate data and sequence data simultaneously. Patent Document 1 likewise only performs bidirectional clustering on multivariate data and cannot take sequence data into account when doing so. Patent Document 2 merely determines from the time-series data, after clustering, which of two products in the same cluster was purchased first or whether they were purchased at the same time; it does not perform clustering that takes sequence data into account.

Non-Patent Document 3 describes a technique for dividing multivariate data and sequence data into clusters. However, the clustering in Non-Patent Document 3 is one-way clustering. Non-Patent Document 3 therefore cannot perform bidirectional clustering of multivariate data and sequence data simultaneously.

An object of the present invention is to provide a bidirectional cluster dividing device, method, and program capable of simultaneously dividing multivariate data and sequence data corresponding to the multivariate data into clusters that have common features among the variables and among the sequence data.

To achieve the above object, the present invention provides a bidirectional cluster dividing device comprising: input means for inputting multivariate data and sequence data corresponding to the multivariate data; and bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster.

The present invention provides a bidirectional cluster dividing method comprising: a step of inputting multivariate data and sequence data corresponding to the multivariate data; and a step of performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster.

The present invention provides a program that causes a computer to execute: a process of inputting multivariate data and sequence data corresponding to the multivariate data; and a process of performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster.

The present invention provides an advertisement distribution system comprising: request receiving means for receiving a request for content from a user and storing the user who sent the request and the requested content in a user request storage unit; content distribution means for adding, to the content requested by the user, an advertisement that includes a mechanism for causing the user to request an advertiser's content, and transmitting the content; data generation means for generating, based on the information stored in the user request storage unit, multivariate data that has users and advertisements as variables and indicates whether a user requested the advertiser's content from an advertisement, and for generating, corresponding to the multivariate data, sequence data in which the requests that the user sent before requesting the advertiser's content are arranged in time series; bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and advertisement selection means for determining, based on the bidirectional clustering result, the advertisement that the content distribution means should add to the content.

The present invention provides an advertisement distribution method comprising: a request receiving step of receiving a request for content from a user and storing the user who sent the request and the requested content in a user request storage unit; a content distribution step of adding, to the content requested by the user, an advertisement that includes a mechanism for causing the user to request an advertiser's content, and transmitting the content; a data generation step of generating, based on the information stored in the user request storage unit, multivariate data that has users and advertisements as variables and indicates whether a user requested the advertiser's content from an advertisement, and generating, corresponding to the multivariate data, sequence data in which the requests that the user sent before requesting the advertiser's content are arranged in time series; a bidirectional clustering step of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and an advertisement selection step of determining, based on the bidirectional clustering result, the advertisement to be added to the content.

The present invention provides a program that causes a computer to execute: a request receiving process of receiving a request for content from a user and storing the user who sent the request and the requested content in a user request storage unit; a content distribution process of adding, to the content requested by the user, an advertisement that includes a mechanism for causing the user to request an advertiser's content, and transmitting the content; a data generation process of generating, based on the information stored in the user request storage unit, multivariate data that has users and advertisements as variables and indicates whether a user requested the advertiser's content from an advertisement, and generating, corresponding to the multivariate data, sequence data in which the requests that the user sent before requesting the advertiser's content are arranged in time series; a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and an advertisement selection process of determining, based on the bidirectional clustering result, the advertisement to be added to the content.

The present invention provides a product recommendation system comprising: data generation means for collecting sales information that includes information indicating that a customer purchased a product, generating, based on the collected sales information, multivariate data that has customers and products as variables and indicates whether a customer purchased a product, and generating, corresponding to the multivariate data, sequence data in which the history relating to the customer's purchase of the product is arranged in time series; bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and recommended product list generation means for determining, based on the bidirectional clustering result, the products to recommend to a customer.

The present invention provides a product recommendation method comprising: a data generation step of collecting sales information that includes information indicating that a customer purchased a product, generating, based on the collected sales information, multivariate data that has customers and products as variables and indicates whether a customer purchased a product, and generating, corresponding to the multivariate data, sequence data in which the history relating to the customer's purchase of the product is arranged in time series; a bidirectional clustering step of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and a recommended product list generation step of determining, based on the bidirectional clustering result, the products to recommend to a customer.

The present invention provides a program that causes a computer to execute: a data generation process of collecting sales information that includes information indicating that a customer purchased a product, generating, based on the collected sales information, multivariate data that has customers and products as variables and indicates whether a customer purchased a product, and generating, corresponding to the multivariate data, sequence data in which the history relating to the customer's purchase of the product is arranged in time series; a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and a recommended product list generation process of determining, based on the bidirectional clustering result, the products to recommend to a customer.

The present invention provides a failure prediction system comprising: data generation means for collecting failure information that includes the vehicle model and the failure location of a vehicle, generating, based on the collected failure information, multivariate data that has vehicle models and regions as variables and indicates whether a failure occurred for a given vehicle model in a given region, and generating, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for that vehicle model is arranged in time series; bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and failure prediction candidate list generation means for estimating, based on the bidirectional clustering result, the regions in which a failure is predicted to occur for a vehicle model.

The present invention provides a failure prediction method comprising: a data generation step of collecting failure information that includes the vehicle model and the failure location of a vehicle, generating, based on the collected failure information, multivariate data that has vehicle models and regions as variables and indicates whether a failure occurred for a given vehicle model in a given region, and generating, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for that vehicle model is arranged in time series; a bidirectional clustering step of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and a failure prediction candidate list generation step of estimating, based on the bidirectional clustering result, the regions in which a failure is predicted to occur for a vehicle model.

The present invention provides a program that causes a computer to execute: a data generation process of collecting failure information that includes the vehicle model and the failure location of a vehicle, generating, based on the collected failure information, multivariate data that has vehicle models and regions as variables and indicates whether a failure occurred for a given vehicle model in a given region, and generating, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for that vehicle model is arranged in time series; a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters by using an evaluation function indicating whether there are many or few common features among the variables, and among the sequence data, included in a cluster, and outputting a bidirectional clustering result; and a failure prediction candidate list generation process of estimating, based on the bidirectional clustering result, the regions in which a failure is predicted to occur for a vehicle model.

The bidirectional cluster dividing device, method, and program of the present invention can simultaneously divide multivariate data and sequence data corresponding to the multivariate data into clusters that have common features among the variables and among the sequence data.

A block diagram showing the bidirectional cluster dividing device according to the first embodiment of the present invention.
A flowchart showing the operation procedure of the bidirectional cluster dividing device.
A diagram showing an example of input data.
A diagram showing the input data in table (matrix) form.
A diagram showing the result of the initial clustering.
A diagram showing the clustering result finally obtained.
A diagram in which (a) shows multivariate data and (b) shows the clustering result.
A diagram in which (a) shows multivariate data to which sequence data has been added and (b) shows the clustering result.
A block diagram showing the advertisement distribution system according to the second embodiment of the present invention.
A block diagram showing a user terminal.
A block diagram showing a Web server.
A flowchart showing the procedure of the bidirectional clustering process.
A diagram showing the input data of the bidirectional clustering means.
A diagram showing the input data in table (matrix) form.
A diagram showing the clustering result of the bidirectional clustering means.
A flowchart showing the procedure of the advertisement distribution process.
A diagram showing advertisement distribution candidates.
A diagram showing a Web page to which a Web advertisement has been added.
A diagram showing the product recommendation system according to the third embodiment of the present invention.
A flowchart showing the operation procedure of product recommendation.
A diagram showing the input data of the bidirectional clustering means.
A diagram showing the bidirectional clustering result.
A diagram showing a recommended product list.
A block diagram showing the failure prediction system according to the fourth embodiment of the present invention.
A flowchart showing the operation procedure of failure prediction.
A diagram showing the input data of the bidirectional clustering means.
A diagram showing the bidirectional clustering result.
A diagram showing a failure prediction candidate list.
A block diagram showing an outline of the bidirectional cluster dividing device of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows a bidirectional cluster dividing device according to a first embodiment of the present invention. The bidirectional cluster dividing device 100 includes input means 101, bidirectional clustering means 102, cluster number calculation means 103, and output means 104. The function of each means in the bidirectional cluster dividing device 100 can be realized by a computer reading and executing a predetermined program.

The input means 101 inputs multivariate data and sequence data. Multivariate data is data whose variables are two or more attributes. Sequence data is data that corresponds to the multivariate data and in which information related to two or more keys (attributes) of the multivariate data is arranged in time series. The multivariate data is, for example, data that has customer and product as its variables and indicates whether a customer has purchased a product. The sequence data is, for example, history data attached to a data point indicating that a customer purchased a product, in which the history of that customer's purchases of that product is arranged in time series.

The bidirectional clustering means 102 performs bidirectional clustering on the input data. Using an evaluation function, the bidirectional clustering means 102 divides the multivariate data into a plurality of clusters. The evaluation function indicates whether there are many or few common features, both among the variables and among the sequence data contained in a cluster. For example, if the evaluation function is one whose value decreases as the number of common features increases, the bidirectional clustering means 102 divides the data so that the sum of the evaluation function values calculated for the individual clusters becomes small. The output means 104 outputs the bidirectional clustering result.

The cluster number calculation means 103 determines the number of cluster divisions used in bidirectional clustering. For the first clustering, the cluster number calculation means 103 outputs a predetermined initial value as the number of cluster divisions, and the bidirectional clustering means 102 divides the input data into that initial number of clusters. Each time the bidirectional clustering means 102 performs clustering, the cluster number calculation means 103 determines, based on the value of the evaluation function, whether to increase the number of cluster divisions. When the cluster number calculation means 103 increases the number of cluster divisions, the bidirectional clustering means 102 performs the cluster division again with the new number.

FIG. 2 shows the operation procedure. The input means 101 inputs multivariate data and sequence data (step A1). FIG. 3 shows an example of the input data. In this example, the multivariate data represents who bought which product. The multivariate data has two variables, "customer" and "product". Day-of-week history data for the product purchases (sequence data) is attached to each data item of the multivariate data. The sequence data is denoted y_jk. The sequence data y_jk lists, in time series, the days of the week on which customer j purchased product k in the past.

Information that a customer purchased a product can be obtained for each predetermined period, for example each month. In FIG. 3 there are two data items indicating that customer B purchased product 2; this is because customer B purchased product 2 in two different periods. For example, one of the two purchase data items corresponds to customer B purchasing product 2 last month, and the other corresponds to customer B purchasing product 2 the month before last. The sequence data y^1_2B represents the day-of-week purchase history for one of these periods, and y^2_2B represents that for the other.

FIG. 4 shows the input data in table (matrix) format. Representing the input data of FIG. 3 as a matrix gives FIG. 4. The input data matrix is denoted D. The rows of matrix D represent customers and the columns represent products. Each element of D is 0 or 1: 0 indicates that the product was not purchased, and 1 indicates that it was purchased. Sequence data is attached to the data points indicating that a customer purchased a product. A data point does not necessarily have exactly one sequence data item; several sequence data items may correspond to a single data point.
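The matrix form described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the record layout `(customer, product, period, weekday history)`, the sample values, and the function name `build_input` are hypothetical and do not come from the source.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, product, period, weekday history).
records = [
    ("A", 1, "period-1", ["Mon", "Thu"]),
    ("B", 2, "period-1", ["Sat"]),
    ("B", 2, "period-2", ["Sat", "Sun"]),
]
customers = ["A", "B", "C"]
products = [1, 2, 3]


def build_input(records, customers, products):
    """Return the 0/1 matrix D and the sequence data attached to each 1-cell."""
    row = {c: i for i, c in enumerate(customers)}
    col = {p: j for j, p in enumerate(products)}
    D = [[0] * len(products) for _ in customers]
    sequences = defaultdict(list)
    for customer, product, _period, history in records:
        i, j = row[customer], col[product]
        D[i][j] = 1                          # the customer bought the product
        sequences[(i, j)].append(history)    # one history per purchase period
    return D, dict(sequences)


D, sequences = build_input(records, customers, products)
```

Keeping the sequence data in a mapping keyed by matrix position lets one data point carry several histories, as in the case of customer B and product 2.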

The bidirectional clustering means 102 receives, from the cluster number calculation means 103, the number of cluster divisions for each variable of the multivariate data. For example, when there are two variables, the bidirectional clustering means 102 receives the cluster division numbers k and l of the two variables from the cluster number calculation means 103. The bidirectional clustering means 102 performs bidirectional clustering on the multivariate data while taking the sequence data into account (step A2), dividing the multivariate data into k × l clusters. For example, the bidirectional clustering means 102 receives k = 2 and l = 2 as the initial values of the cluster division numbers and divides the multivariate data into four clusters.
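To make the k × l division concrete, the following sketch groups the cells of a matrix into blocks once each row and each column has been assigned a cluster label. How those labels are chosen is the job of the clustering itself and is not shown; the function name and arguments are assumptions for illustration.

```python
def split_into_blocks(D, row_labels, col_labels, k, l):
    """Collect the values of matrix D into k x l blocks.

    row_labels[r] in range(k) is the row cluster of row r, and
    col_labels[c] in range(l) is the column cluster of column c.
    """
    blocks = [[[] for _ in range(l)] for _ in range(k)]
    for r, row in enumerate(D):
        for c, value in enumerate(row):
            blocks[row_labels[r]][col_labels[c]].append(value)
    return blocks


# Two row clusters and two column clusters give four blocks.
blocks = split_into_blocks([[1, 0], [0, 1]], [0, 1], [0, 1], k=2, l=2)
```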

The bidirectional clustering means 102 performs clustering using the evaluation function. The evaluation function is a function that calculates the degree to which the data belonging to each cluster lack common features. When the input data is bidirectionally clustered, the value of the evaluation function becomes smaller the more the data in each cluster share common features, and larger the less they do. The bidirectional clustering means 102 divides the data into clusters so as to reduce the value of the evaluation function.

The evaluation function used in bidirectional clustering that takes sequence data into account is described next. The clusters obtained by the division are denoted D_ij (i = 1 to k, j = 1 to l). The cost of the clustering is defined by Equation 1 below.

$$T = \sum_{i=1}^{k} \sum_{j=1}^{l} C(D_{ij}) \qquad (1)$$

In Equation 1, C(D_ij) is the cost of cluster D_ij calculated using the evaluation function. The cost T is the sum of the costs of all clusters D_ij.

The MDL (Minimum Description Length) criterion is used for the cost. The cost C(D_ij) of each cluster is defined by Equation 2 below.

$$C(D_{ij}) = \sum_{u} n_u(D_{ij}) \log \frac{n(D_{ij})}{n_u(D_{ij})} + DL\bigl(y(D_{ij})\bigr) \qquad (2)$$

Here, u is a value taken by the multivariate data. In the example of FIG. 4, u is 0 or 1. n_u(D_ij) is the number of data points in cluster D_ij whose value is u, and n(D_ij) is the number of data points belonging to the cluster. That is,

$$n(D_{ij}) = \sum_{u} n_u(D_{ij}).$$

In Equation 2, when n_u(D_ij) = 0, the corresponding term is defined as

$$n_u(D_{ij}) \log \frac{n(D_{ij})}{n_u(D_{ij})} = 0.$$
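The multivariate part of the cluster cost can be sketched as a small Python function. It computes the sum over u of n_u log(n / n_u), the form that the worked examples for FIG. 5 and FIG. 6 use, and skips values with n_u = 0 as the definition requires. The base-10 logarithm is an assumption, chosen because it reproduces the figures quoted in those examples; the function name is hypothetical, and the sequence term DL is deliberately left out.

```python
import math


def multivariate_cost(counts):
    """Sum of n_u * log10(n / n_u) over the values u with n_u > 0.

    `counts` holds n_u for each value u; n is their total.
    """
    n = sum(counts)
    return sum(n_u * math.log10(n / n_u) for n_u in counts if n_u > 0)


# A cluster with four 0-cells and two 1-cells, as in D11 of FIG. 5.
cost_d11 = multivariate_cost([4, 2])
```

A cluster whose cells all share one value costs 0 under this term, which is why homogeneous clusters are favored.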

The function defined by Equation 2 is the evaluation function. In Equation 2, the first term becomes smaller as the similarity of the multivariate data in cluster D_ij increases, and the second term (the cost DL(y(D_ij))) becomes smaller as the similarity of the sequence data in cluster D_ij increases. The cost DL(y(D_ij)) is defined by Equation 3 below.

[Equation 3]

Here, |D_ij| is the total number of sequence data items belonging to cluster D_ij, and m is the base of the logarithm. θ̂ is the parameter used when y(D_ij) is represented by a model. As the model, a probabilistic model widely used for modeling sequence data, such as an HMM (Hidden Markov Model) or a Markov model, can be used. R is the number of parameters contained in θ̂.
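The quantities defined above (|D_ij|, m, θ̂, and R) are the ingredients of a standard two-part MDL code length. As an assumption for illustration only, and not a formula confirmed by the source, Equation 3 could take a shape such as:

```latex
DL\bigl(y(D_{ij})\bigr)
  = -\log_m P\bigl(y(D_{ij}) \mid \hat{\theta}\bigr)
  + \frac{R}{2}\,\log_m \lvert D_{ij} \rvert
```

Under this reading, the first term is small when a single model with parameter θ̂ fits all sequences in the cluster well, which agrees with the statement that more similar sequence data gives a smaller cost, and the second term penalizes models with many parameters.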

The cost C(D_ij) indicates whether the multivariate data and the sequence data in cluster D_ij have many or few common features. The smaller the value of C(D_ij), the more common features there are; the larger the value, the fewer. When all the multivariate data belonging to cluster D_ij have the same value, the first term of Equation 2 is 0, and the cost C(D_ij) is determined by DL(y(D_ij)) alone. For example, in FIG. 4, the cost of a cluster consisting only of data points with u = 1 is determined solely by the similarity of the sequence data of the data points belonging to the cluster. The cost of a cluster consisting only of data points with u = 0 is 0, because such a cluster has no sequence data.

After the bidirectional clustering means 102 has divided the data into clusters, the cluster number calculation means 103 uses the division result and the evaluation function to determine whether to increase the number of clusters (step A3). For example, the cluster number calculation means 103 judges whether the value of the cost T defined by Equation 1 exceeds a predetermined threshold. When the value of the cost T exceeds the threshold, the cluster number calculation means 103 decides to increase the number of clusters.

The method by which the cluster number calculation means 103 decides whether to increase the number of clusters is not limited to the one described above. For example, the decision may be made as follows. One data point is removed from the multivariate data and sequence data belonging to cluster D_ij, and the cluster with that data point removed is denoted D'_ij. The cluster number calculation means 103 calculates the costs before and after the removal, C(D_ij) and C(D'_ij), and compares them. When there is a data point for which C(D_ij) > C(D'_ij), the cluster number calculation means 103 decides to increase the number of clusters.
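The leave-one-out check described above can be sketched as follows. This is a simplified illustration: the default cost covers only the multivariate term, with a base-10 logarithm assumed, and ignores the sequence term DL; a fuller cost can be passed in through the hypothetical `cost` argument.

```python
import math


def first_term(values):
    """Sum of n_u * log10(n / n_u) for a list of cell values (DL omitted)."""
    n = len(values)
    counts = {u: values.count(u) for u in set(values)}
    return sum(c * math.log10(n / c) for c in counts.values())


def should_increase(values, cost=first_term):
    """True if removing some single data point lowers the cluster cost,
    i.e. C(D_ij) > C(D'_ij) holds for at least one data point."""
    base = cost(values)
    return any(cost(values[:i] + values[i + 1:]) < base
               for i in range(len(values)))


mixed = should_increase([1, 1, 1, 0])   # removing the lone 0 lowers the cost
uniform = should_increase([1, 1, 1])    # no removal can lower a cost of 0
```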

When the cluster number calculation means 103 decides to increase the number of clusters, it notifies the bidirectional clustering means 102 of the increased number. For example, with the current numbers of clusters being k and l, the cluster number calculation means 103 notifies the bidirectional clustering means 102 of k+1 and l, k and l+1, or k+1 and l+1 as the new numbers of clusters. The procedure then returns from step A3 to step A2, and the bidirectional clustering means 102 divides the input data into the notified number of clusters. Repeating steps A2 and A3 yields clusters with an appropriate number of divisions.
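The repetition of steps A2 and A3 can be sketched as a loop. Everything named here is an assumption for illustration: `cluster` and `total_cost` are hypothetical callables standing in for the bidirectional clustering means and Equation 1, the rule for picking among the three candidate increments (lowest cost wins) is not specified by the source, and the `max_k` and `max_l` caps are added only to guarantee termination.

```python
def choose_cluster_counts(data, cluster, total_cost, threshold,
                          k=2, l=2, max_k=10, max_l=10):
    """Repeat steps A2 and A3 until the cost T no longer exceeds the threshold."""
    while True:
        result = cluster(data, k, l)               # step A2
        if total_cost(result) <= threshold:        # step A3: keep this division
            return k, l, result
        # Candidate increments named in the text: (k+1, l), (k, l+1), (k+1, l+1).
        candidates = [(a, b)
                      for a, b in [(k + 1, l), (k, l + 1), (k + 1, l + 1)]
                      if a <= max_k and b <= max_l]
        if not candidates:                         # caps reached, stop growing
            return k, l, result
        # Assumed selection rule: take the candidate with the lowest cost.
        k, l = min(candidates, key=lambda kl: total_cost(cluster(data, *kl)))
```

Evaluating every candidate costs extra clustering runs; a cheaper variant would simply alternate between increasing k and increasing l.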

When the cluster number calculation means 103 decides in step A3 not to increase the number of clusters, the output means 104 outputs the result of the bidirectional clustering performed by the bidirectional clustering means 102 (step A4). For example, for each cluster D_ij obtained by the division, the output means 104 displays information on the data points belonging to that cluster on an output device such as a display.

FIG. 5 shows the result of the initial clustering. The bidirectional clustering means 102 divides the input data (FIG. 4) into the initial number of clusters (k = 2, l = 2), which yields the four clusters D11, D12, D21, and D22 shown in FIG. 5. The cost of each cluster is calculated as follows:
C(D11) = 4 log(6/4) + 2 log(6/2) + DL(y^1_1A, y^1_2B, y^2_2B) = 1.66 + DL(y^1_1A, y^1_2B, y^2_2B)
C(D12) = 52 log(54/52) + 2 log(54/2) + DL(y^1_5A, y^1_28B) = 3.72 + DL(y^1_5A, y^1_28B)
C(D21) = 7 log(9/7) + 2 log(9/2) + DL(y^1_1D, y^1_2E) = 2.07 + DL(y^1_1D, y^1_2E)
C(D22) = 78 log(81/78) + 3 log(81/3) + DL(y^1_5D, y^2_28E, y^1_30C) = 5.57 + DL(y^1_5D, y^2_28E, y^1_30C)
The total cost T is
T = C(D11) + C(D12) + C(D21) + C(D22)
= 13.02 + DL(y^1_1A, y^1_2B, y^2_2B) + DL(y^1_5A, y^1_28B) + DL(y^1_1D, y^1_2E) + DL(y^1_5D, y^2_28E, y^1_30C)

FIG. 6 shows the finally obtained clustering result. Suppose that, by repeating steps A2 and A3, the number of cluster divisions in the "customer" direction becomes 2 and that in the "product" direction becomes 3, so that the six clusters D11 to D13 and D21 to D23 shown in FIG. 6 are finally obtained. The cost of each of these clusters is calculated as follows:
C(D11) = 1 log(4/1) + 3 log(4/3) + DL(y^1_1A, y^1_5A, y^1_1D) = 0.977 + DL(y^1_1A, y^1_5A, y^1_1D)
C(D12) = 0
C(D13) = 0
C(D21) = 0
C(D22) = 3 log(9/3) + 6 log(9/6) + DL(y^1_2B, y^2_2B, y^1_28B, y^1_2E, y^2_28E, y^1_2C, y^1_30C) = 2.48 + DL(y^1_2B, y^2_2B, y^1_28B, y^1_2E, y^2_28E, y^1_2C, y^1_30C)
C(D23) = 0
The total cost T is
T = ΣC(Dij) = 3.46 + DL(y^1_1A, y^1_5A, y^1_1D) + DL(y^1_2B, y^2_2B, y^1_28B, y^1_2E, y^2_28E, y^1_2C, y^1_30C)

Comparing the cost T of the clustering result in FIG. 5 with that of the clustering result in FIG. 6 confirms that, setting aside the DL values (the similarity of the sequence data), the value of the evaluation function has decreased, from 13.02 to 3.46. That is, by performing bidirectional clustering based on the evaluation function, multivariate data and the sequence data corresponding to it can be divided into clusters that share common features among the variables and among the sequence data. The cost T is not limited to the one described above and may include other costs needed for bidirectional clustering.
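The numerical parts of the two totals can be rechecked with a short script. The counts of 0-cells and 1-cells per cluster are read off the worked expressions for FIG. 5 and FIG. 6; the base-10 logarithm is an assumption that happens to reproduce the quoted values, and the DL terms are left out, as in the comparison above.

```python
import math


def first_term(*counts):
    """Sum of n_u * log10(n / n_u) over the non-zero counts n_u."""
    n = sum(counts)
    return sum(c * math.log10(n / c) for c in counts if c > 0)


# Counts of the two cell values in each cluster of the worked example.
initial = {"D11": (4, 2), "D12": (52, 2), "D21": (7, 2), "D22": (78, 3)}
final = {"D11": (1, 3), "D22": (3, 6)}  # the other four final clusters cost 0

t_initial = sum(first_term(*c) for c in initial.values())
t_final = sum(first_term(*c) for c in final.values())
print(f"{t_initial:.2f} -> {t_final:.2f}")  # prints "13.02 -> 3.46"
```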

As a comparative example, consider bidirectional clustering that does not take sequence data into account. Take bivariate data as the multivariate data, with customer as one variable and product as the other. FIG. 7(a) shows the multivariate data. The customer variable takes the values A, B, and C, and the product variable takes the values 1, 2, and 3. Each value of the multivariate data indicates whether a customer purchased a product. For example, when customer A has purchased product 1, the data point corresponding to customer A and product 1 has the value 1.

Clustering the multivariate data of FIG. 7(a) in both the customer and product directions gives the clustering result of FIG. 7(b). In this case, the number of cluster divisions is 4. Bidirectional clustering of the multivariate data yields a cluster consisting of the data points in which customers A and C purchase products 1 and 3, and a cluster consisting of the data point in which customer B purchases product 2. From this result it can be seen that customers A and C share common features with products 1 and 3, and that customer B shares common features with product 2.

FIG. 8 shows an example of bidirectional clustering of multivariate data together with sequence data. FIG. 8(a) shows multivariate data to which sequence data is attached. The sequence data consists, for example, of data indicating the days of the week on which a customer purchased a product in the past. Sequence data is attached to the data points indicating that a customer purchased a product, that is, to the data points whose value is 1.

When the multivariate data of FIG. 8(a) is bidirectionally clustered taking into account not only customers and products but also the sequence data, the clustering result of FIG. 8(b) is obtained. In this case, the number of cluster divisions is 6. Bidirectional clustering that also considers the sequence data yields a cluster consisting of the data points in which customers A and C purchase product 1, a cluster consisting of the data points in which customers A and C purchase product 3, and a cluster consisting of the data point in which customer B purchases product 2.

From the bidirectional clustering result of FIG. 8(b), it can be seen that customers A and C purchase product 1 with similar day-of-week purchase histories, and likewise that customers A and C purchase product 3 with similar day-of-week purchase histories. Although customers A and C both purchase products 1 and 3, the two products were not placed in the same cluster, which indicates that the day-of-week purchase histories of product 1 and product 3 differ. In other words, customers A and C do not purchase product 1 at day-of-week intervals similar to those of product 3. As for product 2, the day-of-week history with which customer B purchases it differs from the day-of-week purchase histories of products 1 and 3.

In this embodiment, the bidirectional clustering means 102 performs bidirectional clustering on multivariate data and sequence data. By performing bidirectional clustering with an evaluation function that indicates whether there are many or few common features both among the variables and among the sequence data contained in a cluster, the multivariate data and the sequence data corresponding to it can be divided simultaneously into clusters that share common features among the variables and among the sequence data.

In this embodiment, the bidirectional cluster dividing device 100 also has the cluster number calculation means 103. The cluster number calculation means 103 judges, based on the evaluation function, whether the clustering result is appropriate and increases the number of cluster divisions in order to obtain a better result. When clustering, it is often unknown in advance how many clusters the data should be divided into. In this embodiment, because the cluster number calculation means 103 determines the number of clusters dynamically, the multivariate data can be divided into an appropriate number of clusters even when that number is not known in advance.

FIG. 9 shows an advertisement distribution system according to a second embodiment of the present invention. The advertisement distribution system has a bidirectional cluster dividing device 100 and a Web server 300. The configuration of the bidirectional cluster dividing device 100 is the same as that of the bidirectional cluster dividing device of the first embodiment shown in FIG. 1. The Web server 300 is connected to a user terminal 200 via a network such as the Internet 400. The user terminal 200 provides the user with an interface for input, output, and the like. The user terminal 200 is, for example, a personal computer or a portable information terminal.

When a user requests Web content, the advertisement distribution system adds an advertisement to the requested content and delivers it to the user. The advertisement includes a mechanism for having the user request the advertiser's content. More specifically, the advertisement contains a link to the site to which the advertiser wants to direct users, and by clicking the advertisement the user can visit the advertiser's site. For example, the advertiser includes in the advertisement a link to a Web page carrying detailed information on a product or service and thereby directs the user to that Web page.

Even if an advertisement is delivered along with content, a user is unlikely to click it when it does not match the user's preferences, and the chance of directing the user to the intended site is then low. Unless users click the advertisement, the advertiser gains little from distributing it. An advertisement distribution system therefore needs to predict accurately which advertisements match a user's preferences.

The Web server 300 supplies multivariate data and sequence data to the bidirectional cluster dividing device 100. The bidirectional cluster dividing device 100 performs bidirectional clustering on the multivariate data and the sequence data. The Web server 300 receives the bidirectional clustering result from the bidirectional cluster dividing device 100 and uses it to deliver, to a user who has requested Web content, an advertisement matching that user's preferences. This embodiment belongs to the field known as collaborative filtering.

FIG. 10 shows the user terminal 200. The user terminal 200 has content request means 201 and content display means 202. The content request means 201 requests from the Web server 300 the content that the user wishes to view. The content display means 202 acquires the requested content from the Web server 300 and displays it. The function of each part of the user terminal 200 can be realized by a computer operating according to a predetermined program.

For example, when the user wants sports content, the content request means 201 requests sports content from the Web server 300. When the user clicks an advertisement delivered along with content, the content request means 201 requests from the Web server 300 the content corresponding to that advertisement.

FIG. 11 shows the Web server 300. The Web server 300 has content distribution means 301, a user request storage unit 302, a content storage unit 303, advertisement selection means 304, an advertisement storage unit 305, request reception means 306, clustering control means 307, an output device 308, an input device 309, and a clustering result storage unit 310. The function of each part of the Web server 300 can be realized by a computer operating according to a predetermined program.

The request reception means 306 receives requests from users. Requests from users include requests to acquire a desired Web page and requests to acquire the Web page corresponding to a Web advertisement. The user request storage unit 302 stores information about the requests from users. For example, the request reception means 306 stores the user name, the content of the request, and the time of the request in the user request storage unit 302.

The content storage unit 303 stores the content to be delivered to users. The advertisement storage unit 305 stores Web advertisements. The content distribution means 301 acquires the content requested by the user from the content storage unit 303 and delivers it to the user. In doing so, the content distribution means 301 adds a Web advertisement stored in the advertisement storage unit 305 to the content before delivering it. If the requested content is not in the content storage unit 303, the content distribution means 301 may forward the request to an external server. If the requested content is a Web page corresponding to an advertisement, the content distribution means 301 need not add a Web advertisement to it.

The clustering control unit 307 also serves as a data generation unit. It generates the data to be supplied to the bidirectional cluster dividing device 100 and controls the bidirectional clustering performed by that device. For example, when the total number of accesses to the Web server 300 exceeds a predetermined threshold, the clustering control unit 307 reads the past content visit histories of all users from the user request storage unit 302. Based on the information read, the clustering control unit 307 generates multivariate data whose variables are the user and the advertisement and which indicates whether the user requested the advertiser's content from the advertisement. Because a user requests the advertiser's content by clicking on an advertisement, the multivariate data indicates which advertisements each user clicked.

The clustering control unit 307 also generates, in correspondence with the multivariate data, sequence data in which the requests a user sent before requesting the advertiser's content are arranged in time series. In the following, data in which the requests sent by a user are arranged in time series is also called a content visit history. The clustering control unit 307 passes the multivariate data indicating which user clicked which advertisement, together with the content visit history (sequence data) up to each advertisement click, to the output device 308, and requests the bidirectional cluster dividing device 100 to perform bidirectional clustering.

The bidirectional cluster dividing device 100 receives, via the output device 308, the multivariate data indicating which user clicked which advertisement and the content visit histories up to the point at which each user clicked a Web advertisement. The bidirectional cluster dividing device 100 performs bidirectional clustering on the multivariate data and these content visit histories, and outputs the bidirectional clustering result to the Web server 300.

The input device 309 receives the bidirectional clustering result from the bidirectional cluster dividing device 100 and passes it to the clustering result storage unit 310. The clustering result storage unit 310 stores the bidirectional clustering result received from the input device 309. The advertisement selection unit 304 refers to the clustering result storage unit 310 and determines, based on the bidirectional clustering result, the Web advertisement to be distributed to a user. The advertisement selection unit 304 reads that Web advertisement from the advertisement storage unit 305 and supplies it to the content distribution unit 301.

The operation procedure is described below. The operation of the advertisement distribution system is broadly divided into two processes: the bidirectional clustering process and the advertisement distribution process that uses the bidirectional clustering result. FIG. 12 shows the procedure of the bidirectional clustering process. When a user requests content, the content request unit 201 of the user terminal 200 requests the content from the Web server 300 (step B1). The user is assumed to be one whose attribute information is known in advance, so that the Web server 300 can determine which user sent a given content request.

The request reception unit 306 of the Web server 300 receives the request from the user. The request reception unit 306 stores the user name, the content of the request, and the time in the user request storage unit 302 (step B2). The request reception unit 306 also passes the request to the content distribution unit 301.

The content distribution unit 301 reads the content corresponding to the request from the content storage unit 303. It also receives a Web advertisement from the advertisement selection unit 304. The content distribution unit 301 adds the Web advertisement to the content read from the content storage unit 303 and transmits the result to the user terminal 200 (step B3). The content display unit 202 of the user terminal 200 displays the received content (step B4).

The clustering control unit 307 determines whether the total number of accesses to the Web server 300 has exceeded a predetermined threshold. When it determines that the threshold has been exceeded, the clustering control unit 307 reads the past content visit histories of all users from the user request storage unit 302 (step B5). Based on the content visit histories read, the clustering control unit 307 generates the multivariate data indicating which user clicked which advertisement and the content visit history up to each Web advertisement click. The clustering control unit 307 outputs the generated multivariate data and content visit histories to the bidirectional cluster dividing device 100 (step B6).

The bidirectional clustering unit 102 (FIG. 1) of the bidirectional cluster dividing device 100 receives, via the input unit 101, the multivariate data and the content visit histories up to each Web advertisement click. The bidirectional clustering unit 102 performs bidirectional clustering on the multivariate data and these content visit histories (step B7). The bidirectional clustering procedure is the same as the procedure shown in FIG. 2.

The bidirectional clustering unit 102 transmits the bidirectional clustering result to the Web server 300 via the output unit 104 (step B8). On receiving the bidirectional clustering result, the input device 309 of the Web server 300 passes it to the clustering result storage unit 310, which stores it (step B9).

FIG. 13 shows the input data of the bidirectional clustering unit 102. The multivariate data has two variables, "user" and "Web advertisement". The content visit history is sequence data in which the requests a user sent before clicking an advertisement are arranged in time series. For example, the content visit history may run from the first request the user sent on the day of the click up to the request immediately before the click. Alternatively, it may consist of the ten requests preceding the click, arranged in time series. The definition of the content visit history is not limited to these examples.
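The two example definitions of the content visit history can be sketched as follows. This is only an illustration: the function names, the `(timestamp, url)` record shape, and the sample URLs are assumptions made for this example, since the specification does not prescribe a data format.

```python
from datetime import datetime

def history_same_day(requests, click_time):
    """Requests sent on the day of the click, before the click, oldest first."""
    return [url for t, url in sorted(requests)
            if t.date() == click_time.date() and t < click_time]

def history_last_n(requests, click_time, n=10):
    """The n requests immediately preceding the click, oldest first."""
    earlier = [url for t, url in sorted(requests) if t < click_time]
    return earlier[-n:] if n > 0 else []

# Hypothetical request log for one user: (timestamp, requested URL).
requests = [
    (datetime(2009, 11, 23, 21, 0), "/news"),
    (datetime(2009, 11, 24, 9, 0), "/sports"),
    (datetime(2009, 11, 24, 9, 5), "/travel"),
]
click = datetime(2009, 11, 24, 9, 10)  # time of the ad click

print(history_same_day(requests, click))
print(history_last_n(requests, click, n=2))
```

Either function yields one sequence per click; which window is appropriate is left open by the text.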

The content visit history (sequence data) is denoted $y^{i}_{jk}$. Here $y^{i}_{jk}$ is the sequence data corresponding to the data point "user k clicked advertisement j": it is the content visit history in which the requests sent by user k before clicking advertisement j are arranged in time series. The index i indicates which occurrence of the click the history belongs to. For example, $y^{1}_{jk}$ is the content visit history for the first time user k clicked advertisement j, and $y^{2}_{jk}$ is the content visit history for the second time user k clicked advertisement j.

FIG. 14 shows the input data in table (matrix) form; it is the matrix representation of the input data shown in FIG. 13. The input data matrix is denoted D. Each row of D represents a user and each column represents a Web advertisement. Each element of D takes the value 0 or 1, where 0 means that the user has not clicked the advertisement and 1 means that the user has clicked it. A data point does not necessarily have exactly one piece of sequence data; several pieces of sequence data may correspond to a single data point.
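A minimal sketch of this input layout is shown below, assuming the sequence data is held in a dictionary keyed by (advertisement, user). The function name `click_matrix`, the dictionary layout, and the sample histories are illustrative assumptions and are not taken from the figures.

```python
def click_matrix(users, ads, histories):
    """Build the 0/1 matrix D: rows are users, columns are ads.

    histories maps (ad, user) -> list of visit histories, one per click,
    so a data point with two clicks carries two sequences (y^1 and y^2).
    """
    return [[1 if histories.get((ad, user)) else 0 for ad in ads]
            for user in users]

# Hypothetical sample: user A clicked ads 1 and 5, user D clicked ad 1 twice.
histories = {
    (1, "A"): [["/news", "/sports"]],
    (5, "A"): [["/travel"]],
    (1, "D"): [["/news"], ["/sports", "/news"]],
}
D = click_matrix(["A", "D"], [1, 5], histories)
print(D)
```

Keeping the sequences in a list per data point reflects the remark that one data point may have several pieces of sequence data.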

FIG. 15 shows the clustering result of the bidirectional clustering unit 102. By performing bidirectional clustering on the multivariate data and sequence data shown in FIG. 14, the bidirectional clustering unit 102 divides the input data into the six clusters D11 to D13 and D21 to D23 shown in FIG. 15.

Calculating the cost of each of the clusters D11 to D13 and D21 to D23 shown in FIG. 15 gives

$$C(D_{11}) = 1\log(4/1) + 3\log(4/3) + DL(y^{1}_{1A}, y^{1}_{5A}, y^{1}_{1D}) = 0.977 + DL(y^{1}_{1A}, y^{1}_{5A}, y^{1}_{1D})$$

$$C(D_{12}) = 0.0$$

$$C(D_{13}) = 0.0$$

$$C(D_{21}) = 0.0$$

$$C(D_{22}) = 3\log(9/3) + 6\log(9/6) + DL(y^{1}_{2B}, y^{2}_{2B}, y^{1}_{28B}, y^{1}_{2E}, y^{2}_{28E}, y^{1}_{2C}, y^{1}_{30C}) = 2.48 + DL(y^{1}_{2B}, y^{2}_{2B}, y^{1}_{28B}, y^{1}_{2E}, y^{2}_{28E}, y^{1}_{2C}, y^{1}_{30C})$$

$$C(D_{23}) = 0.0$$

The total cost T is

$$T = \sum C(D_{ij}) = 3.46 + DL(y^{1}_{1A}, y^{1}_{5A}, y^{1}_{1D}) + DL(y^{1}_{2B}, y^{2}_{2B}, y^{1}_{28B}, y^{1}_{2E}, y^{2}_{28E}, y^{1}_{2C}, y^{1}_{30C})$$
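The numeric part of these costs can be checked with a short sketch. It assumes that the count term of a cluster is the sum of n_v log(n / n_v) over the values 0 and 1, and that the logarithm is base 10; the base is inferred from the printed values rather than stated in this passage. The function name `count_cost` is hypothetical, and the DL term for the sequence data is left out because it is not defined here.

```python
import math

def count_cost(n_zero, n_one):
    """Count term of a cluster cost: sum of n_v * log10(n / n_v) over non-empty values."""
    n = n_zero + n_one
    return sum(c * math.log10(n / c) for c in (n_zero, n_one) if c > 0)

c11 = count_cost(1, 3)   # D11: one 0 and three 1s among 4 cells
c22 = count_cost(3, 6)   # D22: three 0s and six 1s among 9 cells
c12 = count_cost(4, 0)   # a cluster whose cells are all 0 costs nothing

print(round(c11, 3), round(c22, 3), round(c11 + c22, 3), c12)
```

Under these assumptions the values come out at about 0.977, 2.488, and 3.465, which agrees with the printed 0.977, 2.48, and 3.46 if the latter two are truncated to two decimals.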

Next, the advertisement distribution process that uses the bidirectional clustering result is described. FIG. 16 shows the procedure of the advertisement distribution process. The content request unit 201 of the user terminal 200 requests content from the Web server 300 (step C1). The request reception unit 306 receives the request from the user terminal 200. The user request storage unit 302 stores the request received by the request reception unit 306 (step C2).

The content distribution unit 301 receives the request from the request reception unit 306. The content distribution unit 301 passes information identifying the user who sent the request, for example the user name, to the advertisement selection unit 304 (step C3). The advertisement selection unit 304 reads the information on the cluster to which the user belongs from the clustering result storage unit 310 (step C4). What is read from the clustering result storage unit 310 in step C4 is the user names of the users belonging to the same cluster as the requesting user and information identifying the Web advertisements clicked by the users of that cluster.

Based on the information read in step C4, the advertisement selection unit 304 determines the Web advertisement to be distributed to the user who requested the content (step C5). The advertisement selection unit 304 treats the advertisements clicked by users belonging to the same cluster as candidates and chooses the advertisement to distribute from among them. If there is an advertisement that other users have clicked but the requesting user has not, the advertisement selection unit 304 gives that advertisement priority when deciding which advertisement to distribute.

FIG. 17 shows the advertisement distribution candidates. When the cluster division result shown in FIG. 15 has been obtained, the candidate advertisements to be distributed to each user are as shown in FIG. 17. In FIG. 17, the Web advertisement distribution candidates are listed in descending order of priority. Consider cluster D11 as an example. According to FIG. 15, two users belong to this cluster, user A and user D. User A has clicked advertisement 1 and advertisement 5, and user D has clicked advertisement 1.

Users belonging to the same cluster are considered to have similar preferences regarding Web advertisements and to be interested in the group of Web advertisements belonging to that cluster. In addition, because the bidirectional cluster dividing device 100 also uses the sequence data, that is, which Web pages were accessed before an advertisement was clicked, users belonging to the same cluster are considered to share many features in their content visit histories as well. Therefore, if a user who requests content is served an advertisement that at least one user in the same cluster has clicked, that user can be predicted to click the advertisement.

Because the users belonging to cluster D11 have clicked advertisement 1 and advertisement 5, the candidate advertisements for user A and user D are advertisement 1 and advertisement 5. User D has never clicked advertisement 5, so the advertisement selection unit 304 gives advertisement 5 a higher priority than advertisement 1 and, following that priority, selects the advertisement to distribute to user D in the order advertisement 5, then advertisement 1. User A has clicked both advertisement 1 and advertisement 5, so there is no particular priority between them. For user A, the advertisement selection unit 304 may simply choose either advertisement 1 or advertisement 5 at random.
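The selection rule for cluster D11 can be sketched as follows. The function names, the dictionary of clicked advertisements, and the tie-breaking rule (lowest advertisement number first among unclicked candidates) are assumptions added for illustration; the text specifies only that unclicked candidates take priority and that a random choice is acceptable otherwise.

```python
import random

def ad_candidates(user, cluster_users, clicks):
    """Split the cluster's clicked ads into those the user has not clicked and those already clicked."""
    pool = set()
    for member in cluster_users:
        pool |= clicks.get(member, set())
    own = clicks.get(user, set())
    return sorted(pool - own), sorted(pool & own)

def choose_ad(user, cluster_users, clicks, rng=random):
    """Prefer an ad the user has not clicked; otherwise pick a clicked one at random."""
    unclicked, clicked = ad_candidates(user, cluster_users, clicks)
    if unclicked:
        return unclicked[0]  # assumed tie-break: lowest ad number
    return rng.choice(clicked) if clicked else None

clicks = {"A": {1, 5}, "D": {1}}   # ads clicked by each user in cluster D11
members = ["A", "D"]

print(ad_candidates("D", members, clicks))
print(choose_ad("D", members, clicks))
```

For user D the priority order is advertisement 5 followed by advertisement 1, while for user A both candidates fall in the already-clicked group and one is drawn at random.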

Returning to FIG. 16, once the advertisement selection unit 304 has determined the advertisement to distribute, it reads the Web advertisement from the advertisement storage unit 305 and supplies it to the content distribution unit 301. Alternatively, the advertisement selection unit 304 may pass information identifying the determined Web advertisement to the content distribution unit 301, and the content distribution unit 301 may read the Web advertisement from the advertisement storage unit 305.

The content distribution unit 301 reads the content requested by the user from the content storage unit 303 (step C6). The content distribution unit 301 adds the Web advertisement determined by the advertisement selection unit 304 to the content read (step C7). The content distribution unit 301 transmits the Web content with the Web advertisement added to the user terminal 200 (step C8). The content display unit 202 of the user terminal 200 displays the Web content, including the Web advertisement, transmitted by the content distribution unit 301 (step C9).

FIG. 18 shows a Web page to which a Web advertisement has been added. The content distribution unit 301 provides an advertisement display area 902 in the Web page 901 and embeds the Web advertisement in that area. Distribution of Web advertisements is not limited to the form described here. For example, the Web advertisement may be distributed separately from the Web content instead of being embedded in it.

In this embodiment, the bidirectional cluster dividing device takes the data on which user clicked which Web advertisement as the multivariate data and the content visit history up to each Web advertisement click as the sequence data, and performs bidirectional clustering on both. A user's characteristics appear not only in which Web advertisements the user clicked but also in how the user came to click them. Because this embodiment handles the multivariate data and the sequence data together and performs bidirectional clustering on them, user characteristics and preferences can be expected to be extracted more accurately. Furthermore, by using the result of such bidirectional clustering to decide which advertisement to distribute, the user can be expected to click the advertisement.

FIG. 19 shows a product recommendation system according to the third embodiment of the present invention. The product recommendation system has a server system 600. The server system 600 includes the bidirectional cluster dividing device 100, a data generation unit 601, a recommended product list generation unit 602, and a clustering result storage unit 603. The server system 600 is connected to client systems 501 to 503 via a network 401. The client systems 501 to 503 are, for example, sales management systems installed in retail stores. The server system 600 is a central management system that consolidates information from the retail stores and is installed in a data center or the like.

The client systems 501 to 503 manage the sales information of each store. The sales information includes, for example, the customer name, the name of the product the customer purchased, and the date and time of purchase. The server system 600 collects from the client systems customer purchase information that includes data indicating which customer purchased which product, and performs bidirectional clustering using the collected information. When such information is used as the multivariate data, the bidirectional clustering result can be used for retail marketing and similar purposes. The server system 600 uses the bidirectional clustering result to determine products to recommend to customers in the future, and transmits the recommended product information to the client systems 501 to 503 of the stores.

The data generation unit 601 collects customer purchase information from the client systems 501 to 503. Based on the collected purchase information, the data generation unit 601 generates multivariate data whose variables are the customer and the product and which indicates whether the customer purchased the product. The data generation unit 601 also generates, in correspondence with the multivariate data, sequence data in which the history of the customer's purchases of the product is arranged in time series. Here, the sequence data is assumed to be a purchase day-of-week history, in which the days of the week on which the customer purchased the product are arranged in time series.

The configuration of the bidirectional cluster dividing device 100 is the same as that of the bidirectional cluster dividing device of the first embodiment shown in FIG. 1. The bidirectional cluster dividing device 100 performs bidirectional clustering on the multivariate data and sequence data generated by the data generation unit 601, and stores the bidirectional clustering result in the clustering result storage unit 603. The recommended product list generation unit 602 uses the clustering result stored in the clustering result storage unit 603 to generate a list of products to recommend to each customer.

FIG. 20 shows the operation procedure. Each of the client systems 501 to 503 transmits customer purchase information to the server system 600 via the network 401 (step D1). The server system 600 receives the customer purchase information from each client. The timing at which a client transmits customer purchase information to the server system 600 may differ from client to client.

The data generation unit 601 generates the multivariate data indicating which customer purchased which product and the purchase day-of-week history indicating the days of the week on which the customer purchased the product. The data generation unit 601 outputs the generated multivariate data and purchase day-of-week history to the bidirectional cluster dividing device 100 (step D2).

The bidirectional clustering unit 102 (FIG. 1) of the bidirectional cluster dividing device 100 receives the multivariate data and the purchase day-of-week history via the input unit 101. The bidirectional clustering unit 102 performs bidirectional clustering on the multivariate data and the purchase day-of-week history (step D3). The bidirectional clustering procedure is the same as the procedure shown in FIG. 2. The bidirectional cluster dividing device 100 sends the bidirectional clustering result to the clustering result storage unit 603, where it is stored (step D4).

The recommended product list generation unit 602 reads the bidirectional clustering result from the clustering result storage unit 603 and generates a recommended product list for each customer (step D5). In step D5, the recommended product list generation unit 602 determines, for each cluster, the products purchased by at least one of the customers belonging to that cluster. For each customer, it then includes in the recommended product list those products that at least one customer in the same cluster has purchased but that the customer in question has not.

The recommended product list generation unit 602 transmits the recommended product lists to the client systems 501 to 503 (step D6). The client systems 501 to 503 receive the recommended product list for each customer from the server system 600 (step D7).

FIG. 21 shows the input data of the bidirectional clustering unit 102. The multivariate data has two variables, "customer" and "product". The purchase day-of-week history is sequence data in which the days of the week on which the customer purchased the product are arranged in time series, for example in units of one month. FIG. 22 shows the bidirectional clustering result. Assume that the bidirectional clustering unit 102 performs bidirectional clustering on the input data shown in FIG. 21 and obtains the 2 × 3 = 6 clusters shown in FIG. 22.
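One way to derive a purchase day-of-week history in one-month units is sketched below. It assumes that each (customer, product) data point receives one weekday sequence per calendar month; the function name, the record shape, the weekday labels, and the sample purchases are illustrative assumptions and do not reproduce FIG. 21.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_histories(purchases):
    """Map (customer, product) -> list of weekday sequences, one per calendar month."""
    by_month = defaultdict(list)
    for customer, product, day in purchases:
        by_month[(customer, product, day.year, day.month)].append(day)
    histories = defaultdict(list)
    for (customer, product, _, _), days in sorted(by_month.items()):
        # date.weekday() returns 0 for Monday through 6 for Sunday
        histories[(customer, product)].append(
            [WEEKDAYS[d.weekday()] for d in sorted(days)])
    return dict(histories)

# Hypothetical purchase records: (customer, product, purchase date).
purchases = [
    ("A", "item1", date(2009, 11, 6)),
    ("A", "item1", date(2009, 11, 2)),
    ("A", "item1", date(2009, 12, 7)),
    ("D", "item1", date(2009, 11, 7)),
]
result = weekday_histories(purchases)
print(result)
```

Each month yields a separate sequence, so a data point with purchases in several months carries several pieces of sequence data.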

FIG. 23 shows the recommended product lists. When the cluster division result shown in FIG. 22 has been obtained, the list of products to recommend to each customer (the recommended product candidates) is as shown in FIG. 23. Consider cluster D11 as an example. According to FIG. 22, two customers belong to this cluster, customer A and customer D. Customer A has purchased product 1 and product 5, and customer D has purchased product 1.

In this embodiment, bidirectional clustering is performed on customers, products, and purchase day-of-week histories. As a result, customers who are interested in the same products and who also have similar purchase day-of-week histories can be gathered into the same cluster. Customers belonging to the same cluster are considered to have similar preferences regarding purchased products and to share many features in the day-of-week history of their purchases. Therefore, when products related to the products belonging to a cluster are recommended to the customers belonging to that cluster, those customers can be expected to purchase them.

Because the customers belonging to cluster D11 have purchased product 1 and product 5, the recommended product list generation unit 602 chooses the products to recommend to customer A and customer D from among product 1 and product 5. Customer D has not purchased product 5, so the recommended product list generation unit 602 determines product 5 as the product to recommend to customer D. Customer A has already purchased both product 1 and product 5, so the recommended product list generation unit 602 determines that there is no product to recommend to customer A.
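The list generation described for step D5 and illustrated with cluster D11 can be sketched as a set difference per customer. The function name `recommended_lists` and the dictionary layouts are assumptions for this example, and the sketch considers a single cluster block at a time rather than the full row and column partition.

```python
def recommended_lists(clusters, purchases):
    """For each customer, list products bought by someone in the same cluster but not by that customer."""
    lists = {}
    for members in clusters.values():
        bought_in_cluster = set()
        for customer in members:
            bought_in_cluster |= purchases.get(customer, set())
        for customer in members:
            own = purchases.get(customer, set())
            lists[customer] = sorted(bought_in_cluster - own)
    return lists

clusters = {"D11": ["A", "D"]}        # customers belonging to cluster D11
purchases = {"A": {1, 5}, "D": {1}}   # products purchased by each customer

lists = recommended_lists(clusters, purchases)
print(lists)
```

Unlike the advertisement case, products the customer has already purchased are excluded outright, so customer A ends up with an empty list.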

In this embodiment, the bidirectional cluster dividing device 100 takes the data on which customer purchased which product as the multivariate data and the history of the days of the week on which the customer purchased the product as the sequence data, and performs bidirectional clustering on both. A customer's characteristics appear not only in which products the customer purchased but also in the day-of-week pattern of those purchases. Because this embodiment handles the multivariate data and the sequence data together and performs bidirectional clustering on them, customer characteristics and preferences can be expected to be extracted more accurately. Furthermore, by using the result of such bidirectional clustering to decide which products to recommend, products that the customer can be expected to purchase later can be chosen as recommended products.

FIG. 24 shows a failure prediction system according to the fourth embodiment of the present invention. The failure prediction system includes a server system 800. The server system 800 includes the bidirectional cluster dividing device 100, a data generation unit 801, a failure prediction candidate list generation unit 802, and a clustering result storage unit 803. The server system 800 is connected to client systems 701 to 703 via a network 402. The client systems 701 to 703 are installed at, for example, car dealerships and repair shops. The server system 800 is a central management system and is installed in a data center or the like.

The client systems 701 to 703 manage automobile failure information. The failure information includes the vehicle type and the failure location (failed part). For example, assume that failures have occurred in multiple regions for each vehicle type and that the client systems 701 to 703 accumulate, for each vehicle type, a failure history of the parts that failed. The server system 800 collects the failure information from the client systems 701 to 703 and performs bidirectional clustering using the collected information. The server system 800 then performs failure prediction using the bidirectional clustering result and transmits the prediction result to the client systems 701 to 703.

The data generation unit 801 collects failure information for each vehicle type from the client systems 701 to 703. Based on the collected failure information, the data generation unit 801 generates multivariate data that has the vehicle type and the region as variables and indicates whether or not a failure has occurred for that vehicle type in that region. The data generation unit 801 also generates, corresponding to the multivariate data, sequence data in which the history of failures of that vehicle type is arranged in time series. Here, the sequence data is assumed to be a failure part history in which the parts that failed in the past are arranged in time series.

The configuration of the bidirectional cluster dividing device 100 is the same as that of the bidirectional cluster dividing device in the first embodiment shown in FIG. 1. The bidirectional cluster dividing device 100 performs bidirectional clustering on the failure occurrence regions of each vehicle type and the failure part histories. The clustering result storage unit 803 stores the clustering result of the bidirectional cluster dividing device 100. The failure prediction candidate list generation unit 802 uses the clustering result stored in the clustering result storage unit 803 to generate, for vehicle types and regions, a list of failures predicted to occur in the future.

FIG. 25 shows the operation procedure. Each of the client systems 701 to 703 transmits failure information to the server system 800 via the network 402 (step E1). The server system 800 receives the failure information from each client. The client systems 701 to 703 manage failure information for different regions, and the server system 800 is assumed to be able to identify the region where a failure occurred from the client that sent the failure information. Alternatively, the failure information may itself include information on the region. The timing at which each client transmits failure information to the server system 800 may differ from client to client.

The data generation unit 801 generates multivariate data indicating in which region a failure has occurred for which vehicle type, and a failure part history indicating the history of parts that failed in the past for each vehicle type. The data generation unit 801 outputs the generated multivariate data and failure part history to the bidirectional cluster dividing device 100 (step E2).
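As an illustration of step E2, the following is a minimal sketch of how the two inputs might be assembled from raw failure records. The record layout `(vehicle_type, region, part, date)` and the function name `build_inputs` are assumptions made for this example; the specification does not prescribe a data format.

```python
from collections import defaultdict


def build_inputs(failure_records):
    """Build the multivariate data and failure part histories.

    failure_records: iterable of (vehicle_type, region, part, date) tuples.
    This tuple layout is hypothetical; dates are assumed to sort
    chronologically (for example ISO 8601 strings).
    """
    regions_by_vehicle = defaultdict(set)
    dated_parts = defaultdict(list)
    for vehicle, region, part, date in failure_records:
        regions_by_vehicle[vehicle].add(region)
        dated_parts[vehicle].append((date, part))

    vehicles = sorted(regions_by_vehicle)
    regions = sorted({r for rs in regions_by_vehicle.values() for r in rs})

    # Multivariate data: binary vehicle-by-region matrix,
    # 1 if a failure was reported for that vehicle type in that region.
    matrix = [
        [1 if r in regions_by_vehicle[v] else 0 for r in regions]
        for v in vehicles
    ]
    # Sequence data: failed parts of each vehicle type in time order.
    history = {v: [part for _, part in sorted(dated_parts[v])] for v in vehicles}
    return vehicles, regions, matrix, history
```

The matrix rows and the history entries are keyed by the same vehicle types, which reflects the requirement that the sequence data correspond to the multivariate data.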

The bidirectional clustering means 102 (FIG. 1) of the bidirectional cluster dividing device 100 receives the multivariate data and the failure part history via the input means 101. The bidirectional clustering means 102 performs bidirectional clustering on the multivariate data and the failure part history (step E3). The bidirectional clustering procedure is the same as the procedure shown in FIG. 2. The bidirectional cluster dividing device 100 sends the bidirectional clustering result to the clustering result storage unit 803, which stores it (step E4).

The failure prediction candidate list generation unit 802 reads the bidirectional clustering result from the clustering result storage unit 803 and generates a failure prediction candidate list for each vehicle type (step E5). In step E5, the failure prediction candidate list generation unit 802 examines, for each cluster, the regions in which a failure has occurred for at least one of the vehicle types belonging to that cluster. For each vehicle type, the failure prediction candidate list generation unit 802 then adds to the failure prediction candidate list those regions in which a failure has occurred for at least one vehicle type in the same cluster but in which no failure has yet occurred for the vehicle type in question.

The failure prediction candidate list generation unit 802 transmits the failure prediction candidate list to the client systems 701 to 703 (step E6). The client systems 701 to 703 receive the failure prediction candidate list for each vehicle type from the server system 800 (step E7).

FIG. 26 shows the input data of the bidirectional clustering means 102. The multivariate data has two variables, "vehicle type" and "region". The failure part history is sequence data in which the parts that failed for the vehicle type are arranged in time series, for example in units of one year. Alternatively, the failure part history may list the parts that had failed before a given failure occurred. FIG. 27 shows the bidirectional clustering result. Assume that the bidirectional clustering means 102 performs bidirectional clustering on the input data shown in FIG. 26 and obtains the 2 × 3 = 6 clusters shown in FIG. 27.

FIG. 28 shows the failure prediction candidate list. When the cluster division result shown in FIG. 27 is obtained, the list of regions in which a failure is predicted to occur for each vehicle type (failure occurrence region candidates) is as shown in FIG. 28. Consider cluster D11 as an example. Referring to FIG. 27, two vehicle types belong to this cluster: vehicle type A and vehicle type D. Failures have occurred for vehicle type A in region 1 and region 5, and for vehicle type D in region 1.

In the present embodiment, bidirectional clustering is performed on the vehicle type, the region, and the failure part history. This bidirectional clustering gathers into each cluster vehicle types that have failed in the same regions and that also have similar failure part histories. Vehicle types belonging to the same cluster tend to share the same failure occurrence regions, and their failure part histories are considered to share many common features. Therefore, a vehicle type belonging to a given cluster can be predicted to fail in the future in the regions belonging to that cluster.

Because the vehicle types belonging to cluster D11 have failed in region 1 and region 5, the failure prediction candidate list generation unit 802 selects the predicted failure regions from among region 1 and region 5. Vehicle type D has already failed in region 1, so the failure prediction candidate list generation unit 802 determines region 5 as the region in which a failure is predicted for vehicle type D. Vehicle type A has already failed in both region 1 and region 5, so the failure prediction candidate list generation unit 802 determines that there is no region in which a future failure is predicted for vehicle type A.
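The candidate selection of step E5 amounts to a set difference per vehicle type, which the following sketch illustrates. The function name `failure_candidates` and the dictionary-based inputs are assumptions made for this example, not structures defined in the specification.

```python
def failure_candidates(cluster_members, failed_regions):
    """Return, for each vehicle type, the regions where a failure is predicted.

    cluster_members: {cluster_id: [vehicle_type, ...]}   (hypothetical layout)
    failed_regions:  {vehicle_type: set of regions where it already failed}
    """
    candidates = {}
    for members in cluster_members.values():
        # Regions where at least one vehicle type in the cluster has failed.
        cluster_regions = set().union(*(failed_regions[v] for v in members))
        for vehicle in members:
            # Keep only regions where this vehicle type has not failed yet.
            candidates[vehicle] = sorted(cluster_regions - failed_regions[vehicle])
    return candidates


# Cluster D11 from the example: vehicle types A and D.
print(failure_candidates(
    {"D11": ["A", "D"]},
    {"A": {"region1", "region5"}, "D": {"region1"}},
))
```

For the cluster D11 example this yields an empty list for vehicle type A and region 5 for vehicle type D, matching the determination described above. The same set difference would apply to the product recommendation of the third embodiment with customers in place of vehicle types and products in place of regions.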

In the present embodiment, the bidirectional cluster dividing device 100 treats the data indicating in which region a failure has occurred for which vehicle type as multivariate data and the history of failed parts as sequence data, and performs bidirectional clustering on the multivariate data and the sequence data. Performing bidirectional clustering on both makes it possible to divide the data into clusters that share features across vehicle type, region, and failure part history, and thus to discover clusters whose vehicle types and regions have common characteristics. From the clustering result, the regions in which a failure is predicted to occur in the future can be estimated for each vehicle type. When the server system 800 sends the client systems in those regions information indicating which kind of failure is likely to occur for which vehicle type, the recipients can prepare for the failure. An investigation into the cause of the failure can also be started early.

In bidirectional clustering, the number of clusters usually has to be set in advance. In the present embodiment, the number of clusters represents the number of failures occurring overall. If the number of clusters must be set in advance, it has to be fixed even when it is unknown how many failures are occurring overall. In other words, the number of failures must be decided beforehand even though the purpose of clustering is to find out that number. In the present embodiment, the bidirectional cluster dividing device 100 has the cluster number calculating means 103 (FIG. 1), so cluster division can be performed with an appropriate number of divisions without fixing the number of clusters in advance. In practical applications of bidirectional clustering, it is important that simply inputting the data yields a clustering result divided into an appropriate number of clusters.
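The criterion used by the cluster number calculating means 103 is not given in this passage, so the following is only a sketch of one possible control loop: increase the number of divisions while the evaluation value keeps improving and stop at the first non-improvement. The callables `run_clustering` and `evaluate`, the parameter `max_clusters`, and the stopping rule are all assumptions for illustration; `evaluate` is assumed to return a value for which smaller is better and to include whatever complexity penalty the real criterion uses.

```python
def choose_cluster_count(data, run_clustering, evaluate, max_clusters=10):
    """Pick a number of cluster divisions by repeated clustering.

    run_clustering(data, k) -> clustering result for k divisions (hypothetical)
    evaluate(result)        -> evaluation value, smaller is better (hypothetical)
    """
    best_k = 1
    best_result = run_clustering(data, best_k)
    best_score = evaluate(best_result)
    for k in range(2, max_clusters + 1):
        result = run_clustering(data, k)
        score = evaluate(result)
        if score >= best_score:
            # No improvement: keep the previous number of divisions.
            break
        best_k, best_result, best_score = k, result, score
    return best_k, best_result
```

Separating the clustering call from the evaluation call mirrors the remark elsewhere in the description that clustering and the evaluation of its result may run on different devices.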

In each of the above embodiments the multivariate data has two variables, but the number of variables is not limited to two. The combinations of multivariate data and sequence data are also not limited to those used in the above embodiments. For example, "customer" and "product" may be used as the variables of the multivariate data with "product purchase history" as the sequence data. Alternatively, "customer" and "company name" may be used as the variables with "job change history" as the sequence data, or "product" and "Web page" may be used as the variables with "the history of dates and times at which each product was promoted in a campaign on the Web page" as the sequence data. Furthermore, "part" and "part manufacturer" may be used as the variables with "the history of part deliveries by the part manufacturer" as the sequence data, or "Internet virus name" and "region where infection by the Internet virus was confirmed" may be used as the variables with "the history of the number of infections reported per day" as the sequence data.

In FIG. 1, the bidirectional cluster dividing device 100 has the cluster number calculating means 103, but a configuration without the cluster number calculating means 103 is also possible. In that case, the bidirectional clustering means 102 performs cluster division with a preset number of cluster divisions. The bidirectional clustering means 102 and the cluster number calculating means 103 also need not be provided in the same device. They may be placed in separate devices so that the clustering and the evaluation of the clustering result are performed by different devices.

The above embodiments describe examples in which the multivariate data and the sequence data are input to the bidirectional cluster dividing device 100 from outside, but the multivariate data and the sequence data may instead be generated inside the bidirectional cluster dividing device 100. For example, in the second embodiment, the clustering control means 307 of the Web server 300 (FIG. 11) may, without generating the multivariate data and the sequence data, output each user's request history read from the user request storage unit 302 to the bidirectional cluster dividing device 100 via the output device 308. In this case a data generation means is provided in the bidirectional cluster dividing device 100. Based on the information input from the clustering control means 307, the bidirectional cluster dividing device 100 may generate multivariate data indicating which user clicked which Web advertisement and the content visit history up to the click on the Web advertisement, and then perform bidirectional clustering.

The present invention has been described above based on its preferred embodiments. However, the bidirectional cluster dividing device, advertisement distribution system, product recommendation system, failure prediction system, method, and program of the present invention are not limited to the above embodiments, and configurations obtained by applying various modifications and changes to the above embodiments are also included in the scope of the present invention.

Finally, an outline of the present invention is described. FIG. 29 shows an outline of the bidirectional cluster dividing device of the present invention. The bidirectional cluster dividing device 10 includes input means 11 and bidirectional clustering means 12. The input means 11 inputs multivariate data and sequence data corresponding to the multivariate data. The bidirectional clustering means 12 performs bidirectional clustering on the multivariate data and the sequence data. The bidirectional clustering means 12 divides the multivariate data and the sequence data into a plurality of clusters using an evaluation function that represents whether the features common among the variables and among the sequence data included in a cluster are many or few.

In the present invention, bidirectional clustering is applied simultaneously not only to the multivariate data but also to the sequence data corresponding to the multivariate data. The data can therefore be divided simultaneously into clusters that have common features among the variables and among the sequence data. The characteristics of the data appear not only in the multivariate data but also in the corresponding sequence data. For this reason, handling the multivariate data and the sequence data together and performing bidirectional clustering can be expected to extract the features of the multivariate data more accurately.

10: Bidirectional cluster dividing device
11: Input means
12: Bidirectional clustering means
100: Bidirectional cluster dividing device
101: Input means
102: Bidirectional clustering means
103: Cluster number calculating means
104: Output means
200: User terminal
201: Content request means
202: Content display means
300: Web server
301: Content distribution means
302: User request storage unit
303: Content storage unit
304: Advertisement selection means
305: Advertisement storage unit
306: Request accepting means
307: Clustering control means
308: Output device
309: Input device
310: Clustering result storage unit
400: Internet
401, 402: Network
501 to 503, 701 to 703: Client system
600: Server system
601: Data generation unit
602: Recommended product list generation unit
603: Clustering result storage unit
800: Server system
801: Data generation unit
802: Failure prediction candidate list generation unit
803: Clustering result storage unit

Claims

A bidirectional cluster dividing device comprising:
input means for inputting multivariate data and sequence data corresponding to the multivariate data; and
bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few.

The bidirectional cluster dividing device according to claim 1, wherein the evaluation function is a function whose value decreases as the number of features common among the variables and among the sequence data included in a cluster increases, and the bidirectional clustering means calculates the value of the evaluation function for each cluster and performs cluster division so that the sum of the evaluation function values over the clusters becomes small.

The bidirectional cluster dividing device according to claim 2, further comprising cluster number calculating means for determining the number of cluster divisions of the bidirectional clustering performed by the bidirectional clustering means, based on the value of the evaluation function after the bidirectional clustering means performs cluster division.

The bidirectional cluster dividing device according to claim 3, wherein, when the cluster number calculating means increases the number of cluster divisions, the bidirectional clustering means performs cluster division using the number of cluster divisions determined by the cluster number calculating means.

An advertisement distribution system comprising:
request accepting means for accepting a request for content from a user and storing the requesting user and the requested content in a user request storage unit;
content distribution means for adding, to the content requested by the user, an advertisement including a mechanism that lets the user request content of an advertiser, and transmitting the content;
data generation means for generating, based on the information stored in the user request storage unit, multivariate data that has the user and the advertisement as variables and indicates whether or not the user has requested the advertiser's content from the advertisement, and for generating, corresponding to the multivariate data, sequence data in which the requests transmitted before the user requested the advertiser's content are arranged in time series;
bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputting a bidirectional clustering result; and
advertisement selection means for determining, based on the bidirectional clustering result, the advertisement that the content distribution means adds to the content.

A product recommendation system comprising:
data generation means for collecting sales information including information that a customer has purchased a product, generating, based on the collected sales information, multivariate data that has the customer and the product as variables and indicates whether or not the customer has purchased the product, and generating, corresponding to the multivariate data, sequence data in which the history of the customer's product purchases is arranged in time series;
bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few; and
recommended product list generation means for determining a product to recommend to a customer based on the bidirectional clustering result.

A failure prediction system comprising:
data generation means for collecting failure information including the vehicle type and the failure location of an automobile, generating, based on the collected failure information, multivariate data that has the vehicle type and the region as variables and indicates whether or not a failure has occurred for the vehicle type in the region, and generating, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for the vehicle type is arranged in time series;
bidirectional clustering means for performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few; and
failure prediction candidate list generation means for estimating, based on the bidirectional clustering result, a region in which a failure is predicted to occur for a vehicle type.

A bidirectional cluster dividing method comprising the steps of:
a computer inputting multivariate data and sequence data corresponding to the multivariate data; and
the computer performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few.

The bidirectional cluster dividing method according to claim 8, wherein the evaluation function is a function whose value decreases as the number of features common among the variables and among the sequence data included in a cluster increases, and, in the step of performing the bidirectional clustering, the computer calculates the value of the evaluation function for each cluster and performs cluster division so that the sum of the evaluation function values over the clusters becomes small.

The bidirectional cluster dividing method according to claim 8, further comprising a step of determining the number of cluster divisions for the bidirectional clustering based on the value of the evaluation function after the step of performing the bidirectional clustering.

The bidirectional cluster dividing method according to claim 10, wherein, when it is determined in the step of determining the number of cluster divisions that the number of cluster divisions is to be increased from the current number, the computer further performs bidirectional clustering using the determined number of cluster divisions.

An advertisement distribution method comprising:
a request receiving step in which a computer receives a request for content from a user and stores the requesting user and the requested content in a user request storage unit;
a content distribution step in which the computer adds, to the content requested by the user, an advertisement including a mechanism that lets the user request content of an advertiser, and transmits the content;
a data generation step in which the computer generates, based on the information stored in the user request storage unit, multivariate data that has the user and the advertisement as variables and indicates whether or not the user has requested the advertiser's content from the advertisement, and generates, corresponding to the multivariate data, sequence data in which the requests transmitted before the user requested the advertiser's content are arranged in time series;
a bidirectional clustering step in which the computer performs bidirectional clustering on the multivariate data and the sequence data, divides the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputs a bidirectional clustering result; and
an advertisement selection step in which the computer determines, based on the bidirectional clustering result, the advertisement to be added to the content.

A product recommendation method comprising:
a data generation step in which a computer collects sales information including information that a customer has purchased a product, generates, based on the collected sales information, multivariate data that has the customer and the product as variables and indicates whether or not the customer has purchased the product, and generates, corresponding to the multivariate data, sequence data in which the history of the customer's product purchases is arranged in time series;
a bidirectional clustering step in which the computer performs bidirectional clustering on the multivariate data and the sequence data, divides the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputs a bidirectional clustering result; and
a recommended product list generation step in which the computer determines a product to recommend to a customer based on the bidirectional clustering result.

A failure prediction method comprising:
a data generation step in which a computer collects failure information including the vehicle type and the failure location of an automobile, generates, based on the collected failure information, multivariate data that has the vehicle type and the region as variables and indicates whether or not a failure has occurred for the vehicle type in the region, and generates, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for the vehicle type is arranged in time series;
a bidirectional clustering step in which the computer performs bidirectional clustering on the multivariate data and the sequence data, divides the multivariate data and the sequence data into a plurality of clusters using an evaluation function representing whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputs a bidirectional clustering result; and
a failure prediction candidate list generation step in which the computer estimates, based on the bidirectional clustering result, a region in which a failure is predicted to occur for a vehicle type.

A program causing a computer to execute:
a process of inputting multivariate data and sequence data corresponding to the multivariate data; and
a process of performing bidirectional clustering on the multivariate data and the sequence data, and dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function that indicates whether the features common among the variables and among the sequence data included in a cluster are many or few.

The program according to claim 15, wherein the evaluation function takes a smaller value as the features common among the variables and among the sequence data included in a cluster become more numerous, and wherein, in the process of performing bidirectional clustering, the value of the evaluation function is obtained for each cluster and the cluster division is performed so that the sum of the evaluation function values over the clusters becomes small.
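The behavior required of the evaluation function (smaller when cluster members have more in common, summed over clusters) can be sketched as follows. This is an illustrative stand-in, not the evaluation function of the specification: it assumes an empirical-entropy code length as the per-cluster cost, and the names `code_length` and `total_cost` are hypothetical.

```python
import math
from collections import Counter


def code_length(counts):
    """Bits needed to encode items under their own empirical distribution.

    The value is 0 when all items are identical and grows as they mix.
    """
    total = sum(counts)
    return sum(-c * math.log2(c / total) for c in counts if c > 0)


def total_cost(matrix, sequences, row_labels, col_labels):
    """Sum of per-cluster costs for one candidate cluster division (assumed form)."""
    # Matrix term: one cost per (row cluster, column cluster) block of 0/1 cells.
    cells = Counter()
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            cells[(row_labels[i], col_labels[j], value)] += 1
    blocks = {(r, c) for r, c, _ in cells}
    cost = sum(code_length([cells[(r, c, 0)], cells[(r, c, 1)]])
               for r, c in blocks)

    # Sequence term: one cost per row cluster, over the symbols of its sequences.
    for r in set(row_labels):
        symbols = Counter(s for i, seq in enumerate(sequences)
                          if row_labels[i] == r for s in seq)
        cost += code_length(list(symbols.values()))
    return cost


matrix = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
sequences = [["a", "a"], ["a"], ["b"], ["b", "b"]]
good = total_cost(matrix, sequences, [0, 0, 1, 1], [0, 0, 1, 1])
bad = total_cost(matrix, sequences, [0, 1, 0, 1], [0, 1, 0, 1])
```

With these toy inputs, the division that groups identical rows, columns, and sequences gives the lower total, so a search that reduces the sum prefers it.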

The program according to claim 15 or 16, further causing the computer to execute, after the process of performing the bidirectional clustering, a process of determining the number of cluster divisions for the bidirectional clustering based on the value of the evaluation function.

The program according to claim 17, wherein, when the process of determining the number of cluster divisions determines that the number of cluster divisions is to be increased from the current number, the bidirectional clustering is executed again with the determined number of cluster divisions.
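The loop of clustering, deciding whether to increase the number of cluster divisions, and clustering again can be sketched as below. The stopping rule (stop when one more cluster no longer lowers the evaluation value) and the callable `run_clustering` are assumptions for illustration; the claim itself does not fix them. The sketch also assumes the evaluation value already penalizes model complexity, since otherwise more clusters would rarely look worse.

```python
def choose_cluster_count(run_clustering, start_k=1, max_k=10):
    """Increase the cluster count while the evaluation value keeps improving.

    run_clustering(k) is a hypothetical callable that performs bidirectional
    clustering with k clusters and returns (result, evaluation_value).
    """
    k = start_k
    result, cost = run_clustering(k)
    while k < max_k:
        candidate_result, candidate_cost = run_clustering(k + 1)
        if candidate_cost >= cost:
            break  # one more cluster does not help, so keep the current k
        k, result, cost = k + 1, candidate_result, candidate_cost
    return k, result, cost


# Toy stand-in for a clustering routine, with made-up evaluation values.
fake_costs = {1: 10.0, 2: 6.0, 3: 4.0, 4: 5.0, 5: 5.5}
chosen_k, chosen_result, chosen_cost = choose_cluster_count(
    lambda k: ("division with %d clusters" % k, fake_costs[k]), max_k=5)
```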

A program causing a computer to execute:
a request reception process of receiving a request for content from a user and storing the requesting user and the requested content in a user request storage unit;
a content distribution process of adding, to the content requested by the user, an advertisement that includes a mechanism allowing the user to request an advertiser's content, and transmitting the content;
a data generation process of generating, based on the information stored in the user request storage unit, multivariate data that takes the user and the advertisement as variables and indicates whether the user has requested the advertiser's content from the advertisement, and generating, corresponding to the multivariate data, sequence data in which the requests sent until the user requested the advertiser's content are arranged in time series;
a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function that indicates whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputting a bidirectional clustering result; and
an advertisement selection process of determining, based on the bidirectional clustering result, an advertisement to be added to the content.
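The data generation process for advertisements can be illustrated with a short sketch. It assumes the user request storage unit can be read as a time-ordered list of `(user, content)` pairs and that a hypothetical mapping `ad_contents` identifies which content belongs to which advertisement. Keeping the user's entire earlier history as the sequence, rather than resetting it after each advertiser request, is also an assumption.

```python
def build_ad_data(request_log, ad_contents):
    """Derive ad-request data from a time-ordered list of (user, content) pairs.

    ad_contents maps an advertiser's content to the advertisement leading to it
    (a hypothetical lookup). Returns the set of (user, ad) pairs for which the
    user requested the advertiser's content, and for each such event the
    requests the user had sent up to that point.
    """
    history = {}    # user -> requests seen so far, in time order
    requested = set()
    sequences = []
    for user, content in request_log:
        if content in ad_contents:
            ad = ad_contents[content]
            requested.add((user, ad))
            sequences.append((user, ad, list(history.get(user, []))))
        history.setdefault(user, []).append(content)
    return requested, sequences


log = [("u1", "news"), ("u1", "sports"), ("u1", "shop-x"),
       ("u2", "news"), ("u2", "shop-x")]
requested, sequences = build_ad_data(log, {"shop-x": "ad-x"})
```

The `requested` set yields the 0/1 user-by-advertisement values, and `sequences` supplies the matching time series.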

A program causing a computer to execute:
a data generation process of collecting sales information including information that a customer has purchased a product and, based on the collected sales information, generating multivariate data that takes the customer and the product as variables and indicates whether the customer has purchased the product, and generating, corresponding to the multivariate data, sequence data in which the history of the customer's product purchases is arranged in time series;
a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function that indicates whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputting a bidirectional clustering result; and
a recommended product list generation process of determining, based on the bidirectional clustering result, a product to recommend to a customer.
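One way a recommended product list could be derived from a bidirectional clustering result is sketched below. The rule used here is an assumption for illustration only: recommend unpurchased products from product clusters that are densely purchased by the customer's own customer cluster. The function name `recommend`, the dictionary inputs, and the `min_density` threshold are all hypothetical.

```python
def recommend(customer, purchases, customer_cluster, product_cluster,
              min_density=0.5):
    """List unpurchased products from product clusters popular with peers.

    purchases: dict customer -> set of purchased products.
    customer_cluster / product_cluster: dicts mapping each customer or
    product to its cluster id (the assumed form of the clustering result).
    """
    peers = [c for c in purchases
             if customer_cluster[c] == customer_cluster[customer]]

    by_cluster = {}
    for product, q in product_cluster.items():
        by_cluster.setdefault(q, []).append(product)

    recommended = []
    for q, products in sorted(by_cluster.items()):
        # Share of (peer, product) cells in this block that are purchases.
        bought = sum(1 for c in peers for p in products if p in purchases[c])
        density = bought / (len(peers) * len(products))
        if density >= min_density:
            recommended += sorted(p for p in products
                                  if p not in purchases[customer])
    return recommended


purchases = {"a": {"tea"}, "b": {"tea", "milk"},
             "c": {"tea", "milk"}, "d": {"nails"}}
customer_cluster = {"a": 0, "b": 0, "c": 0, "d": 1}
product_cluster = {"tea": 0, "milk": 0, "nails": 1}
for_a = recommend("a", purchases, customer_cluster, product_cluster)
for_d = recommend("d", purchases, customer_cluster, product_cluster)
```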

A program causing a computer to execute:
a data generation process of collecting failure information including a vehicle type and a failure location of the vehicle and, based on the collected failure information, generating multivariate data that takes the vehicle type and the region as variables and indicates whether a failure has occurred in the region for the vehicle type, and generating, corresponding to the multivariate data, sequence data in which the history of failure locations that occurred in the past for the vehicle type is arranged in time series;
a bidirectional clustering process of performing bidirectional clustering on the multivariate data and the sequence data, dividing the multivariate data and the sequence data into a plurality of clusters using an evaluation function that indicates whether the features common among the variables and among the sequence data included in a cluster are many or few, and outputting a bidirectional clustering result; and
a failure prediction candidate list generation process of estimating, based on the bidirectional clustering result, a region where a failure is predicted to occur for a vehicle type.