WO2016132683A1 - Clustering system, method, and program - Google Patents
Clustering system, method, and program
- Publication number
- WO2016132683A1 (PCT/JP2016/000403; JP2016000403W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target data
- cluster
- variable
- function
- store
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- Data partitioning by clustering is one of the most basic methods in data mining. Typical cases in which a large amount of data is divided into multiple segments include document clustering and store clustering. For example, documents can be clustered into topic segments by dividing the data based on the presence or absence of words appearing in each document. Stores can be clustered, for example, by applying the k-means method to sales feature vectors whose elements are the sales of individual products, yielding segments of store groups with similar sales.
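The store-clustering example above can be sketched with a minimal k-means implementation in pure Python; the four sales vectors and the first-k initialization are illustrative assumptions, not data from the document:

```python
import math

def kmeans(points, k, iters=20):
    """Minimal k-means: alternate assignment and centroid update."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Sales feature vectors (hypothetical): elements are sales of individual products.
sales = [[100, 5], [110, 4], [10, 90], [12, 95]]
labels, centers = kmeans(sales, 2)  # stores with similar sales share a segment
```

Stores 0 and 1 end up in one segment and stores 2 and 3 in the other, matching the idea of segments of store groups with similar sales.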
- Non-Patent Document 1 describes a multivariate regression tree (MRT) used for searching, describing, and predicting the relationship between multiple types of data and environmental characteristics.
- The MRT described in Non-Patent Document 1 forms clusters by repeatedly dividing the data using simple rules on environment values.
- However, the MRT described in Non-Patent Document 1 is not a probability model in the first place, so the models to which it can be applied are limited. Moreover, since the data it handles is limited to continuous values, it is difficult to generate, for example, the document clusters described above.
- An object of the present invention is to provide a clustering system, clustering method, and clustering program that can appropriately cluster classification target data using a mixture distribution model, regardless of whether the target data includes information indicating the characteristics of a cluster.
- A clustering system according to the present invention includes a classifier that classifies target data into clusters based on a mixture distribution model defined using two different variables indicating characteristics of the target data.
- The classifier classifies the target data into clusters based on a mixture distribution model in which the mixing ratio is expressed as a function of a first variable and the component distribution of the clusters into which the target data is classified is expressed as a function of a second variable.
- In a clustering method according to the present invention, a computer classifies target data into clusters based on a mixture distribution model defined using two different variables indicating the characteristics of the target data; the mixing ratio of the mixture distribution model is expressed as a function of a first variable, and the component distribution of the clusters into which the target data is classified is expressed as a function of a second variable.
- A clustering program according to the present invention causes a computer to execute a classification process that classifies target data into clusters based on a mixture distribution model defined using two different variables indicating characteristics of the target data.
- The technical effect of the above means is that the classification target data can be appropriately clustered using the mixture distribution model regardless of whether the data includes information indicating the characteristics of a cluster.
- FIG. 1 is a block diagram showing a configuration example of the first embodiment of a clustering system according to the present invention.
- the clustering system according to the present embodiment includes an input device 11, a classifier 12, an output device 13, a learning device 14, and a storage unit 15.
- the input device 11 inputs classification target data. Further, the input device 11 may simultaneously input parameters necessary for model optimization.
- the storage unit 15 stores a model used by the classifier 12 for clustering.
- the storage unit 15 may store a model learned in advance, or may store a model learned by the learning device 14 described later.
- the storage unit 15 may store learning data used for model learning, a clustering result based on the learning data, and the like.
- the storage unit 15 is realized by, for example, a magnetic disk.
- the classifier 12 clusters input data based on the model stored in the storage unit 15.
- The classifier 12 computes data segments using Probabilistic Conditional Clustering models (PCCs), a kind of mixture distribution model.
- PCCs are mixture distribution models defined using two different types of variables that indicate the characteristics (attributes) of the target data. The two types of variables are assumed to be condition variables and feature variables.
- the condition variable is a variable used to express a condition for assigning target data to a segment, and is used as a condition in a prior distribution of cluster classification variables (hidden variables).
- The feature variable is a variable used to express the characteristic statistics of a segment, and is used in the component distribution. For example, when stores are clustered based on sales, the feature variable corresponds to sales and the condition variable corresponds to the demographics of each store.
- the two different variables can be referred to as variables used in the component distribution and variables not directly used in the component distribution.
- condition variables are used in PCCs.
- the condition variable is used to express the structure of the segment (that is, a condition used when assigning target data to the segment).
- The condition variable can be used in both the prediction stage and the learning stage.
- Feature variables are used to characterize segments (i.e., for the segments' characteristic statistics).
- In contrast, the feature variable can be used only in the learning stage; that is, in the present embodiment, the target data is assumed not to include the information indicated by the feature variables at prediction time.
- The classifier 12 can assign target data to a cluster in either case.
- the condition variable is denoted as Xc
- the feature variable is denoted as Xf .
- The range of Xc consists of numerical values, categorical values, or a mixture of both; the dimension of Xc is Dc. The range of Xf depends on the application (e.g., the shape of the clusters in the probability model); the dimension of Xf is Df.
- PCCs of this embodiment are defined by the following formula 1 using Xc and Xf described above.
- In Equation 1, (η1, ..., ηK, θ1, ..., θK) are the parameters of the entire model; ηk and θk are the parameters representing the condition and the shape of the k-th cluster, respectively.
- In Equation 1, wk is a function that defines the condition for belonging to the k-th cluster.
- a cluster classification variable Z (Z 1 ,..., Z K ) is defined.
- When Zk = 0, the data is not generated from the k-th cluster.
- Expression 2 is a probability distribution that assigns target data to clusters
- Expression 3 is a probability distribution that indicates the shape of each cluster, and the contents of the probability distribution are arbitrary.
- This model is a variant of the mixture model. Like an ordinary mixture model, its component distribution p(Xf; θk) can be, for example, a Gaussian distribution as in a Gaussian mixture model, or a linear regression as in a mixture-of-experts model.
- However, the model illustrated in Equation 1 is a mixture distribution model defined using the two different variables Xc and Xf, and it differs from ordinary mixture models in that its mixing ratio is expressed as a function of the condition variable Xc. That is, in the model of Equation 1, the mixing ratio of the mixture distribution model is expressed as a function of the condition variable, and the component distribution of the destination cluster is expressed as a function of the feature variable.
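As a concrete sketch of this structure, the following evaluates a density of the form p(Xf | Xc) = Σk wk(Xc; ηk) p(Xf; θk). Equation 1 leaves both the gating function and the component distribution arbitrary, so the softmax gates (linear in a scalar Xc) and one-dimensional Gaussian components used here are assumptions for illustration:

```python
import math

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def pcc_density(xc, xf, etas, thetas):
    """p(xf | xc) = sum_k w_k(xc; eta_k) * p(xf; theta_k).
    The mixing ratio w_k depends only on the condition variable xc;
    each component depends only on the feature variable xf."""
    w = softmax([a * xc + b for (a, b) in etas])              # w_k(xc; eta_k)
    comps = [gauss_pdf(xf, mu, var) for (mu, var) in thetas]  # p(xf; theta_k)
    return sum(wk * ck for wk, ck in zip(w, comps))

# Hypothetical parameters: two clusters with distinct feature means.
density = pcc_density(0.0, 1.0, etas=[(1.0, 0.0), (-1.0, 0.0)],
                      thetas=[(0.0, 1.0), (5.0, 1.0)])
```

For a fixed xc the result is a proper density over xf, since the gates sum to one and each component integrates to one.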
- The classifier 12 clusters the classification target data based on this mixture distribution model.
- Equation 4 is a method of assigning target data to clusters based on both the condition variables and the feature variables.
- Equation 5 is a method of assigning target data to clusters based only on the condition variables.
- the method illustrated in Equation 4 is a standard method for finding a cluster that maximizes the posterior probability.
- When the sales are known, the classifier 12 can derive the cluster with the maximum probability using Equation 4 above; when the sales are unknown, it can derive the cluster with the maximum probability using Equation 5 above.
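The two assignment rules can be sketched as follows, again using assumed softmax gates and Gaussian components (the document fixes neither form). Equation 4 weighs the gate by the feature likelihood; Equation 5 uses the gate alone when the feature variable (e.g., sales) is unknown:

```python
import math

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    return [e / sum(es) for e in es]

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def assign_with_features(xc, xf, etas, thetas):
    # Equation 4 (sketch): k* = argmax_k w_k(xc) * p(xf; theta_k)
    w = softmax([a * xc + b for (a, b) in etas])
    return max(range(len(w)), key=lambda k: w[k] * gauss_pdf(xf, *thetas[k]))

def assign_without_features(xc, etas):
    # Equation 5 (sketch): k* = argmax_k w_k(xc) -- prior only
    w = softmax([a * xc + b for (a, b) in etas])
    return max(range(len(w)), key=lambda k: w[k])

etas = [(1.0, 0.0), (-1.0, 0.0)]   # hypothetical gate parameters
thetas = [(0.0, 1.0), (5.0, 1.0)]  # hypothetical (mean, variance) per cluster
```

With xc = 2 the gate alone favors cluster 0, but observing xf = 5 (near cluster 1's mean) flips the posterior decision to cluster 1.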
- the classifier 12 of the present embodiment can cluster the classification target data using the above formula 4 or 5 derived from the mixed distribution model (PCCs) exemplified in the above formula 1. Therefore, for example, the classification target data can be appropriately classified into clusters regardless of whether the classification target data has information indicating the characteristics of the cluster such as sales.
- the output device 13 outputs the clustering result.
- the output device 13 may output, for example, information (for example, a cluster name) specifying a cluster to which the target data is assigned, or information indicating characteristics of the cluster (for example, cluster statistical information).
- the content to be output is not limited to the method described above.
- the learning device 14 learns a model used by the classifier 12 for clustering.
- The learning device 14 preferably uses a Bayesian learning algorithm that identifies the segment structure from data based on the recently developed factorized asymptotic Bayesian (FAB) inference.
- FAB inference provides a principled treatment of the model selection problem in latent variable models and reveals the segment structure by (almost exactly) maximizing the marginal log-likelihood. Furthermore, FAB inference asymptotically ignores the parameter prior distribution, thereby removing hyperparameters and automating the clustering. This is particularly useful in unsupervised data-partitioning scenarios that seek to surface the assessment hidden in large amounts of data without manual parameter tuning.
- the learning device 14 stores the learning result in the storage unit 15.
- the classifier 12 is realized by a CPU of a computer that operates according to a program (clustering program).
- the program may be stored in a storage unit (not shown) included in the clustering system, and the CPU may read the program and operate as the classifier 12 according to the program.
- each of the input device 11, the classifier 12, and the output device 13 may be realized by dedicated hardware.
- the clustering system according to the present invention may be configured by connecting two or more physically separated devices in a wired or wireless manner.
- FIG. 2 is a flowchart showing an operation example of the clustering system of the present embodiment.
- the input device 11 inputs classification target data (step S11).
- The classifier 12 acquires the above-described mixture distribution model (PCCs) from the storage unit 15 (step S12). Then, the classifier 12 predicts the cluster to which the input classification target data belongs based on the acquired mixture distribution model, and assigns the data to the predicted cluster (step S13).
- the output device 13 outputs the result of assigning the classification target data (step S14).
- As described above, in the present embodiment, the classifier 12 classifies target data into clusters based on a mixture distribution model defined using two different variables indicating the characteristics of the target data.
- The mixing ratio is expressed as a function of the first variable (specifically, the condition variable), and the component distribution of the cluster into which the target data is classified is expressed as a function of the second variable (specifically, the feature variable).
- Therefore, the target data can be appropriately clustered using the mixture distribution model. For example, even when classifying new data for which feature variables such as sales are unavailable, a merchant can predict the cluster using the prior probability conditioned on the condition variables.
- Embodiment 2. In the first embodiment, when the classifier 12 performs clustering, the conditional prior probability p(Z|Xc; ηk) of Equation 1 plays an important role in PCCs, and an arbitrary probability model can be applied to it.
- In the present embodiment, to improve the interpretability of the clusters, a method of clustering that uses a rule-based conditional prior function as the conditional prior probability is described.
- FIG. 3 is a block diagram showing a configuration example of the second embodiment of the clustering system according to the present invention.
- Components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.
- the clustering system of the present embodiment includes an input device 11, a classifier 22, an output device 13, a learning device 14, and a storage unit 15. That is, the clustering system of the present embodiment is different from the first embodiment in that the classifier 22 is provided instead of the classifier 12.
- the classifier 22 clusters input data based on the model stored in the storage unit 15 as in the classifier 12 of the first embodiment.
- The classifier 22 classifies target data into clusters using a rule-based conditional clustering model, which is an improved form of PCCs.
- FIG. 4 is an explanatory diagram showing an example of a rule-based conditional clustering model used in the present embodiment.
- The model illustrated in FIG. 4 is represented by a tree structure; each node drawn as a rectangle is a condition node, and each leaf node drawn as a circle is a cluster node.
- gi takes values in [0, 1] and is given a Beta prior, Betai(αi, βi).
- δi is an index into the elements of Xc, and ti ∈ R (the set of real numbers) is an arbitrary threshold value.
- U in Equation 6 is a step function.
- When gi is used at each condition node, wk in Equation 1 above can be modeled as Equation 7 illustrated below.
- In Equation 8, the function of a, i, and k takes the value a when the k-th cluster lies in the left subtree of the i-th condition node, and the value (1 - a) otherwise.
- the right side in Equation 7 represents the probability that the classification target data will reach the kth cluster node.
- the rule-based conditional clustering model described above can be said to be a model obtained by applying a special probability model to Equations 1 to 3 described in the first embodiment.
- By representing PCCs with such a tree structure, the interpretability of the model improves. Since the classification condition of each cluster can be grasped at a glance, the knowledge obtained from each cluster (segment) can be utilized for various strategies (for example, marketing strategies).
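With hard step gates (the U of Equation 6 taken literally), exactly one leaf receives weight wk = 1 for a given Xc, so cluster assignment reduces to routing Xc through the rule tree. The following sketch assumes a hypothetical node encoding and thresholds:

```python
def route_to_cluster(xc, nodes, root=0):
    """Route a condition vector through a binary rule tree.

    Each condition node i tests xc[delta_i] >= t_i (the step function U of
    Equation 6); the leaf reached gives the cluster index k, i.e. the single
    k with w_k = 1 under hard gates.
    nodes maps id -> ('cond', delta, t, left_id, right_id) or ('leaf', k).
    """
    node = nodes[root]
    while node[0] == 'cond':
        _, delta, t, left, right = node
        node = nodes[left] if xc[delta] >= t else nodes[right]
    return node[1]

# Hypothetical two-condition tree with three cluster leaves.
tree = {
    0: ('cond', 0, 0.5, 1, 2),   # test xc[0] >= 0.5
    1: ('cond', 1, 10.0, 3, 4),  # test xc[1] >= 10.0
    2: ('leaf', 2),
    3: ('leaf', 0),
    4: ('leaf', 1),
}
```

Because every assignment is the conjunction of the threshold tests along one root-to-leaf path, each cluster's membership condition can be read directly off the tree, which is the interpretability benefit described above.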
- Embodiment 3. A third embodiment of the clustering system according to the present invention will now be described.
- an example will be described in which stores are classified based on product sales data, and a store opening plan for a new store is made based on a store cluster into which stores are classified.
- This product sales data corresponds to the characteristic variable of the first embodiment.
- a store opening plan for a new store is made using the demographic information of the store. This demographic information corresponds to the condition variable of the first embodiment.
- FIG. 5 is a block diagram showing a configuration example of the third embodiment of the clustering system according to the present invention.
- The clustering system of the present embodiment includes an input unit 31, a classifier 32, an output unit 33, a learning device 34, a model storage unit 35, a performance data storage unit 36, and a condition data storage unit 37.
- The input unit 31, classifier 32, output unit 33, learning device 34, and model storage unit 35 of the present embodiment correspond to the input device 11, classifier 12, output device 13, learning device 14, and storage unit 15 of the first embodiment, respectively; their detailed description is omitted.
- the condition data storage unit 37 stores statistical information about stores that can be acquired when a new store opens. Specifically, the condition data storage unit 37 stores the demographic information of the store (for example, the population for each region, the sex ratio, the ratio by age, etc.).
- the input unit 31, the classifier 32, the output unit 33, and the learning unit 34 are realized by a CPU of a computer that operates according to a program (clustering program). Each of the input unit 31, the classifier 32, the output unit 33, and the learning device 34 may be realized by dedicated hardware. Further, the clustering system according to the present invention may be configured by connecting two or more physically separated devices in a wired or wireless manner.
- the model storage unit 35, the result data storage unit 36, and the condition data storage unit 37 are realized by, for example, a magnetic disk device.
- FIG. 6 is a flowchart showing an operation example of the clustering system of the present embodiment.
- the operation example illustrated in FIG. 6 shows a process of classifying a new store S without product sales data into a store cluster.
- FIG. 7 is an explanatory diagram showing an example of a mixed distribution model.
- the example shown in FIG. 7 shows a mixed distribution model in which the function of the first variable is represented by a tree structure.
- a store cluster is arranged in the leaf node of the tree structure illustrated in FIG. 7, and demographic information conditions are arranged in the other nodes.
- In the example shown in FIG. 7, demographic conditions (gender ratio, population per household, ratio by age) are arranged at the condition nodes, and four store clusters are arranged at the leaf nodes.
- Xc[δi] corresponds to demographic information; the tree branches to one child node when Xc[δi] ≥ ti, and to the other child node when Xc[δi] < ti.
- the input unit 31 inputs target data to be classified (step S22). Specifically, the input unit 31 inputs demographic information regarding the new store S as information on the new store S.
- the classifier 32 specifies a store cluster that classifies the new store S based on the mixed distribution model stored in the model storage unit 35 (step S23).
- the output unit 33 outputs information regarding the identified store cluster (step S24). For example, the output unit 33 may output statistical information obtained by tabulating product sales data of the specified store cluster, or may output product sales data of a representative store from the specified store cluster.
- the clustering system of this embodiment classifies target stores into store clusters generated based on sales.
- The input unit 31 inputs demographic information indicating the characteristics of the target store as the first variable,
- and the classifier 32 classifies the target store indicated by the input demographic information into a store cluster based on the mixture distribution model.
- the classifier 32 uses a mixed distribution model in which the mixture ratio is expressed as a function of demographic information and the element distribution of the store cluster is expressed as a function of the merchandise sales data of the store.
- The mixture distribution model (PCCs) of the present embodiment can be represented by a tree structure in which store clusters are placed at the leaf nodes and demographic conditions at the other nodes. Therefore, for example, by tracing the demographic conditions from a store cluster (leaf node) back toward the root node, it is possible to identify the demographic conditions under which a desired sales trend is obtained.
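Tracing from a store cluster (leaf) back toward the root can be sketched as follows; the node encoding and the parent map are hypothetical:

```python
def conditions_for_cluster(leaf_id, nodes, parents):
    """Collect the demographic conditions defining one store cluster by
    walking from its leaf toward the root of the rule tree."""
    path, child = [], leaf_id
    while child in parents:
        pid = parents[child]
        _, delta, t, left, right = nodes[pid]
        branch = '>=' if child == left else '<'  # which side the path took
        path.append((delta, branch, t))
        child = pid
    return list(reversed(path))  # conditions in root-to-leaf order

# Hypothetical tree: node 0 tests feature 0 against 0.5,
# node 1 tests feature 1 against 10.0; leaves 2, 3, 4 are clusters.
nodes = {0: ('cond', 0, 0.5, 1, 2), 1: ('cond', 1, 10.0, 3, 4)}
parents = {1: 0, 2: 0, 3: 1, 4: 1}
```

For example, the cluster at leaf 4 is defined by feature 0 >= 0.5 and feature 1 < 10.0, which is exactly the kind of condition chain that lets a planner pick demographics yielding a desired sales trend.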
- a method of classifying a new store into a store cluster in which stores are classified according to product sales is exemplified.
- the target of the clustering system of the present embodiment is not limited to a new store.
- Even when a new classification target arises over time and lacks data on the variable that characterizes the clusters, the clustering system of the present embodiment can classify it into a cluster based on the other variable that characterizes the target.
- FIG. 8 is a block diagram showing an outline of the clustering system according to the present invention.
- The clustering system according to the present invention includes a classifier 81 (for example, the classifier 12) that classifies target data (for example, stores) into clusters based on a mixture distribution model (for example, Equation 1 above) defined using two different variables (for example, a condition variable and a feature variable) indicating the characteristics of the target data.
- the classification target data can be appropriately clustered using the mixed distribution model.
- the computer can appropriately execute the process of clustering the classification target data using the mixed distribution model.
- The function of the first variable may be defined by a probability model represented by a tree structure in which clusters for classifying the target data are placed at the leaf nodes and the conditions on the first variable are placed at the condition nodes (nodes other than leaf nodes). The classifier 81 may then estimate the cluster that optimizes the probability model as the cluster into which the target data is classified.
- the clustering system of the present invention may classify the target stores into store clusters generated based on sales.
- the clustering system may include an input unit (for example, the input unit 31) that inputs demographic information indicating the characteristics of the target store as the first variable.
- The classifier 81 may classify the target store indicated by the input demographic information into a store cluster based on a mixture distribution model in which the mixing ratio is expressed as a function of the demographic information and the component distribution of the store cluster is expressed as a function of the store's product sales data, which is the second variable.
- the classification target stores can be appropriately clustered using the mixed distribution model regardless of whether or not the classification target stores have the product sales data.
- the classifier 81 may estimate a cluster that optimizes the probability model as a store cluster into which the target store is classified.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
FIG. 1 is a block diagram showing a configuration example of the first embodiment of a clustering system according to the present invention. The clustering system of this embodiment includes an input device 11, a classifier 12, an output device 13, a learning device 14, and a storage unit 15.
In the first embodiment, when the classifier 12 performs clustering, the conditional prior probability p(Z|Xc; ηk) of Equation 1 above plays an important role in PCCs. As shown in the first embodiment, the classifier 12 can perform clustering by applying an arbitrary probability model to p(Z|Xc; ηk).
Next, a third embodiment of the clustering system according to the present invention will be described. In this embodiment, stores are classified based on product sales data, and a store-opening plan for a new store is made based on the resulting store clusters. This product sales data corresponds to the feature variable of the first embodiment.
12, 22, 32 Classifier
13 Output device
14, 34 Learning device
15 Storage unit
31 Input unit
33 Output unit
35 Model storage unit
36 Performance data storage unit
37 Condition data storage unit
Claims (9)
- A clustering system comprising a classifier that classifies target data into clusters based on a mixture distribution model defined using two different variables indicating characteristics of the target data,
wherein the classifier classifies the target data into clusters based on a mixture distribution model in which the mixing ratio of the mixture distribution model is expressed as a function of a first variable and the component distribution of the clusters into which the target data is classified is expressed as a function of a second variable. - The clustering system according to claim 1, wherein, for target data whose second variable is unknown, the classifier estimates, based on the function of the first variable, the cluster that maximizes the conditional probability that the target data belongs to it under the state indicated by the first variable, and takes the estimated cluster as the cluster into which the target data is classified.
- The clustering system according to claim 1 or claim 2, wherein the function of the first variable is defined by a probability model represented by a tree structure in which clusters for classifying the target data are placed at leaf nodes and conditions on the first variable are placed at condition nodes, which are nodes other than the leaf nodes,
and the classifier estimates the cluster that optimizes the probability model as the cluster into which the target data is classified. - The clustering system according to any one of claims 1 to 3, which classifies target stores into store clusters generated based on sales, the system comprising
an input unit that inputs demographic information indicating characteristics of the target store as the first variable,
wherein the classifier classifies the target store indicated by the input demographic information into a store cluster based on a mixture distribution model in which the mixing ratio is expressed as a function of the demographic information and the component distribution of the store clusters is expressed as a function of the store's product sales data, which is the second variable. - The clustering system according to claim 4, wherein the function of the demographic information is defined by a probability model represented by a tree structure in which store clusters for classifying the target stores are placed at leaf nodes and conditions on the demographic information are placed at condition nodes,
and the classifier estimates the cluster that optimizes the probability model as the store cluster into which the target store is classified. - A clustering method in which a computer classifies target data into clusters based on a mixture distribution model defined using two different variables indicating characteristics of the target data,
wherein, in the classification, the computer classifies the target data into clusters based on a mixture distribution model in which the mixing ratio of the mixture distribution model is expressed as a function of a first variable and the component distribution of the clusters into which the target data is classified is expressed as a function of a second variable. - The clustering method according to claim 6, wherein, for target data whose second variable is unknown, the computer estimates, based on the function of the first variable, the cluster that maximizes the conditional probability that the target data belongs to it under the state indicated by the first variable, and takes the estimated cluster as the cluster into which the target data is classified. - A clustering program that causes a computer
to execute a classification process of classifying target data into clusters based on a mixture distribution model defined using two different variables indicating characteristics of the target data,
wherein, in the classification process, the target data is classified into clusters based on a mixture distribution model in which the mixing ratio of the mixture distribution model is expressed as a function of a first variable and the component distribution of the clusters into which the target data is classified is expressed as a function of a second variable. - The clustering program according to claim 8, wherein,
in the classification process, for target data whose second variable is unknown, the computer is caused to estimate, based on the function of the first variable, the cluster that maximizes the conditional probability that the target data belongs to it under the state indicated by the first variable, and to take the estimated cluster as the cluster into which the target data is classified.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017500496A JP6687011B2 (ja) | 2015-02-18 | 2016-01-27 | クラスタリングシステム、方法およびプログラム |
US15/549,897 US10877996B2 (en) | 2015-02-18 | 2016-01-27 | Clustering system, method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562117659P | 2015-02-18 | 2015-02-18 | |
US62/117,659 | 2015-02-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016132683A1 true WO2016132683A1 (ja) | 2016-08-25 |
Family
ID=56692182
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/000403 WO2016132683A1 (ja) | 2015-02-18 | 2016-01-27 | クラスタリングシステム、方法およびプログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US10877996B2 (ja) |
JP (1) | JP6687011B2 (ja) |
WO (1) | WO2016132683A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20220121074A (ko) * | 2021-02-24 | 2022-08-31 | 주식회사 그로비 | 머신 러닝에 기반하는 수요 예측 시스템 및 수요 예측 방법 |
JP7458547B1 (ja) | 2023-11-07 | 2024-03-29 | 株式会社インターネットイニシアティブ | 情報処理装置、システムおよび方法 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11544596B2 (en) * | 2019-04-08 | 2023-01-03 | Google Llc | Creating a machine learning model with k-means clustering |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8412656B1 (en) * | 2009-08-13 | 2013-04-02 | Videomining Corporation | Method and system for building a consumer decision tree in a hierarchical decision tree structure based on in-store behavior analysis |
US9600831B1 (en) * | 2012-06-22 | 2017-03-21 | Google Inc. | User association attribution system |
US9317594B2 (en) * | 2012-12-27 | 2016-04-19 | Sas Institute Inc. | Social community identification for automatic document classification |
-
2016
- 2016-01-27 WO PCT/JP2016/000403 patent/WO2016132683A1/ja active Application Filing
- 2016-01-27 US US15/549,897 patent/US10877996B2/en active Active
- 2016-01-27 JP JP2017500496A patent/JP6687011B2/ja active Active
Non-Patent Citations (2)
Title |
---|
KAN YAMAGAMI ET AL.: "Senzai Class Model no Kongohi ni Chakumoku shita Clustering Shuho no Teian", JAPAN INDUSTRIAL MANAGEMENT ASSOCIATION 2014 NEN SHUKI TAIKAI YOKOSHU, 2014, pages 132 - 133 * |
MAO HAYAKAWA ET AL.: "A Prediction Model of Finish Dates of Job Hunting Based on Stratification Tree and Mixture Weibull Distribution", PROCEEDINGS OF THE 36TH SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (SITA2013), 2013, pages 442 - 447 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20220121074A (ko) * | 2021-02-24 | 2022-08-31 | 주식회사 그로비 | 머신 러닝에 기반하는 수요 예측 시스템 및 수요 예측 방법 |
KR102590158B1 (ko) * | 2021-02-24 | 2023-10-17 | 주식회사 그로비 | 머신 러닝에 기반하는 수요 예측 시스템 및 수요 예측 방법 |
JP7458547B1 (ja) | 2023-11-07 | 2024-03-29 | 株式会社インターネットイニシアティブ | 情報処理装置、システムおよび方法 |
Also Published As
Publication number | Publication date |
---|---|
US20180025072A1 (en) | 2018-01-25 |
JP6687011B2 (ja) | 2020-04-22 |
US10877996B2 (en) | 2020-12-29 |
JPWO2016132683A1 (ja) | 2017-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Entezari-Maleki et al. | Comparison of classification methods based on the type of attributes and sample size. | |
US11893466B2 (en) | Systems and methods for model fairness | |
CN109657805B (zh) | 超参数确定方法、装置、电子设备及计算机可读介质 | |
Dai et al. | Best-practice benchmarking using clustering methods: Application to energy regulation | |
US20160232637A1 (en) | Shipment-Volume Prediction Device, Shipment-Volume Prediction Method, Recording Medium, and Shipment-Volume Prediction System | |
van Stein et al. | Optimally weighted cluster kriging for big data regression | |
JP5454827B1 (ja) | 文書評価装置、文書評価方法、及びプログラム | |
US20160210681A1 (en) | Product recommendation device, product recommendation method, and recording medium | |
WO2014199920A1 (ja) | 予測関数作成装置、予測関数作成方法、及びコンピュータ読み取り可能な記録媒体 | |
Silhavy et al. | Improving algorithmic optimisation method by spectral clustering | |
CN109063743A (zh) | 基于半监督多任务学习的医疗数据分类模型的构建方法 | |
WO2016132683A1 (ja) | クラスタリングシステム、方法およびプログラム | |
WO2017188048A1 (ja) | 作成装置、作成プログラム、および作成方法 | |
Kanda et al. | Using meta-learning to recommend meta-heuristics for the traveling salesman problem | |
Mori et al. | Inference in hybrid Bayesian networks with large discrete and continuous domains | |
KR20180123826A (ko) | 이종 분류 간 상품분류의 대응관계 생성시스템 | |
Chen | Embedding a back propagation network into fuzzy c-means for estimating job cycle time: wafer fabrication as an example | |
US20240013123A1 (en) | Utilizing machine learning models to analyze an impact of a change request | |
Vedavathi et al. | Unsupervised learning algorithm for time series using bivariate AR (1) model | |
JPWO2018235841A1 (ja) | グラフ構造解析装置、グラフ構造解析方法、及びプログラム | |
US20220138632A1 (en) | Rule-based calibration of an artificial intelligence model | |
CN108629062A (zh) | 用于定价优化的方法、装置和系统 | |
Abdelatif et al. | Optimization of the organized Kohonen map by a new model of preprocessing phase and application in clustering | |
Brazdil et al. | Metalearning approaches for algorithm selection I (exploiting rankings) | |
Cruz | Behavioral Analysis and Pattern Validation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16752076 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2017500496 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15549897 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16752076 Country of ref document: EP Kind code of ref document: A1 |