WO2015085916A1 - Data mining method - Google Patents


Info

Publication number
WO2015085916A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
type
regression model
sample
mining method
Prior art date
Application number
PCT/CN2014/093430
Other languages
English (en)
French (fr)
Inventor
王骏
杨鸿超
Original Assignee
中国银联股份有限公司 (China UnionPay Co., Ltd.)
Priority date
2013-12-10
Filing date
2014-12-10
Publication date
2015-06-18
Application filed by 中国银联股份有限公司 (China UnionPay Co., Ltd.)
Priority to US15/100,533 (US10482093B2)
Priority to EP14869820.2A (EP3082051A4)
Publication of WO2015085916A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
                    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
                    • G06F16/2462 Approximate or statistical queries
              • G06F16/28 Databases characterised by their database models, e.g. relational or object models
                • G06F16/284 Relational databases
                  • G06F16/285 Clustering or classification
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
                    • G06F18/24155 Bayesian classification
                • G06F18/243 Classification techniques relating to the number of classes
                  • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Definitions

  • the present invention relates to data mining methods, and more particularly to data mining methods based on regression models.
  • target objects are generally classified according to one or more items of attribute data associated with them, that is, on the basis of the value of one or several specific attributes of each target object.
  • the existing technical solution has the following problems: since target objects are classified on the basis of only one or a few attributes, the classification results have low accuracy, and since the same evaluation operation must be performed on the attribute data of every target object, data mining is inefficient.
  • the present invention proposes a regression-model-based data mining method capable of mining and classifying target objects according to their comprehensive features.
  • a data mining method comprising the following steps:
  • (A1) the feature vector of each target object is computed from the records in a target data set to form a rough data set, each feature vector including the value of at least one item of attribute data of the corresponding target object;
  • the feature vectors included in the regression model include the values of attribute feature data common to all first-class target objects.
  • the filtering operation comprises: filtering noise out of the screened feature vectors according to a predetermined criterion.
  • the step (A3) further comprises: extracting a first part of the sample as training samples to construct the regression model; extracting a second part of the sample as test samples to test the constructed regression model; and extracting a third part of the sample as application samples to test the stability of the constructed regression model.
  • the step (A3) further comprises: performing a normalization operation on each field in each sample before constructing the regression model, comprising: (1) handling missing values; (2) handling singular values; (3) re-encoding discrete character fields; (4) normalizing each field in each sample by z-score to eliminate the influence of inconsistent dimensions.
  • the step (A3) further comprises: performing a discretization operation on each field in each sample after the normalization operation is completed, including: (1) continuous data is discretized by dividing it into intervals, where the dividing points between intervals are points at which the target variable turns significantly; (2) the quality of the interval division is judged from the trend of the WOE value curve: if the WOE value curve is increasing, decreasing, or has only one turning point, the division result is judged to be good and the discretization operation terminates; otherwise, return to step (1) to continue dividing within the interval.
  • the constructed regression model is used to determine whether each of the known second-class target objects potentially belongs to the first class of target objects: based on the regression model, the probability that a known second-class target object belongs to the first class is calculated from its corresponding feature vector, and if the calculated probability is greater than a predetermined classification threshold, that known second-class target object is determined to potentially belong to the first class of target objects.
  • the regression-model-based data mining method disclosed in the present invention has the following advantages: target objects can be mined and classified according to their comprehensive features, and because a regression model is used for the determination, the method is highly reusable and can significantly improve the efficiency and accuracy of the determination operation.
  • FIG. 1 is a flow chart of a data mining method in accordance with an embodiment of the present invention.
  • the data mining method disclosed by the present invention includes the following steps: (A1) computing, from the records in a target data set (for example, a set of transaction records in the financial field), the feature vector of each target object (for example, a financial cardholder) to form a rough data set, each feature vector including at least one item of attribute data of the corresponding target object (e.g., in the financial field, monthly average spending amount, monthly average transaction frequency, number of cross-border transactions, …); (A2) screening out of the rough data set the feature vectors corresponding to all known first-class target objects (such as high-end cardholders in the financial field, e.g. Platinum cardholders), and performing a filtering operation on the screened feature vectors to obtain a sample; (A3) constructing a regression model based on the sample, and then using the constructed regression model to determine whether each of all known second-class target objects (such as non-high-end cardholders …
  • the feature vectors included in the regression model include the values of attribute feature data shared by all first-class target objects (i.e., the regression model contains the features shared by all first-class target objects).
  • the filtering operation comprises: filtering noise out of the screened feature vectors according to a predetermined criterion (for example, for high-end cardholder information in the financial field, if monthly average spending is the screening criterion, noise is filtered as follows: sort on that field and filter out the transaction information of the top 10% and the bottom 10% of cardholders, because not all high-end card spending records exhibit high-end spending characteristics, and a small number of high-end card spending records are so extreme that they lack general applicability).
  • the step (A3) further includes: extracting a first portion (e.g., 70%) of the sample as training samples to construct the regression model; extracting a second portion (e.g., 20%) of the sample as test samples to test the constructed regression model; and extracting a third portion (e.g., 10%) of the sample as application samples to test the stability of the constructed regression model.
  • the step (A3) further comprises: performing a normalization operation on each field in each sample before constructing the regression model, including: (1) handling missing values (for example, if a numeric field is missing data, it is filled with the column mean; if a character field is missing data, the sample is discarded); (2) handling singular values (for example, using the box-plot technique to filter out extreme outliers); (3) re-encoding discrete character fields; (4) normalizing each field in each sample by z-score to eliminate the influence of inconsistent dimensions.
  • the step (A3) further comprises: performing a discretization operation on each field in each sample after the normalization operation is completed, including: (1) continuous data is discretized by dividing it into intervals, where the dividing points between intervals are points at which the target variable turns significantly; (2) the quality of the interval division is judged from the trend of the WOE (weight of evidence) value curve: if the WOE value curve is increasing, decreasing, or has only one turning point, the division result is judged to be good and the discretization operation terminates; otherwise, return to step (1) to continue dividing within the interval.
  • the constructed regression model is used to determine whether each of the known second-class target objects potentially belongs to the first class of target objects in the following manner: the regression model calculates, from the feature vector corresponding to a known second-class target object, the probability that it belongs to the first class of target objects, and if the calculated probability is greater than a predetermined classification threshold (e.g., 0.8), that known second-class target object is determined to potentially belong to the first class of target objects (for example, in the financial field, an ordinary card user is determined to be a potential high-value cardholder).
  • the data mining method disclosed by the present invention has the following advantages: target objects can be mined and classified according to their comprehensive features, and because a regression model is used for the determination, the method is highly reusable and can significantly improve the efficiency and accuracy of the determination operation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

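The present invention provides a data mining method, comprising: computing, from the records in a target data set, a feature vector for each target object to form a rough data set, each feature vector including the value of at least one item of attribute data of the corresponding target object; screening out of the rough data set the feature vectors corresponding to all known first-class target objects, and performing a filtering operation on the screened feature vectors to obtain a sample; and constructing a regression model based on the sample, then using the constructed regression model to determine whether each of all known second-class target objects potentially belongs to the first class of target objects. The disclosed data mining method can mine and classify target objects according to their comprehensive features.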

Description

Data mining method
Technical Field
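The present invention relates to data mining methods and, more specifically, to data mining methods based on regression models.
Background
At present, as computer and network applications become increasingly widespread and the types of business in different fields become ever richer, it is increasingly important to efficiently mine different categories of objects from the massive data records associated with particular objects, so that different processing schemes can be applied to objects of different categories.
In existing technical solutions, target objects are usually classified according to one or more items of attribute data associated with them, that is, on the basis of the value of one or several specific attributes of each target object.
However, existing solutions have the following problems: because target objects are classified on the basis of only one or a few attributes, the classification results have low accuracy; and because the same evaluation operation must be performed on the attribute data of every target object, data mining is inefficient.
There is therefore a need for a regression-model-based data mining method that can mine and classify target objects according to their comprehensive features.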
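Summary of the Invention
To solve the above problems of the prior-art solutions, the present invention proposes a regression-model-based data mining method that can mine and classify target objects according to their comprehensive features.
The object of the present invention is achieved by the following technical solution:
A data mining method, comprising the following steps:
(A1) computing, from the records in a target data set, a feature vector for each target object to form a rough data set, each feature vector including the value of at least one item of attribute data of the corresponding target object;
(A2) screening out of the rough data set the feature vectors corresponding to all known first-class target objects, and performing a filtering operation on the screened feature vectors to obtain a sample;
(A3) constructing a regression model based on the sample, and then using the constructed regression model to determine whether each of all known second-class target objects potentially belongs to the first class of target objects.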
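In the solution disclosed above, preferably, the feature vectors included in the regression model include the values of attribute feature data common to all first-class target objects.
In the solution disclosed above, preferably, the filtering operation comprises: filtering noise points out of the screened feature vectors according to a predetermined criterion.
In the solution disclosed above, preferably, step (A3) further comprises: extracting a first portion of the sample as training samples to construct the regression model; extracting a second portion of the sample as test samples to test the constructed regression model; and extracting a third portion of the sample as application samples to test the stability of the constructed regression model.
In the solution disclosed above, preferably, step (A3) further comprises: before constructing the regression model, performing a normalization operation on each field in each sample, comprising: (1) handling missing values; (2) handling singular values; (3) re-encoding discrete character-type fields; (4) normalizing each field in each sample by z-score to eliminate the effect of inconsistent dimensions.
In the solution disclosed above, preferably, step (A3) further comprises: after the normalization operation is complete, further performing a discretization operation on each field in each sample, comprising: (1) discretizing continuous data by dividing it into intervals, where the division points between intervals are points at which the target variable turns noticeably; (2) judging the quality of the interval division from the trend of the WOE value curve: if the WOE value curve is increasing, decreasing, or has only one turning point, the division result is determined to be good and the discretization operation is terminated; otherwise, return to step (1) to continue dividing within that interval.
In the solution disclosed above, preferably, the constructed regression model is used as follows to determine whether each of all known second-class target objects potentially belongs to the first class of target objects: based on the regression model, the probability that a known second-class target object belongs to the first class is calculated from the feature vector corresponding to that object, and if the calculated probability is greater than a predetermined classification threshold, that known second-class target object is determined to potentially belong to the first class of target objects.
The regression-model-based data mining method disclosed in the present invention has the following advantages: target objects can be mined and classified according to their comprehensive features, and because a regression model is used for the determination, the method is highly reusable and can significantly improve the efficiency and accuracy of the determination operation.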
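Description of the Drawings
The technical features and advantages of the present invention will be better understood by those skilled in the art in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a data mining method according to an embodiment of the present invention.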
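Detailed Description
FIG. 1 is a flow chart of a data mining method according to an embodiment of the present invention. As shown in FIG. 1, the disclosed data mining method comprises the following steps: (A1) computing, from the records in a target data set (for example, a set of transaction records in the financial field), a feature vector for each target object (for example, a financial cardholder) to form a rough data set, each feature vector including the value of at least one item of attribute data of the corresponding target object (for example, in the financial field, average monthly spending amount, average monthly transaction frequency, number of cross-border transactions, amount spent abroad, proportion of large-value transactions, high-end card flag, etc.); (A2) screening out of the rough data set the feature vectors corresponding to all known first-class target objects (for example, high-end cardholders in the financial field, such as Platinum cardholders), and performing a filtering operation on the screened feature vectors to obtain a sample; (A3) constructing a regression model based on the sample, and then using the constructed regression model to determine whether each of all known second-class target objects (for example, non-high-end cardholders in the financial field, such as ordinary cardholders) potentially belongs to the first class of target objects (for example, mining potential high-end cardholders from among non-high-end cardholders).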
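The aggregation in step (A1) can be illustrated with a minimal Python sketch that turns raw transaction records into one feature vector per cardholder. The record layout (`Txn` with `card_id`, `month`, `amount`, `cross_border`), the function name `build_feature_vectors`, and the choice to average over the months in which a card was active are all hypothetical; the patent names example attributes but not field names or how the monthly averages are computed, and only three of its example attributes are shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    card_id: str        # hypothetical cardholder identifier
    month: str          # hypothetical month key, e.g. "2013-11"
    amount: float       # transaction amount
    cross_border: bool  # True if the transaction was cross-border


def build_feature_vectors(records):
    """Step (A1): aggregate transaction records into one feature vector per cardholder."""
    total = defaultdict(float)
    count = defaultdict(int)
    cross = defaultdict(int)
    months = defaultdict(set)
    for r in records:
        total[r.card_id] += r.amount
        count[r.card_id] += 1
        cross[r.card_id] += int(r.cross_border)
        months[r.card_id].add(r.month)

    rough = {}  # the "rough data set": card_id -> feature dict
    for card_id in total:
        # Assumption: averages are taken over the months in which the card was active.
        n_months = len(months[card_id])
        rough[card_id] = {
            "avg_monthly_spend": total[card_id] / n_months,
            "avg_monthly_txn_count": count[card_id] / n_months,
            "cross_border_txn_count": cross[card_id],
        }
    return rough
```

Keeping the rough data set as a plain dict keyed by cardholder makes the screening in step (A2) a simple lookup by known first-class ids.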
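Preferably, in the disclosed data mining method, the feature vectors included in the regression model include the values of attribute feature data common to all first-class target objects (i.e., the regression model contains the features shared by all first-class target objects).
Preferably, in the disclosed data mining method, the filtering operation comprises: filtering noise points out of the screened feature vectors according to a predetermined criterion (for example, for high-end cardholder information in the financial field, if average monthly spending is used as the screening criterion, noise is filtered as follows: sort on that field and filter out the transaction information of the top 10% and the bottom 10% of cardholders, because not all spending records of high-end cards exhibit high-end spending characteristics, and a small fraction of high-end card spending records are so extreme that they lack general applicability).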
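The screening and 10% tail trimming of step (A2) can be sketched as follows, assuming the rough data set is a dict mapping a hypothetical cardholder id to a feature dict and that the ids of known first-class objects (for example, Platinum cardholders) are available separately. `screen_and_filter` is a hypothetical name, and rounding the number of trimmed vectors down is an assumption the patent does not address.

```python
def screen_and_filter(rough, first_class_ids, field, fraction=0.10):
    """Step (A2): keep known first-class vectors, then trim both tails on `field`."""
    # Screening: only the vectors of known first-class objects.
    selected = [rough[cid] for cid in first_class_ids if cid in rough]
    # Noise filtering: rank by the screening field and drop the top and bottom `fraction`.
    ranked = sorted(selected, key=lambda v: v[field])
    k = int(len(ranked) * fraction)  # number trimmed from each end, rounded down (assumption)
    return ranked[k:len(ranked) - k]
```

With ten first-class vectors and `fraction=0.10`, one vector is dropped from each end, leaving eight in the sample.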
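Preferably, in the disclosed data mining method, step (A3) further comprises: extracting a first portion (e.g., 70%) of the sample as training samples to construct the regression model; extracting a second portion (e.g., 20%) of the sample as test samples to test the constructed regression model; and extracting a third portion (e.g., 10%) of the sample as application samples to test the stability of the constructed regression model.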
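A minimal sketch of the 70%/20%/10% split, assuming the sample is shuffled with a fixed seed before it is partitioned; the patent does not say whether the split is random or ordered. `split_sample` and its parameters are hypothetical.

```python
import random


def split_sample(sample, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split the sample into training, test and application parts."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)  # assumption: random but reproducible split
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_test = round(n * ratios[1])
    train = shuffled[:n_train]                 # used to construct the regression model
    test = shuffled[n_train:n_train + n_test]  # used to test the constructed model
    application = shuffled[n_train + n_test:]  # used to check the model's stability
    return train, test, application
```

The application part simply receives whatever remains after rounding, so the three parts always cover the whole sample without overlap.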
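Preferably, in the disclosed data mining method, step (A3) further comprises: before constructing the regression model, performing a normalization operation on each field in each sample, comprising: (1) handling missing values (for example, if a numeric field is missing data, fill it with the column mean; if a character field is missing data, discard the sample); (2) handling singular values (for example, using the box-plot technique to filter out extreme outliers); (3) re-encoding discrete character-type fields; (4) normalizing each field in each sample by z-score to eliminate the effect of inconsistent dimensions.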
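The four normalization sub-steps can be sketched with the Python standard library (3.8+ for `statistics.quantiles`). Several details are assumptions rather than anything stated in the patent: rows are dicts and missing values are `None`; "extreme outliers" are taken to be points outside Tukey's outer box-plot fences (3 × IQR beyond the quartiles); character fields are re-encoded as one-hot indicator columns; and the z-score uses the population standard deviation. `normalize` is a hypothetical name.

```python
import statistics


def normalize(rows, numeric_fields, char_fields, whisker=3.0):
    """Normalization before model construction (sub-steps 1-4); returns new rows."""
    # (1) Missing values: drop rows missing a character field, then fill missing
    #     numeric values with the column mean (rows are copied, input is untouched).
    rows = [dict(r) for r in rows if all(r.get(f) is not None for f in char_fields)]
    for f in numeric_fields:
        mean = statistics.mean(r[f] for r in rows if r.get(f) is not None)
        for r in rows:
            if r.get(f) is None:
                r[f] = mean

    # (2) Singular values: drop rows outside the fences q1 - w*IQR and q3 + w*IQR.
    for f in numeric_fields:
        q1, _, q3 = statistics.quantiles([r[f] for r in rows], n=4)
        iqr = q3 - q1
        low, high = q1 - whisker * iqr, q3 + whisker * iqr
        rows = [r for r in rows if low <= r[f] <= high]

    # (3) Re-encode each character field as one-hot indicator columns "field=level".
    for f in char_fields:
        levels = sorted({r[f] for r in rows})
        for r in rows:
            value = r.pop(f)
            for level in levels:
                r[f"{f}={level}"] = 1.0 if value == level else 0.0

    # (4) z-score each numeric field so that differing units do not dominate.
    for f in numeric_fields:
        values = [r[f] for r in rows]
        mu, sigma = statistics.mean(values), statistics.pstdev(values)
        for r in rows:
            r[f] = (r[f] - mu) / sigma if sigma > 0 else 0.0
    return rows
```

Because the mean fill in sub-step (1) runs before outlier removal, an extreme value can inflate the filled value; the sketch keeps the patent's order of sub-steps rather than reordering them.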
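Preferably, in the disclosed data mining method, step (A3) further comprises: after the normalization operation is complete, further performing a discretization operation on each field in each sample, comprising: (1) discretizing continuous data by dividing it into intervals, where the division points between intervals are points at which the target variable turns noticeably; (2) judging the quality of the interval division from the trend of the WOE (weight of evidence) value curve: if the WOE value curve is increasing, decreasing, or has only one turning point, the division result is determined to be good and the discretization operation is terminated; otherwise, return to step (1) to continue dividing within that interval.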
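The acceptance test in sub-step (2) can be sketched as below. WOE is commonly defined per interval i as WOE_i = ln((p_i / P) / (n_i / N)), where p_i and n_i are the first-class and other-class counts in interval i and P and N are the totals; sign conventions vary between sources. The sketch assumes a binary target (1 = first-class), adds a hypothetical 0.5 smoothing count so empty intervals do not produce log(0), and implements only the evaluation criterion, since the patent does not specify how candidate cut points are searched. All function names are hypothetical.

```python
import bisect
import math


def woe_per_bin(values, labels, cuts, eps=0.5):
    """Smoothed WOE of each interval defined by sorted `cuts`; label 1 = first-class."""
    k = len(cuts) + 1
    pos, neg = [0] * k, [0] * k
    for x, y in zip(values, labels):
        b = bisect.bisect_right(cuts, x)  # index of the interval containing x
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    pos_total, neg_total = sum(pos), sum(neg)  # both classes must be present
    return [math.log(((p + eps) / pos_total) / ((n + eps) / neg_total))
            for p, n in zip(pos, neg)]


def turning_points(woes):
    """Count direction changes along the WOE curve (flat steps are ignored)."""
    diffs = [b - a for a, b in zip(woes, woes[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


def division_is_good(woes):
    """Accept a division whose WOE curve is monotone or has exactly one turning point."""
    return turning_points(woes) <= 1
```

For example, cut points 2.5 and 4.5 over the values 1 to 6 with labels [0, 0, 1, 0, 1, 1] give an increasing WOE curve, so that division would be accepted.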
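Preferably, in the disclosed data mining method, the constructed regression model is used as follows to determine whether each of all known second-class target objects potentially belongs to the first class of target objects: based on the regression model, the probability that a known second-class target object belongs to the first class is calculated from the feature vector corresponding to that object, and if the calculated probability is greater than a predetermined classification threshold (e.g., 0.8), that known second-class target object is determined to potentially belong to the first class of target objects (for example, in the financial field, an ordinary card user is determined to be a potential high-value cardholder).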
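The patent only says "regression model" and that it yields a probability. The sketch below assumes logistic regression trained by batch gradient descent, and further assumes that labeled negative examples (for instance, second-class objects labeled 0) are available for training, which the description does not spell out. The 0.8 threshold follows the example above; `fit_logistic`, `predict_proba`, and `find_potential_first_class` are hypothetical names, and a production system would more likely rely on an established library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the first class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def find_potential_first_class(w, b, candidates, threshold=0.8):
    """Ids of second-class objects whose predicted probability exceeds the threshold."""
    return [cid for cid, x in candidates.items() if predict_proba(w, b, x) > threshold]
```

Usage: after `w, b = fit_logistic(X, y)`, passing a dict of ordinary cardholders' feature vectors to `find_potential_first_class` returns the ids scored above 0.8, i.e. the candidate potential high-value cardholders.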
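As can be seen from the above, the data mining method disclosed in the present invention has the following advantages: target objects can be mined and classified according to their comprehensive features, and because a regression model is used for the determination, the method is highly reusable and can significantly improve the efficiency and accuracy of the determination operation.
Although the present invention has been described by way of the above preferred embodiments, its implementation is not limited to those embodiments. It should be understood that those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope.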

Claims (7)

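  1. A data mining method, comprising the following steps:
    (A1) computing, from the records in a target data set, a feature vector for each target object to form a rough data set, each feature vector including the value of at least one item of attribute data of the corresponding target object;
    (A2) screening out of the rough data set the feature vectors corresponding to all known first-class target objects, and performing a filtering operation on the screened feature vectors to obtain a sample;
    (A3) constructing a regression model based on the sample, and then using the constructed regression model to determine whether each of all known second-class target objects potentially belongs to the first class of target objects.
  2. The data mining method according to claim 1, characterized in that the feature vectors included in the regression model include the values of attribute feature data common to all first-class target objects.
  3. The data mining method according to claim 2, characterized in that the filtering operation comprises: filtering noise points out of the screened feature vectors according to a predetermined criterion.
  4. The data mining method according to claim 3, characterized in that step (A3) further comprises: extracting a first portion of the sample as training samples to construct the regression model; extracting a second portion of the sample as test samples to test the constructed regression model; and extracting a third portion of the sample as application samples to test the stability of the constructed regression model.
  5. The data mining method according to claim 4, characterized in that step (A3) further comprises: before constructing the regression model, performing a normalization operation on each field in each sample, comprising: (1) handling missing values; (2) handling singular values; (3) re-encoding discrete character-type fields; (4) normalizing each field in each sample by z-score to eliminate the effect of inconsistent dimensions.
  6. The data mining method according to claim 5, characterized in that step (A3) further comprises: after the normalization operation is complete, further performing a discretization operation on each field in each sample, comprising: (1) discretizing continuous data by dividing it into intervals, wherein the division points between intervals are points at which the target variable turns noticeably; (2) judging the quality of the interval division from the trend of the WOE value curve, wherein if the WOE value curve is increasing, decreasing, or has only one turning point, the division result is determined to be good and the discretization operation is terminated; otherwise, returning to step (1) to continue dividing within that interval.
  7. The data mining method according to claim 6, characterized in that the constructed regression model is used as follows to determine whether each of all known second-class target objects potentially belongs to the first class of target objects: based on the regression model, calculating, from the feature vector corresponding to a known second-class target object, the probability that the known second-class target object belongs to the first class of target objects, and if the calculated probability is greater than a predetermined classification threshold, determining that the known second-class target object potentially belongs to the first class of target objects.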
PCT/CN2014/093430 2013-12-10 2014-12-10 Data mining method WO2015085916A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/100,533 US10482093B2 (en) 2013-12-10 2014-12-10 Data mining method
EP14869820.2A EP3082051A4 (en) 2013-12-10 2014-12-10 Data mining method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310665357.7A CN104699717B (zh) 2013-12-10 2013-12-10 Data mining method
CN201310665357.7 2013-12-10

Publications (1)

Publication Number Publication Date
WO2015085916A1 true WO2015085916A1 (zh) 2015-06-18

Family

ID=53346850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/093430 WO2015085916A1 (zh) 2013-12-10 2014-12-10 Data mining method

Country Status (4)

Country Link
US (1) US10482093B2 (zh)
EP (1) EP3082051A4 (zh)
CN (1) CN104699717B (zh)
WO (1) WO2015085916A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
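CN108369584A (zh) * 2015-11-25 2018-08-03 日本电气株式会社 Information processing system, function creation method and function creation program
CN109325167A (zh) * 2017-07-31 2019-02-12 株式会社理光 Feature analysis method, apparatus, device and computer-readable storage medium
CN112102074A (zh) * 2020-10-14 2020-12-18 深圳前海弘犀智能科技有限公司 Scorecard modeling method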
US11514062B2 (en) 2017-10-05 2022-11-29 Dotdata, Inc. Feature value generation device, feature value generation method, and feature value generation program
US11727203B2 (en) 2017-03-30 2023-08-15 Dotdata, Inc. Information processing system, feature description method and feature description program

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
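CN106570015B (zh) * 2015-10-09 2020-02-21 杭州海康威视数字技术股份有限公司 Image search method and apparatus
CN107229621B (zh) * 2016-03-23 2020-12-04 北大方正集团有限公司 Method and apparatus for cleaning difference data
CN105975590A (zh) * 2016-05-03 2016-09-28 无锡雅座在线科技发展有限公司 Method and apparatus for determining object type
CN107153907A (zh) * 2017-03-22 2017-09-12 华为技术有限公司 Method for evaluating potential users of a video service, and related apparatus
CN108334954A (zh) * 2018-01-22 2018-07-27 中国平安人寿保险股份有限公司 Method, apparatus, storage medium and terminal for constructing a logistic regression model
CN108427753A (zh) * 2018-03-13 2018-08-21 河海大学 A new data mining method
CN108932530A (zh) * 2018-06-29 2018-12-04 新华三大数据技术有限公司 Method and apparatus for constructing a label system
CN109241669A (zh) * 2018-10-08 2019-01-18 成都四方伟业软件股份有限公司 Automatic modeling method, apparatus and storage medium
CN109583468B (zh) * 2018-10-12 2020-09-22 阿里巴巴集团控股有限公司 Training sample acquisition method, sample prediction method and corresponding apparatus
CN109615232A (zh) * 2018-12-12 2019-04-12 税友软件集团股份有限公司 Method, system and related apparatus for credit score prediction
CN109636482B (zh) * 2018-12-21 2021-07-27 南京星云数字技术有限公司 Data processing method and system based on a similarity model
CN111667919A (zh) * 2019-03-05 2020-09-15 上海悟景信息科技有限公司 IoT-based smart elderly-care system and method
CN110245981B (zh) * 2019-05-31 2021-10-01 南京瑞栖智能交通技术产业研究院有限公司 Crowd type identification method based on mobile phone signaling data
CN110908858B (zh) * 2019-10-12 2022-10-25 中国平安财产保险股份有限公司 Log-type sample sampling method based on a double-funnel structure, and related apparatus
CN110766944A (zh) * 2019-10-28 2020-02-07 长沙地大物泊网络科技有限公司 Parking space recommendation method based on big-data mining of vehicle trajectories
CN110910231A (zh) * 2019-11-06 2020-03-24 上海百事通信息技术股份有限公司 Debt collection management platform
CN112783934B (zh) * 2019-11-11 2024-04-05 北京沃东天骏信息技术有限公司 Method and apparatus for determining transaction data intervals, storage medium and computer device
CN113222632A (zh) * 2020-02-04 2021-08-06 北京京东振世信息技术有限公司 Object mining method and apparatus
CN111984707A (zh) * 2020-08-21 2020-11-24 重庆大数据研究院有限公司 Multi-level deep fusion mining method for multi-mode cross-domain big data of commercial vehicles
CN114422973B (zh) * 2022-03-30 2022-06-28 北京融信数联科技有限公司 Big-data-based intelligent identification method, system and readable storage medium for ride-hailing drivers
CN114511047B (zh) * 2022-04-20 2022-07-08 北京寄云鼎城科技有限公司 Excavator working mode identification method, computer device and medium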

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
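CN101227435A (zh) * 2008-01-28 2008-07-23 浙江大学 Chinese spam e-mail filtering method based on logistic regression
CN101477544A (zh) * 2009-01-12 2009-07-08 腾讯科技(深圳)有限公司 Method and system for identifying spam text
CN103176981A (zh) * 2011-12-20 2013-06-26 中国科学院计算机网络信息中心 Method for mining event information and issuing early warnings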

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6016394A (en) * 1997-09-17 2000-01-18 Tenfold Corporation Method and system for database application software creation requiring minimal programming
US20020169735A1 (en) * 2001-03-07 2002-11-14 David Kil Automatic mapping from data to preprocessing algorithms
US7139764B2 (en) * 2003-06-25 2006-11-21 Lee Shih-Jong J Dynamic learning and knowledge representation for data mining
US7873218B2 (en) * 2004-04-26 2011-01-18 Canon Kabushiki Kaisha Function approximation processing method and image processing method
US7627620B2 (en) * 2004-12-16 2009-12-01 Oracle International Corporation Data-centric automatic data mining
US8503796B2 (en) * 2006-12-29 2013-08-06 Ncr Corporation Method of validating a media item
US20100191734A1 (en) * 2009-01-23 2010-07-29 Rajaram Shyam Sundar System and method for classifying documents
US8527445B2 (en) * 2010-12-02 2013-09-03 Pukoa Scientific, Llc Apparatus, system, and method for object detection and identification
US8402397B2 (en) * 2011-07-26 2013-03-19 Mentor Graphics Corporation Hotspot detection based on machine learning
US8612599B2 (en) * 2011-09-07 2013-12-17 Accenture Global Services Limited Cloud service monitoring system
US9152997B2 (en) * 2012-01-27 2015-10-06 Robert M. Sellers, Jr. Method for buying and selling stocks and securities
CN103324938A (zh) * 2012-03-21 2013-09-25 日电(中国)有限公司 Methods and apparatus for training a pose classifier and an object classifier, and for object detection
CN102693498A (zh) * 2012-05-16 2012-09-26 上海卓达信息技术有限公司 Precise recommendation method based on imperfect data
US9164961B2 (en) * 2012-11-30 2015-10-20 Xerox Corporation Methods and systems for predicting learning curve for statistical machine translation system
JP6072078B2 (ja) * 2012-12-25 2017-02-01 International Business Machines Corporation Analysis device, analysis program, analysis method, estimation device, estimation program, and estimation method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3082051A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369584A (zh) * 2015-11-25 2018-08-03 日本电气株式会社 Information processing system, function creation method and function creation program
CN108369584B (zh) * 2015-11-25 2022-07-08 圆点数据公司 Information processing system, descriptor creation method and descriptor creation program
US11727203B2 (en) 2017-03-30 2023-08-15 Dotdata, Inc. Information processing system, feature description method and feature description program
CN109325167A (zh) * 2017-07-31 2019-02-12 株式会社理光 Feature analysis method, apparatus, device and computer-readable storage medium
CN109325167B (zh) * 2017-07-31 2022-02-18 株式会社理光 Feature analysis method, apparatus, device and computer-readable storage medium
US11514062B2 (en) 2017-10-05 2022-11-29 Dotdata, Inc. Feature value generation device, feature value generation method, and feature value generation program
CN112102074A (zh) * 2020-10-14 2020-12-18 深圳前海弘犀智能科技有限公司 Scorecard modeling method
CN112102074B (zh) * 2020-10-14 2024-01-30 深圳前海弘犀智能科技有限公司 Scorecard modeling method

Also Published As

Publication number Publication date
US20160314174A1 (en) 2016-10-27
EP3082051A1 (en) 2016-10-19
CN104699717B (zh) 2019-01-18
US10482093B2 (en) 2019-11-19
CN104699717A (zh) 2015-06-10
EP3082051A4 (en) 2017-08-16

Similar Documents

Publication Publication Date Title
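WO2015085916A1 (zh) Data mining method
WO2017143919A1 (zh) Method and apparatus for establishing a data identification model
CN111428599B (zh) Bill recognition method, apparatus and device
CN107563757B (zh) Method and apparatus for data risk identification
WO2017143932A1 (zh) Fraudulent transaction detection method based on sample clustering
CN111695626A (zh) High-dimensional imbalanced data classification method based on hybrid sampling and feature selection
CN105893380B (zh) Improved feature selection method for text classification
CN110895758B (zh) Method, apparatus and system for screening credit card accounts with fraudulent transactions
CN111143415A (zh) Data processing method, apparatus and computer-readable storage medium
CN107870956B (zh) High-utility itemset mining method, apparatus and data processing device
CN108268886B (zh) Method and system for identifying cheating plug-in operations
CN105893388A (zh) Text feature extraction method based on inter-class discrimination and high intra-class representativeness
CN113034238B (zh) Product brand feature extraction and intelligent recommendation management method based on e-commerce platform transactions
WO2017114276A1 (zh) Graph-based method and system for analyzing users
CN111626842A (zh) Method and apparatus for analyzing consumption behavior data
CN111861486A (zh) Abnormal account identification method, apparatus, device and medium
WO2018090643A1 (zh) Customer classification method, electronic device and storage medium
CN107679862B (zh) Method and apparatus for determining feature values of a fraudulent-transaction model
CN111046947B (zh) Classifier training system and method, and abnormal sample identification method
Lim et al. Conditional weighted transaction aggregation for credit card fraud detection
CN113111935B (zh) Method for determining identical trading entities in a bulk-commodity e-commerce market based on real-time clustering of transaction data
CN116029755A (zh) Analysis method and system for evaluating the effectiveness of promotional expense policies
CN105354597B (zh) Method and apparatus for classifying game items
CN114066173A (zh) Fund flow behavior analysis method and storage medium
CN105320666B (zh) Data aggregation method for multiple data sets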

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14869820; Country of ref document: EP; Kind code of ref document: A1
REEP Request for entry into the European phase. Ref document number: 2014869820; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 2014869820; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 15100533; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE