CN111597232A - A data mining method and system

Info

Publication number: CN111597232A
Application number: CN202010454602.XA
Authority: CN
Inventors: 张媛媛; 方静; 赵军伟; 付小伟; 孙临珺; 白宏斌
Original assignee: North China Institute of Science and Technology
Current assignee: North China Institute of Science and Technology
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2020-08-28

Abstract

The invention discloses a data mining method and system. The method comprises the following steps: S1, outputting corresponding keyword groups based on the data mining requirements; S2, generating an associated phrase for each keyword group, wherein the associated phrase consists of the keyword group, a keyword opposite phrase, a keyword similar phrase and a keyword associated phrase; S3, constructing a data mining model based on the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase; and S4, running the data mining model on Hadoop to mine the target data. The invention achieves high-precision and high-efficiency mining of the target data.

Description

A data mining method and system

Technical Field

The invention relates to the field of data mining, and in particular to a data mining method and system.

Background

At present, as computer and network applications become more widespread and the types of business in different fields grow richer, it is becoming increasingly important to effectively mine different categories of objects from the massive data records associated with specific objects, so that different processing schemes can be applied to different categories of objects.

In existing technical solutions, target objects are usually classified according to one or more attribute data associated with them; that is, the target objects are classified based on the values of one or several specific attribute data of each target object.

However, the existing technical solutions have the following problems: because the target objects are classified based on only a single or a few attribute data, the accuracy of the classification result is low; and because the same evaluation operation must be performed on the attribute data of every target object, the efficiency of data mining is low.

Summary of the Invention

To solve the above problems, the present invention provides a data mining method and system that achieve high-precision and high-efficiency mining of target data.

To achieve the above object, the technical solution adopted by the present invention is as follows:

A data mining method, comprising the following steps:

S1. Outputting corresponding keyword groups based on the data mining requirements;

S2. Generating an associated phrase for each keyword group, the associated phrase being composed of the keyword group, a keyword opposite phrase, a keyword similar phrase and a keyword associated phrase;

S3. Constructing a data mining model based on the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase;

S4. Running the data mining model on Hadoop to mine the target data.
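Steps S1 to S4 can be read as a simple pipeline. The following Python sketch is illustrative only: the names `AssociatedPhrase` and `run_pipeline`, the field names, and the four callables passed in are hypothetical placeholders, and the text does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AssociatedPhrase:
    """Four-part associated phrase of step S2 (field names are assumptions)."""
    keyword_group: List[str]
    opposite: List[str] = field(default_factory=list)
    similar: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def run_pipeline(
    requirement: str,
    extract_keywords: Callable[[str], List[List[str]]],       # S1
    expand: Callable[[List[str]], AssociatedPhrase],          # S2
    build_model: Callable[[List[AssociatedPhrase]], object],  # S3
    run_on_cluster: Callable[[object], Dict[str, list]],      # S4
) -> Dict[str, list]:
    """Chain the four steps; each stage is supplied by the caller."""
    keyword_groups = extract_keywords(requirement)
    associated = [expand(group) for group in keyword_groups]
    model = build_model(associated)
    return run_on_cluster(model)
```

Passing the stages in as callables keeps the sketch neutral about how each step is actually realized.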

Further, in step S1, the keyword groups are obtained using the CCIPCA algorithm.
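The text does not say how CCIPCA (candid covariance-free incremental principal component analysis) is applied to the mining requirements. As a minimal sketch, assume each requirement document has already been turned into a zero-mean term-weight vector and that the terms with the largest absolute weight in the first principal direction are taken as keywords. The function names `ccipca_first_component` and `top_terms`, and this way of selecting keywords, are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def ccipca_first_component(samples: Sequence[Sequence[float]],
                           amnesic: float = 0.0) -> List[float]:
    """Incrementally estimate the first principal direction of zero-mean samples.

    Applies the CCIPCA update
        v(n) = (n-1-l)/n * v(n-1) + (1+l)/n * u(n) * (u(n).v(n-1) / ||v(n-1)||)
    where l is the amnesic parameter, and returns v normalized to unit length.
    """
    v = None
    for n, u in enumerate(samples, start=1):
        norm = math.sqrt(sum(x * x for x in v)) if v is not None else 0.0
        if norm == 0.0:
            v = list(u)  # initialize (or re-initialize) with the current sample
            continue
        proj = sum(a * b for a, b in zip(u, v)) / norm
        w_old = (n - 1 - amnesic) / n
        w_new = (1 + amnesic) / n
        v = [w_old * vi + w_new * proj * ui for vi, ui in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_terms(vocabulary: Sequence[str], component: Sequence[float],
              k: int = 3) -> List[str]:
    """Return the k terms with the largest absolute weight in the component."""
    ranked = sorted(zip(vocabulary, component),
                    key=lambda tw: abs(tw[1]), reverse=True)
    return [term for term, _ in ranked[:k]]
```

Because the update touches one sample at a time and never forms a covariance matrix, the estimate can be refreshed as new requirement documents arrive.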

Further, in step S2, the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase are obtained using the Inception V3 deep neural network model.

Further, in step S3, a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model are constructed from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

Further, in step S4, the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model are run simultaneously on Hadoop to mine the target data.
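One way to picture the four mining models running together over the same data is a MapReduce job in the style of Hadoop Streaming, where the mapper tags each record with every model it matches and the reducer groups records by model. The sketch below assumes, purely for illustration, that each mining model is a set of phrases matched by substring; the phrase values and the names `MODELS`, `map_record` and `reduce_records` are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical phrase sets, one per mining model (placeholder values).
MODELS: Dict[str, Set[str]] = {
    "keyword_group": {"coal mine"},
    "opposite": {"open pit"},
    "similar": {"colliery"},
    "related": {"gas outburst"},
}


def map_record(line: str,
               models: Dict[str, Set[str]] = MODELS) -> Iterator[Tuple[str, str]]:
    """Emit (model_name, record) for every model with a phrase found in the record."""
    text = line.strip().lower()
    for name, phrases in models.items():
        if any(phrase in text for phrase in phrases):
            yield name, line.strip()


def reduce_records(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapper output by model name, as the shuffle and reduce phases would."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return {name: [record for _, record in group]
            for name, group in groupby(ordered, key=lambda kv: kv[0])}
```

In an actual Streaming job the mapper would read records from standard input and print tab-separated key/value lines; here the two phases are plain functions so the logic can be checked locally.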

The present invention also provides a data mining system, comprising:

a keyword group generation module, configured to generate corresponding keyword groups based on the data mining requirements;

an associated phrase generation module, configured to generate a corresponding associated phrase based on each keyword group, the associated phrase being composed of the keyword group, a keyword opposite phrase, a keyword similar phrase and a keyword associated phrase;

a data mining model building module, configured to build a data mining model based on the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase;

a data mining module, configured to run the data mining model on Hadoop to mine the target data.

Further, the associated phrase generation module constructs a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

Further, the data mining module runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
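The statement that each mining model's output corresponds to one database can be sketched as follows. The text does not name a database system, so SQLite is used here only as a stand-in; the function name `store_results`, the table name `mined` and the `path_for` parameter are assumptions.

```python
import sqlite3
from typing import Callable, Dict, List


def store_results(
    results: Dict[str, List[str]],
    path_for: Callable[[str], str] = lambda name: ":memory:",
) -> Dict[str, sqlite3.Connection]:
    """Write each model's mined records into a separate database.

    `path_for` maps a model name to a database path; the default opens an
    independent in-memory database for every model.
    """
    connections: Dict[str, sqlite3.Connection] = {}
    for model_name, records in results.items():
        conn = sqlite3.connect(path_for(model_name))
        conn.execute("CREATE TABLE IF NOT EXISTS mined (record TEXT)")
        conn.executemany("INSERT INTO mined (record) VALUES (?)",
                         [(r,) for r in records])
        conn.commit()
        connections[model_name] = conn
    return connections
```

Supplying a `path_for` such as `lambda name: f"{name}.db"` would place each model's results in its own file instead.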

The present invention has the following beneficial effects:

Keyword groups are extracted using the CCIPCA algorithm; the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase are obtained using the Inception V3 deep neural network model; then, again based on the Inception V3 deep neural network model, a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model are constructed from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively; finally, the four mining models are run simultaneously on Hadoop to mine the target data, thereby achieving high-accuracy and high-efficiency mining of the target data.

Brief Description of the Drawings

FIG. 1 is a flowchart of a data mining method according to an embodiment of the present invention.

FIG. 2 is a system block diagram of a data mining system according to an embodiment of the present invention.

Detailed Description

To make the objects and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.

As shown in FIG. 1, an embodiment of the present invention provides a data mining method, comprising the following steps:

In this embodiment, in step S1, the keyword groups are obtained using the CCIPCA algorithm.

In this embodiment, in step S2, the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase are obtained using the Inception V3 deep neural network model.

In this embodiment, in step S3, a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model are constructed from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

In this embodiment, in step S4, the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model are run simultaneously on Hadoop to mine the target data.

As shown in FIG. 2, an embodiment of the present invention provides a data mining system, comprising:

a data mining module, configured to run the data mining model on Hadoop to mine the target data.

In this embodiment, the associated phrase generation module constructs a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

In this embodiment, the data mining module runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A data mining method, characterized by comprising the following steps:

S1. Outputting corresponding keyword groups based on the data mining requirements;

S2. Generating an associated phrase for each keyword group, the associated phrase being composed of the keyword group, a keyword opposite phrase, a keyword similar phrase and a keyword associated phrase;

S3. Constructing a data mining model based on the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase;

S4. Running the data mining model on Hadoop to mine the target data.

2. The data mining method according to claim 1, characterized in that in step S1, the keyword groups are obtained using the CCIPCA algorithm.

3. The data mining method according to claim 1, characterized in that in step S2, the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase are obtained using the Inception V3 deep neural network model.

4. The data mining method according to claim 1, characterized in that in step S3, a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model are constructed from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

5. The data mining method according to claim 4, characterized in that in step S4, the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model are run simultaneously on Hadoop to mine the target data.

6. A data mining system, characterized by comprising:

a keyword group generation module, configured to generate corresponding keyword groups based on the data mining requirements;

an associated phrase generation module, configured to generate a corresponding associated phrase based on each keyword group, the associated phrase being composed of the keyword group, a keyword opposite phrase, a keyword similar phrase and a keyword associated phrase;

a data mining model building module, configured to build a data mining model based on the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase;

a data mining module, configured to run the data mining model on Hadoop to mine the target data.

7. The data mining system according to claim 6, characterized in that the associated phrase generation module constructs a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model and a keyword associated phrase mining model from the keyword group, the keyword opposite phrase, the keyword similar phrase and the keyword associated phrase, respectively.

8. The data mining system according to claim 6, characterized in that the data mining module runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.