CN111597232A - A data mining method and system - Google Patents
A data mining method and system
- Publication number
- CN111597232A (application CN202010454602.XA)
- Authority
- CN
- China
- Prior art keywords
- keyword
- phrase
- data mining
- mining
- mining model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The invention relates to the field of data mining, and in particular to a data mining method and system.
Background
With the growing use of computers and networks and the increasing variety of services in different fields, it is becoming more and more important to effectively mine different categories of objects from the massive data records associated with specific objects, so that a different processing scheme can be applied to each category.
In existing technical solutions, target objects are usually classified according to one or more attribute data associated with them, that is, based on the values of one or several specific attributes of each target object.
However, existing solutions have the following problems: because target objects are classified on the basis of only one or a few attributes, the accuracy of the classification results is low; and because the same evaluation operation must be performed on the attribute data of every target object, data mining is inefficient.
Summary of the invention
To solve the above problems, the present invention provides a data mining method and system that mine target data with high accuracy and high efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A data mining method, comprising the following steps:
S1. Output the corresponding keyword group based on the data mining requirements;
S2. Generate an associated phrase set for each keyword group, the set consisting of the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
S3. Build a data mining model based on the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
S4. Run the data mining model on Hadoop to mine the target data.
Further, in step S1 the keyword group is obtained using the CCIPCA algorithm.
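The text names CCIPCA (candid covariance-free incremental principal component analysis) but does not say how its output becomes a keyword group. The following is a minimal sketch under stated assumptions: each record is represented as a term-count vector, only the first principal component is estimated (amnesic parameter 0), and the terms with the largest absolute loadings are taken as keywords. The function names, the vocabulary, and the selection rule are illustrative and not taken from the patent.

```python
import math

def ccipca_first_component(samples):
    """Incrementally estimate the first principal component (amnesic parameter 0)."""
    dim = len(samples[0])
    mean = [0.0] * dim
    v = None  # current estimate; its direction is the component
    for n, x in enumerate(samples, start=1):
        # Update the running mean, then center the new sample with it.
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        u = [xi - m for xi, m in zip(x, mean)]
        if v is None:
            # Initialise with the first non-zero centered sample.
            if any(u):
                v = u[:]
            continue
        norm = math.sqrt(sum(vi * vi for vi in v))
        proj = sum(ui * vi for ui, vi in zip(u, v)) / norm
        # CCIPCA update: v(n) = (n-1)/n * v(n-1) + 1/n * u * (u . v(n-1)) / ||v(n-1)||
        v = [((n - 1) / n) * vi + (1 / n) * proj * ui for vi, ui in zip(v, u)]
    if v is None:
        raise ValueError("all samples are identical; no direction of variance")
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]

def top_terms(vocabulary, samples, k):
    """Assumed selection rule: keep the k terms with the largest absolute loading."""
    component = ccipca_first_component(samples)
    ranked = sorted(zip(vocabulary, component), key=lambda pair: abs(pair[1]), reverse=True)
    return [term for term, _ in ranked[:k]]

# Hypothetical term counts: "the" is constant, so it carries no variance.
vocabulary = ["hadoop", "mining", "the"]
samples = [[5, 4, 2], [0, 1, 2], [6, 5, 2], [1, 0, 2], [5, 5, 2], [0, 0, 2]]
keywords = top_terms(vocabulary, samples, 2)
```

Because the estimate is updated one sample at a time and no covariance matrix is stored, this style of computation suits large record streams, which may be why the method was chosen.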
Further, in step S2 the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases are obtained using the Inception V3 deep neural network model.
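The text does not describe how the neural network produces these phrases, so the sketch below only illustrates the shape of the result: one associated phrase set holding the four components. A small hand-written lexicon (`TOY_LEXICON`, hypothetical) stands in for the model output; the class and function names are likewise assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssociatedPhraseSet:
    keywords: tuple     # the keyword group itself
    opposite: tuple     # keyword opposite phrases
    similar: tuple      # keyword similar phrases
    associated: tuple   # keyword associated phrases

# Hypothetical lookup table standing in for the model's predictions.
TOY_LEXICON = {
    "increase": {
        "opposite": ["decrease"],
        "similar": ["growth"],
        "associated": ["revenue"],
    },
}

def build_phrase_set(keyword_group, lexicon):
    """Collect the related phrases of every keyword, preserving order without duplicates."""
    def collect(relation):
        seen = []
        for keyword in keyword_group:
            for phrase in lexicon.get(keyword, {}).get(relation, []):
                if phrase not in seen:
                    seen.append(phrase)
        return tuple(seen)

    return AssociatedPhraseSet(
        keywords=tuple(keyword_group),
        opposite=collect("opposite"),
        similar=collect("similar"),
        associated=collect("associated"),
    )

phrase_set = build_phrase_set(["increase", "unknown"], TOY_LEXICON)
```

Keeping the four components in separate fields makes it straightforward to hand each one to its own mining model later.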
Further, step S3 builds a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model, and a keyword associated phrase mining model from the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases, respectively.
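The internal form of a mining model is not specified. As one possible reading, each model can be sketched as a phrase matcher that flags a record when any phrase of its group occurs in it. The whole-word, case-insensitive matching rule and all names below are assumptions made for illustration.

```python
import re

def build_mining_model(phrases):
    """Return a predicate that is True when any phrase occurs in the text as whole words."""
    if not phrases:
        return lambda text: False  # an empty phrase group matches nothing
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.search(text) is not None

def build_models(phrase_groups):
    """Build one model per phrase group, keyed by the group name."""
    return {name: build_mining_model(phrases) for name, phrases in phrase_groups.items()}

# Hypothetical phrase groups for a single data mining requirement.
models = build_models({
    "keyword": ["data mining"],
    "opposite": [],
    "similar": ["knowledge discovery"],
    "associated": ["Hadoop"],
})
```

Building four independent predicates, rather than one combined pattern, keeps each model's hits separable, which matches the later statement that each model's results go to their own database.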
Further, step S4 runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data.
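One way to picture running all four models over the same data in a single pass is the MapReduce pattern: the map step emits a (model name, record) pair for every model a record satisfies, and the reduce step groups records by model. The sketch below simulates this locally in plain Python; it is not the patent's implementation, and the substring matching rule and sample data are assumptions. On a cluster, the same map and reduce logic could in principle be supplied as scripts to Hadoop Streaming.

```python
from collections import defaultdict

def mapper(record, models):
    """Emit (model_name, record) for every model whose phrases occur in the record."""
    lowered = record.lower()
    for name, phrases in models.items():
        if any(phrase.lower() in lowered for phrase in phrases):
            yield name, record

def reducer(pairs):
    """Group the emitted records by model name."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def run_job(records, models):
    """Local stand-in for a MapReduce job: map every record, then reduce once."""
    pairs = (pair for record in records for pair in mapper(record, models))
    return reducer(pairs)

# Hypothetical phrase lists and records.
models = {"keyword": ["hadoop"], "similar": ["increase", "job"]}
records = ["Hadoop job log", "price increase", "nothing"]
result = run_job(records, models)
```

Each record is read once and tested against every model, so the models share one scan of the data instead of requiring a separate pass each.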
The present invention also provides a data mining system, comprising:
a keyword group generation module, configured to generate the corresponding keyword group based on the data mining requirements;
an associated phrase generation module, configured to generate the corresponding associated phrase set based on the keyword group, the set consisting of the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
a data mining model construction module, configured to build a data mining model based on the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
a data mining module, configured to run the data mining model on Hadoop to mine the target data;
Further, the associated phrase generation module builds the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model from the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
Further, the data mining module runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
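The one-database-per-model arrangement can be sketched with Python's built-in `sqlite3` module. The choice of SQLite, the table name `mined`, and the `db_path_for` helper are assumptions for illustration; the patent does not name a database product or schema.

```python
import sqlite3

def store_results(results, db_path_for=lambda name: ":memory:"):
    """Write each model's mined records to its own database and return the connections."""
    connections = {}
    for name, records in results.items():
        # Every call to connect(":memory:") opens a separate, independent database.
        conn = sqlite3.connect(db_path_for(name))
        conn.execute("CREATE TABLE IF NOT EXISTS mined (record TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO mined (record) VALUES (?)",
            [(record,) for record in records],
        )
        conn.commit()
        connections[name] = conn
    return connections

# Hypothetical mining output keyed by model name.
connections = store_results({
    "keyword": ["record a", "record b"],
    "opposite": [],
})
keyword_count = connections["keyword"].execute("SELECT COUNT(*) FROM mined").fetchone()[0]
opposite_count = connections["opposite"].execute("SELECT COUNT(*) FROM mined").fetchone()[0]
```

To persist the data, `db_path_for` could return a distinct file path per model name instead of `":memory:"`.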
The present invention has the following beneficial effects:
The keyword group is extracted using the CCIPCA algorithm; the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases are obtained using the Inception V3 deep neural network model; a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model, and a keyword associated phrase mining model are then built from them respectively; and the four models are run simultaneously on Hadoop to mine the target data, thereby mining the target data with high accuracy and high efficiency.
Description of the drawings
FIG. 1 is a flowchart of a data mining method according to an embodiment of the present invention.
FIG. 2 is a system block diagram of a data mining system according to an embodiment of the present invention.
Detailed description
To make the objects and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
As shown in FIG. 1, an embodiment of the present invention provides a data mining method, comprising the following steps:
S1. Output the corresponding keyword group based on the data mining requirements;
S2. Generate an associated phrase set for each keyword group, the set consisting of the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
S3. Build a data mining model based on the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
S4. Run the data mining model on Hadoop to mine the target data.
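The four steps form a linear pipeline in which each stage consumes the previous stage's output. The sketch below shows only that wiring; every stage is a deliberately trivial stand-in (whitespace splitting, empty related-phrase groups, substring matching, a local loop instead of Hadoop), and all function names are hypothetical.

```python
def run_pipeline(requirement, records, extract_keywords, expand_phrases, build_models, run_models):
    """Chain steps S1 to S4; each stage is passed in so it can be swapped out."""
    keyword_group = extract_keywords(requirement)   # S1: requirement -> keyword group
    phrase_set = expand_phrases(keyword_group)      # S2: keyword group -> associated phrase set
    models = build_models(phrase_set)               # S3: phrase set -> mining models
    return run_models(models, records)              # S4: models + records -> mined data

# Trivial stand-in stages, for illustration only.
def split_keywords(requirement):
    return requirement.lower().split()

def no_expansion(keyword_group):
    return {"keyword": keyword_group, "opposite": [], "similar": [], "associated": []}

def substring_models(phrase_set):
    def make(phrases):
        return lambda text: any(p in text.lower() for p in phrases)
    return {name: make(tuple(phrases)) for name, phrases in phrase_set.items()}

def run_locally(models, records):
    return {name: [r for r in records if model(r)] for name, model in models.items()}

mined = run_pipeline(
    "fraud",
    ["Fraud alert", "ok"],
    split_keywords,
    no_expansion,
    substring_models,
    run_locally,
)
```

Passing the stages as arguments keeps the orchestration independent of any particular keyword extractor, phrase generator, or execution backend.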
In this embodiment, in step S1 the keyword group is obtained using the CCIPCA algorithm.
In this embodiment, in step S2 the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases are obtained using the Inception V3 deep neural network model.
In this embodiment, step S3 builds a keyword group mining model, a keyword opposite phrase mining model, a keyword similar phrase mining model, and a keyword associated phrase mining model from the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases, respectively.
In this embodiment, step S4 runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data.
As shown in FIG. 2, an embodiment of the present invention provides a data mining system, comprising:
a keyword group generation module, configured to generate the corresponding keyword group based on the data mining requirements;
an associated phrase generation module, configured to generate the corresponding associated phrase set based on the keyword group, the set consisting of the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
a data mining model construction module, configured to build a data mining model based on the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
a data mining module, configured to run the data mining model on Hadoop to mine the target data;
In this embodiment, the associated phrase generation module builds the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model from the keyword group, keyword opposite phrases, keyword similar phrases, and keyword associated phrases;
In this embodiment, the data mining module runs the keyword group mining model, the keyword opposite phrase mining model, the keyword similar phrase mining model, and the keyword associated phrase mining model simultaneously on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the invention, and these improvements and refinements shall also fall within the scope of protection of the invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454602.XA CN111597232A (en) | 2020-05-26 | 2020-05-26 | A data mining method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454602.XA CN111597232A (en) | 2020-05-26 | 2020-05-26 | A data mining method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111597232A true CN111597232A (en) | 2020-08-28 |
Family
ID=72190655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010454602.XA Pending CN111597232A (en) | 2020-05-26 | 2020-05-26 | A data mining method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597232A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268620A (en) * | 2018-01-08 | 2018-07-10 | 南京邮电大学 | A document classification method based on Hadoop data mining |
US20190034823A1 (en) * | 2017-07-27 | 2019-01-31 | Getgo, Inc. | Real time learning of text classification models for fast and efficient labeling of training data and customization |
CN110889443A (en) * | 2019-11-21 | 2020-03-17 | 成都数联铭品科技有限公司 | Unsupervised text classification system and unsupervised text classification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112115232B (en) | Data error correction method, device and server | |
CN111191002B (en) | Neural code searching method and device based on hierarchical embedding | |
CN109359172B (en) | An Entity Alignment Optimization Method Based on Graph Partitioning | |
CN111709235A (en) | A system and method for statistical analysis of text data based on natural language processing | |
CN106168965A (en) | Knowledge graph construction system | |
CN104216874B (en) | Method and system for mining weighted positive and negative patterns between Chinese words based on correlation coefficients | |
CN109710621B (en) | Keyword Search KSANEW Method Combining Semantic Class Nodes and Edge Weights | |
Feng et al. | Hypergraph isomorphism computation | |
Wu et al. | An improved k-means algorithm for document clustering | |
Gao et al. | Democratic diffusion aggregation for image retrieval | |
Jiang et al. | Combining embedding-based and symbol-based methods for entity alignment | |
CN106055652A (en) | Method and system for database matching based on patterns and examples | |
CN108154185A (en) | A kind of k-means clustering methods of secret protection | |
CN108399268A (en) | A kind of increment type isomery figure clustering method based on game theory | |
CN110851577A (en) | Knowledge graph expansion method and device in electric power field | |
CN104008119A (en) | One-to-many mixed string comparison method | |
CN118278519A (en) | Knowledge graph completion method and related equipment | |
CN105335499B (en) | It is a kind of based on distribution-convergence model document clustering method | |
CN104679732A (en) | Syntax tree similarity calculation method based on fuzzy tree kernel | |
CN111597232A (en) | A data mining method and system | |
CN118279925A (en) | Image text matching algorithm integrating local and global semantics | |
Liang et al. | News text classification method for edge computing based on BiLSTM-attention | |
CN109670071B (en) | A serialized multi-feature-guided cross-media hash retrieval method and system | |
Yuan | A new word clustering algorithm based on word similarity | |
Song et al. | Large scale network embedding: A separable approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200828 |