CN111597232A - Data mining method and system - Google Patents
Data mining method and system
- Publication number
- CN111597232A (application number CN202010454602.XA)
- Authority
- CN
- China
- Prior art keywords
- phrase
- keyword
- mining
- key
- mining model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data mining method and system. The method comprises the following steps: S1, outputting corresponding key phrases based on the data mining requirements; S2, generating the associated phrases of each key phrase, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase; S3, constructing a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase; and S4, running the data mining model on Hadoop to mine the target data. The invention achieves high-precision and high-efficiency mining of the target data.
Description
Technical Field
The invention relates to the field of data mining, in particular to a data mining method and system.
Background
Currently, with the increasing popularity of computer and network applications and the increasing abundance of business categories in different domains, it is becoming increasingly important to efficiently mine different classes of objects from the mass data records associated with a particular object in order to implement different processing schemes for the different classes of objects.
In prior art solutions, target objects are typically classified according to one or more items of attribute data associated with them, that is, according to the values of one or a few specific attributes of each target object.
However, the prior art solutions have two problems. Because the target objects are classified on the basis of only one or a few attributes, the accuracy of the classification result is low. Because the same evaluation operation must be performed on the attribute data of every target object, the efficiency of data mining is also low.
Disclosure of Invention
In order to solve the problems, the invention provides a data mining method and a data mining system, which realize high-precision and high-efficiency mining of target data.
To achieve this purpose, the invention adopts the following technical solution:
A data mining method comprises the following steps:
S1, outputting corresponding key phrases based on the data mining requirements;
S2, generating the associated phrases of each key phrase, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S3, constructing a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S4, running the data mining model on Hadoop to mine the target data.
Further, in step S1, the key phrases are obtained based on the CCIPCA algorithm.
Further, in step S2, the key phrase, the opposite key phrase, the similar key phrase and the related key phrase are obtained based on the Inception V3 deep neural network model.
Further, in step S3, a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model are constructed from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
Further, in step S4, the target data is mined by simultaneously running the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model on Hadoop.
The invention also provides a data mining system, comprising:
a key phrase generating module, used for generating corresponding key phrases based on the data mining requirements;
an associated phrase generating module, used for generating corresponding associated phrases based on the key phrases, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining model building module, used for building a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining module, used for running the data mining model on Hadoop to mine the target data.
Further, the associated phrase generating module constructs a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
Further, the data mining module simultaneously runs the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
The invention has the following beneficial effects:
Key phrases are extracted based on the CCIPCA algorithm; the key phrases, opposite key phrases, similar key phrases and related key phrases are obtained based on the Inception V3 deep neural network model; a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model are respectively constructed based on the Inception V3 deep neural network model; and the four mining models are run simultaneously on Hadoop to mine the target data, thereby achieving high-accuracy and high-efficiency mining of the target data.
Drawings
Fig. 1 is a flowchart of a data mining method according to an embodiment of the present invention.
Fig. 2 is a system block diagram of a data mining system according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in Fig. 1, an embodiment of the present invention provides a data mining method, including the following steps:
S1, outputting corresponding key phrases based on the data mining requirements;
S2, generating the associated phrases of each key phrase, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S3, constructing a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S4, running the data mining model on Hadoop to mine the target data.
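Steps S1 to S4 can be read as a small four-stage pipeline. The following is a minimal sketch of that data flow only, not the patented implementation: the function names, the toy `LEXICON` and the substring-matching "models" are all hypothetical stand-ins, and a plain Python loop takes the place of the Hadoop job.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy lexicon standing in for the neural model used in step S2.
LEXICON = {
    "fast": {"opposite": ["slow"], "similar": ["quick"], "related": ["speed"]},
}


@dataclass
class AssociatedPhrases:
    key: List[str]
    opposite: List[str] = field(default_factory=list)
    similar: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def extract_key_phrases(requirement: str) -> List[str]:
    """S1 (stand-in): keep the requirement words that the lexicon knows."""
    return [w for w in requirement.lower().split() if w in LEXICON]


def generate_associated_phrases(keys: List[str]) -> AssociatedPhrases:
    """S2 (stand-in): look up opposite, similar and related phrases."""
    out = AssociatedPhrases(key=list(keys))
    for k in keys:
        out.opposite += LEXICON[k]["opposite"]
        out.similar += LEXICON[k]["similar"]
        out.related += LEXICON[k]["related"]
    return out


def build_models(phrases: AssociatedPhrases) -> Dict[str, Callable[[str], bool]]:
    """S3 (stand-in): one substring matcher per phrase group."""
    def make(terms: List[str]) -> Callable[[str], bool]:
        return lambda record: any(t in record.lower() for t in terms)
    return {name: make(terms) for name, terms in vars(phrases).items()}


def mine(models: Dict[str, Callable[[str], bool]], records: List[str]) -> Dict[str, List[str]]:
    """S4 (stand-in): apply every model to every record."""
    return {name: [r for r in records if m(r)] for name, m in models.items()}
```

Calling `mine(build_models(generate_associated_phrases(extract_key_phrases(text))), records)` returns one list of matching records per phrase group, which mirrors the four separate mining models described in the embodiment.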
In this embodiment, in step S1, the key phrase is obtained based on the CCIPCA algorithm.
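CCIPCA (candid covariance-free incremental principal component analysis) estimates principal components one sample at a time without forming a covariance matrix. The source does not say how text is turned into vectors or how components are mapped to key phrases, so the sketch below makes two labelled assumptions: each sample is a zero-mean term-weight vector, and the key phrases are taken to be the terms with the largest absolute weight in a component (`top_terms` is a hypothetical helper).

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    norm = math.sqrt(_dot(a, a)) or 1.0  # avoid division by zero for a null vector
    return [x / norm for x in a]


def ccipca(samples, k, amnesic=0.0):
    """Incrementally estimate the first k principal directions.

    samples: iterable of equal-length numeric vectors, assumed zero-mean.
    amnesic: amnesic parameter; 0 weights all samples equally.
    """
    v = []  # unnormalised estimates; each norm approximates an eigenvalue
    for n, sample in enumerate(samples, start=1):
        u = [float(x) for x in sample]
        for i in range(min(k, n)):
            if i == n - 1:
                v.append(list(u))  # initialise component i with the current residual
            else:
                proj = _dot(u, _unit(v[i]))
                w_old = (n - 1 - amnesic) / n
                w_new = (1 + amnesic) / n
                v[i] = [w_old * a + w_new * proj * b for a, b in zip(v[i], u)]
            d = _unit(v[i])
            proj = _dot(u, d)
            u = [a - proj * b for a, b in zip(u, d)]  # deflate before the next component
    return [_unit(vi) for vi in v]


def top_terms(component, vocabulary, m=3):
    """Assumption: terms with the largest absolute weight act as key phrases."""
    ranked = sorted(zip(vocabulary, component), key=lambda t: abs(t[1]), reverse=True)
    return [term for term, _ in ranked[:m]]
```

Because each update touches only the current sample and the running estimates, the memory cost stays proportional to `k` times the vector length, which is the usual reason for choosing an incremental method on large record sets.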
In this embodiment, in step S2, the key phrase, the opposite key phrase, the similar key phrase and the related key phrase are obtained based on the Inception V3 deep neural network model.
In this embodiment, in step S3, a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model are constructed from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
In this embodiment, in step S4, the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model are run simultaneously on Hadoop to mine the target data.
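The idea of running the four mining models at the same time over the same data can be illustrated locally. This sketch uses a thread pool from the Python standard library purely as a stand-in for Hadoop, which the source names but does not detail; the function name and the representation of a model as a predicate are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_models_concurrently(models, records):
    """Run every mining model over the same records at the same time.

    models: dict mapping a model name to a predicate record -> bool
            (a hypothetical representation of a mining model).
    records: iterable of records to mine.
    """
    records = list(records)  # materialise once so every model sees all records

    def run_one(item):
        name, predicate = item
        return name, [r for r in records if predicate(r)]

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return dict(pool.map(run_one, models.items()))
```

On a real cluster each model would more plausibly be submitted as its own job, or all four would be evaluated inside a single map phase; the source does not say which.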
As shown in Fig. 2, an embodiment of the present invention provides a data mining system, including:
a key phrase generating module, used for generating corresponding key phrases based on the data mining requirements;
an associated phrase generating module, used for generating corresponding associated phrases based on the key phrases, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining model building module, used for building a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining module, used for running the data mining model on Hadoop to mine the target data.
In this embodiment, the associated phrase generating module constructs a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
In this embodiment, the data mining module simultaneously runs the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
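The statement that the target data mined by each model corresponds to one database can be sketched as follows. SQLite is used only because it ships with Python; the source does not name a database engine, and the table name `mined` and the function `store_results` are hypothetical.

```python
import sqlite3


def store_results(results):
    """Write each model's mined records to its own in-memory SQLite database.

    results: dict mapping a model name to a list of record strings.
    Returns a dict mapping each model name to an open sqlite3 connection.
    """
    databases = {}
    for name, records in results.items():
        conn = sqlite3.connect(":memory:")  # each connection is a separate database
        conn.execute("CREATE TABLE mined (record TEXT)")
        conn.executemany("INSERT INTO mined VALUES (?)", [(r,) for r in records])
        conn.commit()
        databases[name] = conn
    return databases
```

Keeping the outputs apart this way means the results of the opposite, similar and related models can be queried or discarded independently of the key phrase results.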
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A data mining method, characterized by comprising the following steps:
S1, outputting corresponding key phrases based on the data mining requirements;
S2, generating the associated phrases of each key phrase, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S3, constructing a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
S4, running the data mining model on Hadoop to mine the target data.
2. The data mining method according to claim 1, wherein in step S1 the key phrases are obtained based on the CCIPCA algorithm.
3. The data mining method according to claim 1, wherein in step S2 the key phrase, the opposite key phrase, the similar key phrase and the related key phrase are obtained based on the Inception V3 deep neural network model.
4. The data mining method according to claim 1, wherein in step S3 a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model are constructed from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
5. The data mining method according to claim 4, wherein in step S4 the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model are run simultaneously on Hadoop to mine the target data.
6. A data mining system, characterized by comprising:
a key phrase generating module, used for generating corresponding key phrases based on the data mining requirements;
an associated phrase generating module, used for generating corresponding associated phrases based on the key phrases, wherein the associated phrases consist of the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining model building module, used for building a data mining model based on the key phrase, the opposite key phrase, the similar key phrase and the related key phrase;
a data mining module, used for running the data mining model on Hadoop to mine the target data.
The data mining system according to claim 6, wherein the associated phrase generating module constructs a key phrase mining model, an opposite key phrase mining model, a similar key phrase mining model and a related key phrase mining model from the key phrase, the opposite key phrase, the similar key phrase and the related key phrase, respectively.
The data mining system according to claim 6, wherein the data mining module simultaneously runs the key phrase mining model, the opposite key phrase mining model, the similar key phrase mining model and the related key phrase mining model on Hadoop to mine the target data, and the target data mined by each data mining model corresponds to one database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454602.XA CN111597232A (en) | 2020-05-26 | 2020-05-26 | Data mining method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454602.XA CN111597232A (en) | 2020-05-26 | 2020-05-26 | Data mining method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111597232A (en) | 2020-08-28 |
Family
ID=72190655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010454602.XA Pending CN111597232A (en) | 2020-05-26 | 2020-05-26 | Data mining method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597232A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268620A (en) * | 2018-01-08 | 2018-07-10 | 南京邮电大学 | A kind of Document Classification Method based on hadoop data minings |
US20190034823A1 (en) * | 2017-07-27 | 2019-01-31 | Getgo, Inc. | Real time learning of text classification models for fast and efficient labeling of training data and customization |
CN110889443A (en) * | 2019-11-21 | 2020-03-17 | 成都数联铭品科技有限公司 | Unsupervised text classification system and unsupervised text classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200828 |