CN105653615A - Big data based computer data mining discovery method - Google Patents
- Publication number
- CN105653615A (application CN201510991107.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- interest
- data mining
- parameter
- discovery
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a big data based computer data mining discovery method. The computer data mining discovery method comprises the steps of denoising and normalizing an input sample set; then selecting a generated cluster number and an initial cluster parameter to execute a mean cluster algorithm, taking computing results as sub-clusters of an initial cluster, and computing an eigenvector; and finally setting a discovery interest parameter and comparing the discovery interest parameter with an interest characteristic to output interest characteristic data. According to the method, massive data can be effectively processed, the computing speed of data mining can be increased, the precision of data mining can be improved, and the required discovery interest characteristic data can be effectively extracted.
Description
Technical field
The present invention relates to the field of computer data mining technology, and in particular to a big data based computer data mining discovery method.
Background art
In recent years, with the development of data acquisition and storage technologies, the data of the information society has grown explosively, producing a situation of "rich in data, poor in information". Massive data not only makes it hard for people to pick out the useful data, but also greatly increases the complexity of data analysis. Data mining technology emerged to solve this problem. It aims to convert the widely available mass of data in society into useful knowledge and information, for use in market analysis, fraud detection, customer retention, production control, scientific exploration, and similar fields.
In practice, data mining tasks are varied, but they can generally be divided into two classes: prediction and description. Data mining draws on multiple disciplines, such as machine learning, mathematical statistics, pattern recognition, signal processing, and databases. As an application-oriented technology, it cannot rely on traditional data mining algorithms for every application scenario, because the data in real databases is often far from ideal: examples include imbalanced data, multi-class data, time series, and data streams.
Although data mining has achieved great success in both theory and practice in recent years, data in real projects is complex and mining tasks are diverse, so many challenging problems remain to be solved. Mining based on big data is one important such problem, and its computing speed and precision both need further improvement.
Summary of the invention
The object of the invention is to overcome the above drawbacks of the prior art by providing a big data based computer data mining discovery method that can process massive data effectively, improve the computing speed and precision of data mining, and effectively extract the required discovery interest feature data.
To achieve this object, the invention provides a big data based computer data mining discovery method comprising the following steps:
Step 1: input the given big data sample set $X$, where $X = \{X_1, X_2, \ldots, X_n\}$;
Step 2: denoise and normalize the input sample set;
Step 3: choose a value $m$ and $W = (W_1, W_2, \ldots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the mean clustering algorithm;
Step 4: execute the mean clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \ldots, M_m\}$;
Step 5: take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial cluster;
Step 6: compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \ldots, Y_m)$;
Step 7: set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise do nothing.
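Steps 2 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify the denoising rule, the normalization scheme, or the exact clustering variant, so the sketch assumes z-score outlier removal, min-max scaling, and the standard k-means (Lloyd) iteration. The function names `denoise`, `normalize`, and `k_means` and the parameter `z_max` are hypothetical.

```python
import math
from statistics import mean, pstdev

def denoise(samples, z_max=3.0):
    """Step 2a (assumed rule): drop samples with any coordinate more than
    z_max standard deviations away from that coordinate's mean."""
    cols = list(zip(*samples))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [s for s in samples
            if all(abs(v - mu) / sd <= z_max for v, mu, sd in zip(s, mus, sigmas))]

def normalize(samples):
    """Step 2b (assumed scheme): min-max scale every coordinate to [0, 1]."""
    cols = list(zip(*samples))
    lows = [min(c) for c in cols]
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [tuple((v - lo) / sp for v, lo, sp in zip(s, lows, spans))
            for s in samples]

def k_means(samples, centroids, max_iter=100):
    """Steps 3-5: Lloyd's iteration starting from the given centroids W.
    Returns the m clusters M_1..M_m and the final centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for s in samples:
            # assign each sample to its nearest centroid
            j = min(range(len(centroids)), key=lambda i: math.dist(s, centroids[i]))
            clusters[j].append(s)
        # recompute centroids; an empty cluster keeps its old centroid
        updated = [tuple(mean(col) for col in zip(*c)) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids

# Toy data: two well-separated groups, m = 2
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
X = normalize(denoise(X))
W = [X[0], X[3]]            # initial centroids, one taken from each group
M, W = k_means(X, W)
```

With this toy input, each of the two resulting clusters holds three samples. On real big data the initial centroids would not be hand-picked, and the choice of `m` remains a tuning decision that the text leaves open.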
Compared with the prior art, the main advantages of the present invention are as follows:
The invention discloses a big data based computer data mining discovery method. The method denoises and normalizes the input sample set, then chooses the number of clusters to generate and the initial centroids and executes the mean clustering algorithm, takes the results as sub-clusters of the initial cluster, computes the feature vector, and finally sets a discovery interest parameter and compares it with the interest features in order to output the interest feature data. The method can process massive data effectively, improve the computing speed and precision of data mining, and effectively extract the required discovery interest feature data.
Brief description of the drawings
Fig. 1 is a block diagram of the implementation principle of the present invention.
Detailed description of the invention
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that those skilled in the art can better understand the invention.
As shown in Fig. 1, a specific embodiment of the big data based computer data mining discovery method of the present invention is implemented in the following steps:
Step 1: input the given big data sample set $X$, where $X = \{X_1, X_2, \ldots, X_n\}$;
Step 2: denoise and normalize the input sample set;
Step 3: choose a value $m$ and $W = (W_1, W_2, \ldots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the mean clustering algorithm;
Step 4: execute the mean clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \ldots, M_m\}$;
Step 5: take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial cluster;
Step 6: compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \ldots, Y_m)$;
Step 7: set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise do nothing.
The method denoises and normalizes the input sample set, then chooses the number of clusters to generate and the initial centroids and executes the mean clustering algorithm, takes the results as sub-clusters of the initial cluster, computes the feature vector, and finally sets a discovery interest parameter and compares it with the interest features in order to output the interest feature data. The method can process massive data effectively, improve the computing speed and precision of data mining, and effectively extract the required discovery interest feature data.
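Steps 6 and 7 can be illustrated with a short Python sketch. The text does not state how each component $Y_i$ is computed from its cluster, so the sketch assumes, purely for illustration, that $Y_i$ is the mean distance from the members of $M_i$ to the centroid of $M_i$ (a compactness measure). The function names `feature_vector` and `select_interest` are hypothetical, and every cluster is assumed to be non-empty.

```python
import math
from statistics import mean

def feature_vector(clusters):
    """Step 6 (assumed definition): Y_i is the mean Euclidean distance
    from the members of cluster M_i to the centroid of M_i."""
    Y = []
    for members in clusters:
        centroid = tuple(mean(col) for col in zip(*members))
        Y.append(mean(math.dist(p, centroid) for p in members))
    return Y

def select_interest(Y, d):
    """Step 7: output (i, Y_i) for every Y_i < d; other entries are skipped."""
    return [(i, y) for i, y in enumerate(Y, start=1) if y < d]

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],    # centroid (0, 1), both members at distance 1
    [(5.0, 5.0), (5.0, 11.0)],   # centroid (5, 8), both members at distance 3
]
Y = feature_vector(clusters)           # [1.0, 3.0]
print(select_interest(Y, d=2.0))       # prints [(1, 1.0)]
```

Under this assumed definition, a smaller `d` keeps only the tightest clusters. Any other per-cluster statistic could be substituted in `feature_vector` without changing the thresholding in `select_interest`.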
The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the invention falls within the scope of protection of the invention.
Claims (1)
1. A big data based computer data mining discovery method, characterized in that the method comprises the following steps:
Step 1: input the given big data sample set $X$, where $X = \{X_1, X_2, \ldots, X_n\}$;
Step 2: denoise and normalize the input sample set;
Step 3: choose a value $m$ and $W = (W_1, W_2, \ldots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the mean clustering algorithm;
Step 4: execute the mean clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \ldots, M_m\}$;
Step 5: take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial cluster;
Step 6: compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \ldots, Y_m)$;
Step 7: set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise do nothing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991107.1A CN105653615A (en) | 2015-12-25 | 2015-12-25 | Big data based computer data mining discovery method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991107.1A CN105653615A (en) | 2015-12-25 | 2015-12-25 | Big data based computer data mining discovery method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105653615A (en) | 2016-06-08 |
Family
ID=56476767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510991107.1A Pending CN105653615A (en) | 2015-12-25 | 2015-12-25 | Big data based computer data mining discovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105653615A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423764A (en) * | 2017-07-26 | 2017-12-01 | 西安交通大学 | K Means clustering methods based on NSS AKmeans and MapReduce processing big data |
CN113408207A (en) * | 2021-06-24 | 2021-09-17 | 上海硕恩网络科技股份有限公司 | Data mining method based on social network analysis technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103761236B (en) | Incremental frequent pattern increase data mining method | |
CN102081655B (en) | Information retrieval method based on Bayesian classification algorithm | |
CN110967974B (en) | Coal flow balance self-adaptive control method based on rough set | |
CN103744935A (en) | Rapid mass data cluster processing method for computer | |
US10241767B2 (en) | Distributed function generation with shared structures | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
CN104731891A (en) | Method for extracting mass data in ETL (extract transform load) | |
CN105653615A (en) | Big data based computer data mining discovery method | |
Khan et al. | A unified theoretical framework for data mining | |
Amirgaliev et al. | Recognition of rocks at uranium deposits by using a few methods of machine learning | |
KR101307337B1 (en) | System and method for Triangle Counting Sampling by using Map-Reduce | |
CN105390132A (en) | Language model-based application protocol identification method and system | |
Hüls | A contour algorithm for computing stable fiber bundles of nonautonomous, noninvertible maps | |
CN105653672A (en) | Time sequence based computer data mining method | |
CN105469122A (en) | Computer data mining method based on unbalance samples | |
CN107391433B (en) | Feature selection method based on KDE conditional entropy of mixed features | |
CN102043910B (en) | Remote protein homology detection and fold recognition method based on Top-n-gram | |
CN104317861A (en) | Mutual information based interval data attribute selection method | |
Moss et al. | An FPGA-based spectral anomaly detection system | |
Prasad et al. | Assessment of clustering tendency through progressive random sampling and graph-based clustering results | |
CN105224697A (en) | Sort method with filtercondition and the device for performing described method | |
CN105404892A (en) | Ordered fuzzy C mean value cluster method used for sequence data segmentation | |
Chatterjee et al. | A markov chain based ensemble method for crowdsourced clustering | |
Das et al. | A divide and conquer feature reduction and feature selection algorithm in KDD intrusion detection dataset | |
Fang et al. | A model for aggregating contributions of synergistic crowdsourcing workflows |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160608 |