CN105653615A - Big data based computer data mining discovery method - Google Patents

Big data based computer data mining discovery method

Info

Publication number
CN105653615A
Authority
CN
China
Prior art keywords
data
interest
data mining
parameter
discovery
Prior art date
Legal status
Pending
Application number
CN201510991107.1A
Other languages
Chinese (zh)
Inventor
蒋雪峰
蒋顺恺
石永丽
Current Assignee
Individual
Original Assignee
Individual
Priority date
2015-12-25
Filing date
2015-12-25
Publication date
2016-06-08
Application filed by Individual
Priority to CN201510991107.1A
Publication of CN105653615A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data based computer data mining discovery method. The method comprises the steps of denoising and normalizing an input sample set; then selecting the number of clusters to generate and the initial cluster parameters, executing a means clustering algorithm, taking the results as sub-clusters of the initial clustering, and computing a feature vector; and finally setting a discovery interest parameter and comparing it with the interest features to output the interest feature data. According to the invention, the method can effectively process massive data, increase the computing speed of data mining, improve its precision, and effectively extract the required discovery interest feature data.

Description

Big data based computer data mining discovery method
Technical field
The present invention relates to the field of computer data mining technology, and in particular to a big data based computer data mining discovery method.
Background art
In recent years, with the development of technologies such as data acquisition and storage, the volume of data in the information society has grown explosively, producing a situation of "data rich, information poor". Massive data not only makes it hard to pick out the useful data, but also greatly increases the complexity of data analysis. Data mining technology emerged to solve this problem. Data mining aims to convert the massive, widely available data in society into useful knowledge and information, which can be applied to market analysis, fraud detection, customer retention, production control, scientific exploration, and so on.
In practical applications, data mining tasks are diverse, but they can generally be divided into two classes: prediction and description. Data mining draws on multiple disciplines, such as machine learning, mathematical statistics, pattern recognition, signal processing, and databases. As an application-oriented technology, data mining cannot rely on traditional algorithms to suit every application scenario, because in practice the data in a database are often far from ideal, for example imbalanced data, multi-class data, time series, and data streams.
Although data mining technology has achieved great success in both theory and practice in recent years, the complexity of data and the diversity of mining tasks in real engineering projects still leave many challenging problems to be solved. Mining big data is one of the major issues, and both its computational speed and its accuracy need further improvement.
Summary of the invention
The object of the invention is to overcome the above drawbacks of the prior art by providing a big data based computer data mining discovery method that can effectively process massive data, improve the computational speed and accuracy of data mining, and effectively extract the required discovery interest features.
To achieve this object, the invention provides a big data based computer data mining discovery method comprising the following steps:
Step 1: Input the given big data sample set $X$, where $X = \{X_1, X_2, \dots, X_n\}$;
Step 2: Denoise and normalize the input sample set;
Step 3: Choose a value $m$ and $W = (W_1, W_2, \dots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the means clustering algorithm;
Step 4: Run the means clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \dots, M_m\}$;
Step 5: Take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial clustering;
Step 6: Compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \dots, Y_m)$;
Step 7: Set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise, no processing is performed.
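The patent does not say how the denoising and normalization of Step 2 are carried out. As a minimal sketch only, assuming z-score outlier removal for denoising and per-feature min-max scaling for normalization (both choices, and the helper names `denoise` and `normalize`, are hypothetical), Step 2 might look like this:

```python
from statistics import mean, pstdev


def denoise(samples, z_max=3.0):
    """Hypothetical denoising: drop any sample with a feature whose
    z-score exceeds z_max. The patent does not specify the method."""
    cols = list(zip(*samples))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    kept = []
    for x in samples:
        # A feature with zero spread is constant, so it cannot flag outliers.
        if all(sd == 0 or abs(v - mu) / sd <= z_max
               for v, mu, sd in zip(x, mus, sds)):
            kept.append(x)
    return kept


def normalize(samples):
    """Hypothetical normalization: min-max scale each feature to [0, 1].
    Constant features are mapped to 0.0."""
    cols = list(zip(*samples))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(x, lows, highs)]
        for x in samples
    ]


# Step 2 on a toy sample set X: denoise first, then normalize.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_clean = normalize(denoise(X))
```

Min-max scaling is chosen here so that every feature contributes on the same [0, 1] scale to the distance computations of the later clustering step; z-score standardization would be an equally plausible reading of "normalized".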
Compared with the prior art, the present invention has the following important advantages:
The invention discloses a big data based computer data mining discovery method. The method first denoises and normalizes the input sample set; it then chooses the number of clusters to generate and the initial centroids, runs the means clustering algorithm, and takes the results as sub-clusters of the initial clustering; next it computes the feature vector; finally, it sets a discovery interest parameter and compares it with the interest features to output the interest feature data. The method can effectively process massive data, improve the computational speed and accuracy of data mining, and effectively extract the required discovery interest features.
Brief description of the drawings
Fig. 1 is a block diagram of the implementation principle of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that those skilled in the art can better understand the present invention.
As shown in Fig. 1, a specific embodiment of the big data based computer data mining discovery method of the present invention is implemented through the following steps:
Step 1: Input the given big data sample set $X$, where $X = \{X_1, X_2, \dots, X_n\}$;
Step 2: Denoise and normalize the input sample set;
Step 3: Choose a value $m$ and $W = (W_1, W_2, \dots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the means clustering algorithm;
Step 4: Run the means clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \dots, M_m\}$;
Step 5: Take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial clustering;
Step 6: Compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \dots, Y_m)$;
Step 7: Set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise, no processing is performed.
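Steps 3 to 5 can be sketched as a Lloyd-style k-means seeded with the chosen initial centroids W, where m is simply the number of centroids supplied. This is an assumption: the patent names only a "means clustering algorithm", and the function `kmeans` with its `max_iter` and `tol` parameters is hypothetical.

```python
import math


def kmeans(samples, init_centroids, max_iter=100, tol=1e-9):
    """Lloyd-style k-means seeded with the initial centroids W (Steps 3-5).

    m is len(init_centroids). Returns (centroids, clusters), where
    clusters[i] holds the samples assigned to M_{i+1}.
    """
    if not init_centroids:
        raise ValueError("at least one initial centroid is required")
    centroids = [list(c) for c in init_centroids]
    m = len(centroids)
    clusters = [[] for _ in range(m)]
    for _ in range(max_iter):
        # Assignment step: every sample joins its nearest centroid.
        clusters = [[] for _ in range(m)]
        for x in samples:
            nearest = min(range(m), key=lambda j: math.dist(x, centroids[j]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Stop once no centroid moves more than tol; the clusters from the
        # last assignment step are then consistent with the centroids.
        if shift <= tol:
            break
    return centroids, clusters


# Step 3: m = 2 with initial centroids W; Steps 4-5: sub-clusters M_1, M_2.
W = [[0, 0], [10, 10]]
centroids, sub_clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], W)
```

Each list in `sub_clusters` plays the role of one sub-cluster M_i in Step 5. Because the run is seeded with W instead of random centroids, it is deterministic for a given W and sample order.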
This computer data mining discovery method first denoises and normalizes the input sample set; it then chooses the number of clusters to generate and the initial centroids, runs the means clustering algorithm, and takes the results as sub-clusters of the initial clustering; next it computes the feature vector; finally, it sets a discovery interest parameter and compares it with the interest features to output the interest feature data. The method can effectively process massive data, improve the computational speed and accuracy of data mining, and effectively extract the required discovery interest features.
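The patent does not define how each component Y_i of the feature vector is computed. Purely as an illustration, the sketch below assumes Y_i is the mean distance from the members of cluster M_i to its centroid (a compactness score), so that the threshold comparison with d outputs the clusters that are tighter than d. The functions `feature_vector` and `select_interest_features` are hypothetical names.

```python
import math


def feature_vector(centroids, clusters):
    """Hypothetical Y_i: mean distance from the members of M_i to its centroid.

    The patent does not define Y_i; this compactness score is an assumption.
    An empty cluster gets math.inf so that it never passes the threshold.
    """
    return [
        sum(math.dist(x, c) for x in members) / len(members) if members else math.inf
        for c, members in zip(centroids, clusters)
    ]


def select_interest_features(Y, d):
    """Output (i, Y_i) whenever Y_i < d; other components are ignored.

    Indices start at 1 to match the patent's numbering Y_1, ..., Y_m.
    """
    return [(i, y) for i, y in enumerate(Y, start=1) if y < d]


# Two toy clusters: M_1 is tight, M_2 is spread out.
Y = feature_vector([[0, 0], [10, 10]], [[[0, 1], [0, -1]], [[10, 13], [10, 7]]])
print(select_interest_features(Y, d=2.0))  # prints [(1, 1.0)]
```

Any other per-cluster statistic (cluster size, variance, distance to the global mean) could be substituted for Y_i without changing the thresholding logic.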
The above embodiment merely illustrates the technical idea of the present invention and does not limit its scope of protection. Any modification made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (1)

1. A big data based computer data mining discovery method, characterized in that the method comprises the following steps:
Step 1: Input the given big data sample set $X$, where $X = \{X_1, X_2, \dots, X_n\}$;
Step 2: Denoise and normalize the input sample set;
Step 3: Choose a value $m$ and $W = (W_1, W_2, \dots, W_m)$ as, respectively, the number of clusters to generate and the initial centroids of the means clustering algorithm;
Step 4: Run the means clustering algorithm to obtain $m$ clusters $\{M_1, M_2, \dots, M_m\}$;
Step 5: Take each cluster $M_i$ of these $m$ clusters as a sub-cluster of the initial clustering;
Step 6: Compute the feature vector $Y$, expressed as:
$Y = (Y_1, Y_2, \dots, Y_m)$;
Step 7: Set the discovery interest parameter $d$; if $Y_i < d$, output the interest feature $Y_i$; otherwise, no processing is performed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510991107.1A CN105653615A (en) 2015-12-25 2015-12-25 Big data based computer data mining discovery method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510991107.1A CN105653615A (en) 2015-12-25 2015-12-25 Big data based computer data mining discovery method

Publications (1)

Publication Number Publication Date
CN105653615A (en) 2016-06-08

Family

ID=56476767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510991107.1A Pending CN105653615A (en) 2015-12-25 2015-12-25 Big data based computer data mining discovery method

Country Status (1)

Country Link
CN (1) CN105653615A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423764A (en) * 2017-07-26 2017-12-01 西安交通大学 K Means clustering methods based on NSS AKmeans and MapReduce processing big data
CN113408207A (en) * 2021-06-24 2021-09-17 上海硕恩网络科技股份有限公司 Data mining method based on social network analysis technology

Similar Documents

Publication Publication Date Title
CN103761236B (en) Incremental frequent pattern increase data mining method
CN102081655B (en) Information retrieval method based on Bayesian classification algorithm
CN110967974B (en) Coal flow balance self-adaptive control method based on rough set
CN103744935A (en) Rapid mass data cluster processing method for computer
US10241767B2 (en) Distributed function generation with shared structures
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN104731891A (en) Method for extracting mass data in ETL (extract transform load)
CN105653615A (en) Big data based computer data mining discovery method
Khan et al. A unified theoretical framework for data mining
Amirgaliev et al. Recognition of rocks at uranium deposits by using a few methods of machine learning
KR101307337B1 (en) System and method for Triangle Counting Sampling by using Map-Reduce
CN105390132A (en) Language model-based application protocol identification method and system
Hüls A contour algorithm for computing stable fiber bundles of nonautonomous, noninvertible maps
CN105653672A (en) Time sequence based computer data mining method
CN105469122A (en) Computer data mining method based on unbalance samples
CN107391433B (en) Feature selection method based on KDE conditional entropy of mixed features
CN102043910B (en) Remote protein homology detection and fold recognition method based on Top-n-gram
CN104317861A (en) Mutual information based interval data attribute selection method
Moss et al. An FPGA-based spectral anomaly detection system
Prasad et al. Assessment of clustering tendency through progressive random sampling and graph-based clustering results
CN105224697A (en) Sort method with filtercondition and the device for performing described method
CN105404892A (en) Ordered fuzzy C mean value cluster method used for sequence data segmentation
Chatterjee et al. A markov chain based ensemble method for crowdsourced clustering
Das et al. A divide and conquer feature reduction and feature selection algorithm in KDD intrusion detection dataset
Fang et al. A model for aggregating contributions of synergistic crowdsourcing workflows

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160608
