CN105653615A

CN105653615A - Big data based computer data mining discovery method

Info

Publication number: CN105653615A
Application number: CN201510991107.1A
Authority: CN
Inventors: 蒋雪峰; 蒋顺恺; 石永丽
Original assignee: Individual
Current assignee: Individual
Priority date: 2015-12-25
Filing date: 2015-12-25
Publication date: 2016-06-08

Abstract

The invention discloses a big data based computer data mining discovery method. The computer data mining discovery method comprises the steps of denoising and normalizing an input sample set; then selecting a generated cluster number and an initial cluster parameter to execute a mean cluster algorithm, taking computing results as sub-clusters of an initial cluster, and computing an eigenvector; and finally setting a discovery interest parameter and comparing the discovery interest parameter with an interest characteristic to output interest characteristic data. According to the method, massive data can be effectively processed, the computing speed of data mining can be increased, the precision of data mining can be improved, and the required discovery interest characteristic data can be effectively extracted.

Description

Big data based computer data mining discovery method

Technical field

The present invention relates to the technical field of computer data mining, and in particular to a big data based computer data mining discovery method.

Background art

In recent years, with the development of technologies such as data acquisition and storage, the data of the information society has grown explosively, giving rise to a situation of "rich data, poor information". Massive data not only makes it hard for people to pick out useful data but also greatly increases the complexity of data analysis. Data mining technology emerged to solve this problem. Data mining aims to convert the massive, widely available data in society into useful knowledge and information for applications such as market analysis, fraud detection, customer retention, production control, and scientific exploration.

In practice, data mining tasks are diverse, but they generally fall into two categories: prediction and description. Data mining draws on multiple disciplines, such as machine learning, mathematical statistics, pattern recognition, signal processing, and databases. As an application-oriented technology, its traditional algorithms do not suit every application scenario, because the data in real databases are often far from ideal, for example imbalanced data, multi-class data, time series, and data streams.

Although data mining has achieved great success in both theory and practice in recent years, the complexity of data and the diversity of mining tasks in real projects leave many challenging problems to be solved. Mining based on big data is one such major problem, and its computing speed and precision still need further improvement.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing a big data based computer data mining discovery method that can effectively process massive data, increase the computing speed and precision of data mining, and effectively extract the required discovery interest characteristic data.

To achieve the above object, the invention provides a big data based computer data mining discovery method comprising the following steps:

Step 1: input the given big data sample set X, where X = {X₁, X₂, …, X_n};

Step 2: denoise and normalize the input sample set;

Step 3: choose the value m and W = (W₁, W₂, …, W_m) as the generated cluster number and the initial cluster parameters, respectively, of the mean clustering algorithm;

Step 4: execute the mean clustering algorithm to obtain m clusters {M₁, M₂, …, M_m};

Step 5: take each M_i of the m clusters as a sub-cluster of the initial cluster;

Step 6: compute the characteristic vector Y, expressed as:

Y=(Y₁,Y₂,...,Y_m);

Step 7: set the discovery interest parameter d; if Y_i < d, output the interest characteristic Y_i; otherwise, do nothing.
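Steps 3 to 7 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the source does not say how the characteristic vector Y is computed, so the sketch assumes that Y_i is the mean distance from the members of cluster M_i to its centroid. The function names `kmeans`, `cluster_features`, and `discover` are hypothetical, and the clustering is ordinary k-means (Lloyd's algorithm) seeded with the initial parameters W.

```python
import math

def kmeans(samples, m, init_centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm); returns (centroids, clusters)."""
    centroids = [list(c) for c in init_centroids]
    clusters = [[] for _ in range(m)]
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        clusters = [[] for _ in range(m)]
        for x in samples:
            nearest = min(range(m), key=lambda i: math.dist(x, centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i in range(m):
            members = clusters[i]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(p[k] for p in members) / len(members) for k in range(dim)]
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

def cluster_features(centroids, clusters):
    """ASSUMPTION (not in the source): Y_i is the mean member-to-centroid distance."""
    return [
        sum(math.dist(p, c) for p in members) / len(members) if members else math.inf
        for c, members in zip(centroids, clusters)
    ]

def discover(samples, m, init_centroids, d):
    """Steps 3-7: cluster, compute Y, and keep every (i, Y_i) with Y_i < d."""
    centroids, clusters = kmeans(samples, m, init_centroids)
    y = cluster_features(centroids, clusters)
    return [(i, y_i) for i, y_i in enumerate(y) if y_i < d]

# Hypothetical data: one tight cluster near the origin and one looser cluster.
samples = [(0.0, 0.0), (0.0, 0.2), (4.0, 4.0), (6.0, 4.0)]
result = discover(samples, m=2, init_centroids=[(0.0, 0.0), (5.0, 5.0)], d=0.5)
print(result)  # only the tight cluster (index 0) passes the Y_i < d test
```

With this choice of Y, a small d selects compact clusters. A different definition of Y would change which clusters count as interesting, so the threshold d has to be chosen together with that definition.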

Compared with the prior art, the main advantages of the present invention are as follows:

The invention discloses a big data based computer data mining discovery method. The method denoises and normalizes an input sample set, selects the generated cluster number and the initial cluster parameters, and executes the mean clustering algorithm. It then takes the computing results as sub-clusters of the initial cluster and computes the characteristic vector. Finally, it sets a discovery interest parameter and compares it with the interest characteristics to output the interest characteristic data. The method can effectively process massive data, increase the computing speed and precision of data mining, and effectively extract the required discovery interest characteristic data.

Brief description of the drawings

Fig. 1 is a block diagram of the implementation principle of the present invention.

Detailed description of the invention

The specific embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that those skilled in the art can better understand the present invention.

As shown in Fig. 1, a specific embodiment of the big data based computer data mining discovery method of the present invention is implemented in the following steps:

Step 2: denoise and normalize the input sample set;

Step 5: take each M_i of the m clusters as a sub-cluster of the initial cluster;

Y=(Y₁,Y₂,...,Y_m);

The method denoises and normalizes an input sample set, selects the generated cluster number and the initial cluster parameters, and executes the mean clustering algorithm. It then takes the computing results as sub-clusters of the initial cluster and computes the characteristic vector. Finally, it sets a discovery interest parameter and compares it with the interest characteristics to output the interest characteristic data. The method can effectively process massive data, increase the computing speed and precision of data mining, and effectively extract the required discovery interest characteristic data.
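The source does not specify how the denoising and normalization are carried out. As one possible reading, the Python sketch below assumes that denoising means discarding samples that lie more than a chosen number of standard deviations from the column mean, and that normalization means min-max scaling of each column to [0, 1]. Both choices, the function names `denoise` and `normalize`, and the parameter `z_max` are assumptions made for illustration only.

```python
import statistics

def denoise(samples, z_max=3.0):
    """ASSUMPTION: treat a sample as noise if any coordinate lies more than
    z_max population standard deviations from that column's mean."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        x for x in samples
        if all(s == 0 or abs(v - mu) <= z_max * s
               for v, mu, s in zip(x, means, stdevs))
    ]

def normalize(samples):
    """ASSUMPTION: min-max scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(x, lows, highs))
        for x in samples
    ]

# Hypothetical usage: clean the sample set before clustering.
raw = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
prepared = normalize(denoise(raw))
print(prepared)
```

Scaling every column to the same range keeps any single attribute from dominating the distance computations in the later clustering step.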

The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims

1. A big data based computer data mining discovery method, characterized in that the method comprises the following steps:

Step 2: denoise and normalize the input sample set;

Step 4: execute the mean clustering algorithm to obtain m clusters {M₁, M₂, …, M_m};

Step 5: take each M_i of the m clusters as a sub-cluster of the initial cluster;

Y=(Y₁,Y₂,...,Y_m);