CN113486088A

CN113486088A - Data mining method based on complex technology

Info

Publication number: CN113486088A
Application number: CN202110759405.3A
Authority: CN
Inventors: 祖玉宁
Original assignee: Shanghai Sesns Network Technology Co ltd
Current assignee: Shanghai Sesns Network Technology Co ltd
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2021-10-08

Abstract

The invention relates to the technical field of data mining, and in particular to a data mining method based on a complex technology. The method comprises a data acquisition step, a database establishment step, a data mining step and a redundancy processing step. The method defines the useful data in a field, compares and screens out data that falls outside the field or is repeated, and deletes this useless data through redundancy processing. This prevents useless data from degrading the decision value of the decision data, improves the data mining effect, and solves the problem that mined data cannot support decision-making.

Description

Data mining method based on complex technology

Technical Field

The invention relates to the technical field of data mining, in particular to a data mining method based on a complex technology.

Background

In recent years, data mining has attracted great attention in the information industry, mainly because large amounts of data are now widely available and there is an urgent need to convert that data into useful information and knowledge. Such information and knowledge can be applied in business management, production control, market analysis, engineering design, scientific exploration and the like, so there is a need to mine this valuable decision data.

Data mining is a decision support process based mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, visualization technologies and the like. It analyzes enterprise data in a highly automated manner, performs inductive reasoning, and mines potential patterns from the data, helping decision makers adjust market strategies, reduce risks and make correct decisions.

However, the data mined by many current data mining methods contains a large amount of repeated or redundant data with no decision value, which greatly reduces the data mining effect and leaves the mined data unable to support decision-making.

Disclosure of Invention

The present invention aims to provide a method for data mining based on complex technology, so as to solve the problems set out in the background art.

To achieve the above object, the present invention provides a method for data mining based on complex technology, comprising the following steps:

S1.1, data acquisition: collecting data;

S1.2, establishing a database: constructing metadata corresponding to the acquired data, storing data information in the metadata, and loading a mining database according to the metadata;

S1.3, data mining: mining useful data in the mining database to form decision data;

S1.4, redundancy processing: cleaning redundant data from the mined decision data.

As a further improvement of the technical solution, the steps of establishing the database in S1.2 are as follows:

S2.1, describing the acquired data;

S2.2, performing quality evaluation on the described data, and combining and integrating it to obtain metadata;

S2.3, loading the mining database and maintaining the mining database.

As a further improvement of the technical solution, the data mining in S1.3 adopts an intelligent mining algorithm, which comprises the following steps:

S3.1, defining decision data according to different decision requirements;

S3.2, extracting data from the mining database using the defined decision data as the standard, and preprocessing the extracted data to improve its quality;

S3.3, evaluating the extracted data, identifying redundant data, and forming decision data once the redundant data has been identified;

S3.4, analyzing the decision data and producing a data mining result.

As a further improvement of the present technical solution, the method for preprocessing the extracted data in S3.2 includes noise elimination and data type conversion.

As a further improvement of the technical solution, S3.4 uses the K nearest neighbor algorithm to classify the data to be analyzed, and the algorithm steps are as follows:

S4.1, extracting feature values of the data according to the description of the collected data, and re-describing the training data set vectors according to the feature values;

S4.2, finding, in the training data set, the K data sets most similar to the newly acquired data set;

S4.3, calculating in turn the weight of each class among the K nearest neighbors of the newly acquired data set, comparing the weights, and assigning the data set to the class with the largest weight.
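Steps S4.1 to S4.3 can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each data set has already been reduced to a numeric feature vector, uses cosine similarity as the similarity measure, and takes the summed similarity per class as that class's weight. The names `cosineSimilarity` and `classifyKnn` and the sample vectors are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors (assumed measure).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let k = 0; k < a.length; k++) {
    dot += a[k] * b[k];
    normA += a[k] * a[k];
    normB += b[k] * b[k];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// S4.2 and S4.3: find the K most similar training items, then pick the class
// whose summed similarity (its weight) is largest.
function classifyKnn(training, query, k) {
  const neighbours = training
    .map(item => ({ label: item.label, sim: cosineSimilarity(item.vector, query) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k);

  const weights = new Map();
  for (const n of neighbours) {
    weights.set(n.label, (weights.get(n.label) || 0) + n.sim);
  }

  let best = null;
  for (const [label, weight] of weights) {
    if (best === null || weight > best.weight) best = { label, weight };
  }
  return best === null ? null : best.label;
}

// Hypothetical training data: S4.1 is assumed to have produced these vectors.
const training = [
  { vector: [1, 0], label: 'A' },
  { vector: [0.9, 0.1], label: 'A' },
  { vector: [0, 1], label: 'B' },
  { vector: [0.1, 0.9], label: 'B' },
];
const predicted = classifyKnn(training, [1, 0.2], 3);
console.log(predicted);
```

With K = 3 the query's neighbours are the two class A items and one class B item, so class A carries the larger weight.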

As a further improvement of the present technical solution, the formula of the similarity calculation in S4.2 is as follows:

where Sim(d_i, d_j) is the similarity between the j-th acquired data set d_j and the i-th training data set d_i; M is the number of acquired data; W_ik is the total number of the training data set d_i; and W_jk is the total number of the acquired data set d_j.
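The formula itself is not reproduced in this text. A form consistent with the symbols listed here is the cosine similarity commonly paired with K nearest neighbor classification. The version below is an assumption: it reads W_ik and W_jk as the k-th feature weights of d_i and d_j and M as the number of features, which goes beyond the definitions given.

```latex
\mathrm{Sim}(d_i, d_j) =
\frac{\sum_{k=1}^{M} W_{ik}\, W_{jk}}
     {\sqrt{\sum_{k=1}^{M} W_{ik}^{2}}\;\sqrt{\sum_{k=1}^{M} W_{jk}^{2}}}
```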

As a further improvement of the present technical solution, the weight calculation formula in S4.3 is as follows:

where the terms of the formula denote, in order, the feature vector of the acquired data set, the feature vector similarity, and the category attribute function; C_i is the i-th category. The value of the category attribute function depends on whether the acquired data set d_j belongs to class C_i.
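The weight formula and its symbols are not reproduced in this text. A common form in weighted K nearest neighbor classification, given here only as an assumption, sums the similarities of the K neighbors that belong to each class. The symbols x, p and y below are hypothetical stand-ins for the lost notation, and the indicator values 1 and 0 are the conventional choice rather than values confirmed by the source.

```latex
p(x, C_i) = \sum_{d_j \in \mathrm{KNN}(x)} \mathrm{Sim}(x, d_j)\, y(d_j, C_i),
\qquad
y(d_j, C_i) =
\begin{cases}
1, & d_j \in C_i \\
0, & \text{otherwise}
\end{cases}
```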

As a further improvement of the present technical solution, the redundancy processing steps in S1.4 are as follows:

S5.1, comparing the data in the decision data;

S5.2, deleting the redundant decision data found in the comparison by using a decision algorithm.

As a further improvement of the technical solution, the decision algorithm formula is as follows:

where γ_i is the upper bound of the rule support ultimately produced by the i-th decision data (C_i, M), and S_i is the lower bound of the rule support ultimately produced by the i-th decision data (C_i, M). If γ_i ≤ γ₀ or S_i ≥ S₀, the i-th decision data (C_i, M) is deleted, where γ₀ is the minimum rule support and S₀ is the maximum rule support.
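The deletion rule can be sketched as a simple filter. This is an illustrative sketch only: the field names `upper` (for γ_i) and `lower` (for S_i), the function `pruneDecisionData`, and the threshold values are hypothetical, and the way the bounds themselves are computed is not shown.

```javascript
// Keep only decision data whose rule-support bounds pass the thresholds.
// An item is deleted when upper <= minSupport (γ_i ≤ γ₀)
// or lower >= maxSupport (S_i ≥ S₀).
function pruneDecisionData(items, minSupport, maxSupport) {
  return items.filter(
    item => !(item.upper <= minSupport || item.lower >= maxSupport)
  );
}

// Hypothetical decision data with precomputed bounds.
const decisionData = [
  { id: 'c1', upper: 0.60, lower: 0.30 }, // kept
  { id: 'c2', upper: 0.05, lower: 0.01 }, // deleted: upper bound at or below the minimum
  { id: 'c3', upper: 0.99, lower: 0.95 }, // deleted: lower bound at or above the maximum
];
const kept = pruneDecisionData(decisionData, 0.1, 0.9);
console.log(kept.map(item => item.id));
```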

As a further improvement of the present technical solution, the formula for calculating the rule support is as follows:

where X_i is the support set of the i-th decision data, and Y is the total data set.
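The support formula is not reproduced in this text. One reading consistent with the two symbols defined here, offered as an assumption, is the share of the total data set covered by the support set. The name Sup is hypothetical.

```latex
\mathrm{Sup}(X_i) = \frac{|X_i|}{|Y|}
```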

Compared with the prior art, the invention has the following beneficial effects: by defining the useful data in a field, comparing and screening out data that falls outside the field or is repeated, and deleting this useless data through redundancy processing, the method prevents useless data from degrading the decision value of the decision data, thereby improving the data mining effect and solving the problem that mined data cannot support decision-making.

Drawings

FIG. 1 is a flow chart of method steps for data mining of the present invention;

FIG. 2 is a flow chart of the steps of building a database according to the present invention;

FIG. 3 is a flow chart of the steps of the intelligent mining algorithm of the present invention;

FIG. 4 is a flow chart of the data classification steps of the present invention;

FIG. 5 is a flow chart of the redundancy processing steps of the present invention.

Detailed Description

Example 1

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIGS. 1-5, the present invention provides a technical solution:

The invention provides a data mining method based on a complex technology, which comprises the following steps:

S1.1, data acquisition: collecting data, including data from business management, production control, market analysis, engineering design, scientific exploration and the like;

In addition, the steps of establishing the database in S1.2 are as follows:

S2.1, describing the collected data, so that conceptual data is converted into logical data that can be input into a computer and recognized by it;

The quality evaluation flow is as follows:

First, the data quality indicators and evaluation rules to be checked are determined. Corresponding SQL scripts are then written to check and analyze the data. Finally, the percentage of data satisfying each rule is calculated as the score for that rule. The overall score of the system is calculated from the scores of the individual rules, and the final evaluation value is obtained by averaging.

S2.3, loading the mining database and maintaining it, where database maintenance includes backing up system data, recovering the database system, generating user information tables, granting permissions on the information tables, monitoring the operating condition of the system, handling system errors promptly, and ensuring the security of system data.
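The scoring part of the quality evaluation flow described above can be sketched as follows. The sketch rests on assumptions: it replaces the SQL scripts with in-memory JavaScript predicates, and the rule names, record fields and the function `scoreQuality` are hypothetical.

```javascript
// Score each rule as the percentage of records that satisfy it,
// then average the per-rule scores into one overall value.
function scoreQuality(records, rules) {
  const perRule = {};
  let total = 0;
  for (const rule of rules) {
    const passed = records.filter(rule.check).length;
    const score = records.length === 0 ? 0 : (passed / records.length) * 100;
    perRule[rule.name] = score;
    total += score;
  }
  const overall = rules.length === 0 ? 0 : total / rules.length;
  return { perRule, overall };
}

// Hypothetical records and rules for illustration.
const records = [
  { id: 1, price: 3.5 },
  { id: 2, price: null },
  { id: 3, price: 4.0 },
  { id: 4, price: -1 },
];
const rules = [
  { name: 'priceNotNull', check: r => r.price !== null },
  { name: 'priceNonNegative', check: r => r.price !== null && r.price >= 0 },
];
const report = scoreQuality(records, rules);
console.log(report);
```

Here three of four records pass the first rule and two of four pass the second, so the per-rule scores are 75 and 50 and their average is 62.5.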

Further, data mining in S1.3 adopts an intelligent mining algorithm, and the algorithm steps are as follows:

S3.1, defining decision data according to different decision requirements, where the definition specifies the useful data in each field. For example, for market analysis, market research data, market risk assessment data and the like are defined as useful data, whereas data unrelated to the market or repeated data is defined as useless data. The useless data is deleted through redundancy processing, which prevents it from degrading the decision value of the decision data, improves the data mining effect, and solves the problem that mined data cannot support decision-making;

S3.4, analyzing the decision data and producing a data mining result.

Specifically, the method for preprocessing the extracted data in S3.2 includes noise elimination and data type conversion.

Noise elimination uses a regression denoising method. If the data have a dependency relationship, that relationship is solved so that values can be predicted from changes in the data, with the dependency relationship taken to be normally distributed. Assuming that the data is observed with noise, the observed values are updated as the data continuously changes so as to remove the random noise in them.
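Regression denoising can be illustrated with a simple least-squares line. This sketch assumes a linear dependency between an index x and the observed value y, which the text does not specify. The names `fitLine` and `denoise` are hypothetical, and the sample values are invented for illustration.

```javascript
// Fit y = intercept + slope * x by ordinary least squares.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxx === 0 ? 0 : sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Replace each noisy observation with the value predicted by the fitted line.
function denoise(xs, ys) {
  const { slope, intercept } = fitLine(xs, ys);
  return xs.map(x => intercept + slope * x);
}

const xs = [1, 2, 3, 4, 5];
const noisy = [2.1, 3.9, 6.2, 7.8, 10.0]; // roughly y = 2x with noise
const smoothed = denoise(xs, noisy);
console.log(smoothed);
```

Replacing observations with fitted values is only one way to apply the idea; an implementation could instead flag points that deviate strongly from the fit.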

In addition, part of the algorithm for data type conversion is as follows:

// 1. Convert a number to a string with toString()
var num = 10;
var str = num.toString();
console.log(str);        // prints 10
console.log(typeof str); // prints string

// 2. Use String(value)
console.log(String(num));

// 3. Implicit conversion by concatenating with an empty string
console.log(num + '');

In addition, S3.4 uses the K nearest neighbor algorithm to classify the data to be analyzed, and the algorithm steps are as follows:

S4.3, calculating in turn the weight of each class among the K nearest neighbors of the newly acquired data set, comparing the weights, and assigning the data set to the class with the largest weight. The data can then be analyzed by class, which enables distributed analysis of the data, greatly increases the operation speed, shortens the analysis time and reduces the load during analysis.

In addition, the formula for the similarity calculation in S4.2 is as follows:

Further, the weight calculation formula in S4.3 is as follows:

where the terms of the formula denote the feature vector of the acquired data set and the feature vector similarity, and the category attribute function takes a different value depending on whether the acquired data set belongs to the category.

In addition, the redundancy processing steps in S1.4 are as follows:

S5.1, comparing the data in the decision data.

In addition, the decision algorithm formula is as follows:

Specifically, the formula for calculating the rule support is as follows:

where X_i is the support set of the i-th decision data, and Y is the total data set.

Example 2

To improve the decision quality of market analysis, this embodiment applies Embodiment 1 to market analysis. The workflow is as follows:

First, market data is collected. For example, when analyzing the fruit market, the collected data is the sales of each variety of fruit (note that in this embodiment, out-of-season fruits and fruits with short storage periods are not wanted in the fruit data to be analyzed). The data is represented by the set A = {a1, a2, a3}, where a1 is apple, a2 is watermelon and a3 is strawberry. The data is then mined, specifically:

a1, a2 and a3 are classified, with the following results:

a1 is a daily fruit with a normal storage period; a2 is an out-of-season fruit with a normal storage period; a3 is an out-of-season fruit with a short storage period;

the decision data is then formed: a1, daily fruit, normal storage period; a2, out-of-season fruit, normal storage period; a3, out-of-season fruit, short storage period;

finally, out-of-season fruits and fruits with short storage periods are deleted by comparison, leaving a1 as the fruit to be analyzed. The data mining method based on a complex technology thus deletes the out-of-season watermelon and strawberry data, prevents useless data from degrading the decision value of the decision data, and improves the decision quality of the market analysis.
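The fruit example can be traced in a few lines of JavaScript. The record fields `seasonal` and `storage` and the function `selectForAnalysis` are hypothetical names chosen for this sketch, and the classification results are taken from the example as given inputs rather than computed.

```javascript
// Classification results for a1, a2 and a3, taken from the example.
const fruits = [
  { id: 'a1', name: 'apple', seasonal: 'daily', storage: 'normal' },
  { id: 'a2', name: 'watermelon', seasonal: 'out-of-season', storage: 'normal' },
  { id: 'a3', name: 'strawberry', seasonal: 'out-of-season', storage: 'short' },
];

// Redundancy processing: drop out-of-season fruits and
// fruits with short storage periods.
function selectForAnalysis(items) {
  return items.filter(
    f => f.seasonal !== 'out-of-season' && f.storage !== 'short'
  );
}

const toAnalyze = selectForAnalysis(fruits);
console.log(toAnalyze.map(f => f.id));
```

Only a1 passes both conditions, matching the outcome stated in the example.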

The foregoing shows and describes the general principles, essential features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description present only preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A method for data mining based on complex technology, characterized by comprising the following steps:

S1.1, data acquisition: collecting data;

2. The method for data mining based on complex technology according to claim 1, wherein the steps of establishing the database in S1.2 are as follows:

S2.1, describing the acquired data;

S2.3, loading the mining database and maintaining the mining database.

3. The method for data mining based on complex technology according to claim 2, wherein the data mining in S1.3 adopts an intelligent mining algorithm, and the algorithm steps are as follows:

S3.1, defining decision data according to different decision requirements;

S3.4, analyzing the decision data and producing a data mining result.

4. The method for data mining based on complex technology according to claim 3, wherein the method for preprocessing the extracted data in S3.2 comprises noise elimination and data type conversion.

5. The method for data mining based on complex technology according to claim 3, wherein S3.4 uses the K nearest neighbor algorithm to classify the data to be analyzed, and the algorithm steps are as follows:

6. The method for data mining based on complex technology according to claim 5, wherein the formula for the similarity calculation in S4.2 is as follows:

where Sim(d_i, d_j) is the similarity between the j-th acquired data set d_j and the i-th training data set d_i; M is the number of acquired data; W_ik is the total number of the training data set d_i; and W_jk is the total number of the acquired data set d_j.

7. The method for data mining based on complex technology according to claim 5, wherein the weight calculation formula in S4.3 is as follows:

where the terms of the formula denote, in order, the feature vector of the acquired data set, the feature vector similarity, and the category attribute function; and C_i is the i-th category.

8. The method for data mining based on complex technology according to claim 1, wherein the redundancy processing steps in S1.4 are as follows:

S5.1, comparing the data in the decision data;

9. The method for data mining based on complex technology according to claim 8, wherein the decision algorithm formula is as follows:

where γ_i is the upper bound of the rule support ultimately produced by the i-th decision data (C_i, M), and S_i is the lower bound of the rule support ultimately produced by the i-th decision data (C_i, M).

10. The method for data mining based on complex technology according to claim 9, wherein the formula for calculating the rule support is as follows:

where X_i is the support set of the i-th decision data, and Y is the total data set.