CN117216150A

CN117216150A - Data mining system based on data warehouse

Info

Publication number: CN117216150A
Application number: CN202311143074.6A
Authority: CN
Inventors: 陆海涛; 黎跃鸣; 杜泓江; 朱晓霞; 邰创业; 刘海龙; 季晓松
Original assignee: Shanghai Dragon New Media Co ltd
Current assignee: Shanghai Dragon New Media Co ltd
Priority date: 2023-09-06
Filing date: 2023-09-06
Publication date: 2023-12-12

Abstract

The invention provides a data mining system based on a data warehouse, belongs to the technical field of data management, and solves the technical problem that existing data mining systems have a single function and lack functions such as prediction, security protection, and training and learning. The system comprises a user interface module, a mining theme and task module, a data preprocessing module, a mining module, a method slot module, a mode evaluation module, a manual selection module, a training learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each module. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module and a predictive data mining sub-module. A knowledge representation module can collect and analyze valuable knowledge that cannot be predicted in advance. The system can be used for modeling and data mining based on a data warehouse, and has functions such as prediction, security protection, and training and learning.

Description

Data mining system based on data warehouse

Technical Field

The invention belongs to the technical field of data management, and relates to a data mining system based on a data warehouse.

Background

With the wide application of database technology and society's growing demand for information, database technology that centers on transaction processing and supports business operation environments and platforms can no longer meet people's needs at the analysis and decision-making level. To provide important information effectively for the management and decision-making processes of enterprises and governments, the relevant data must be collected from inside and outside the enterprise and organized appropriately according to decision-making needs, so as to form a comprehensive, decision-oriented environment.

Data mining based on a data warehouse is a process of deep processing of the data in the data warehouse, and is also the method and tool by which the decision value of the data warehouse is realized. The ultimate goal of building a data warehouse is to extract, from massive data of many types, regular knowledge that has important guiding significance for decision-making and management activities. Because the data are scattered across several business databases or other data sources, obtaining knowledge useful for decision analysis requires corresponding tools for extracting valuable information from the massive data.

However, existing data mining systems are limited and conservative in capability, lack functions such as prediction, security protection, and learning and training, and cannot effectively support personalized selection learning or multi-mode visual display.

A search of the prior art shows, for example, that Chinese patent literature discloses a data mining method and device [application number: 201510598360.0; publication number: CN105404637B]. The method comprises the following steps: acquiring a data mining model, wherein the data mining model corresponds to a data table in a data warehouse, and the data table records the data mining rules according to which data mining is performed; and mining the fact data in the data warehouse according to the data mining rules. The data mining method and device disclosed in that patent can realize automatic data mining in a data warehouse system, can obtain the data column names of the fact table according to the data mining rules corresponding to the data list model and the index model, and can screen, calculate, count and classify the indexes in the index model. However, its mining function is too limited: it lacks functions such as prediction, security protection, and learning and training, and it cannot collect and analyze valuable knowledge that cannot be predicted in advance.

Based on this, a data mining system based on a data warehouse is proposed, which realizes modeling and deep data mining of the data warehouse while ensuring that the system has functions such as prediction, security protection, and training and learning.

Disclosure of Invention

The object of the invention is to solve the above problems in the prior art by providing a data mining system based on a data warehouse. The technical problem to be solved is how to model and deeply mine the data warehouse while ensuring that the system has functions such as prediction, security protection, and training and learning.

The object of the invention can be achieved by the following technical solution:

The data mining system based on the data warehouse comprises a user interface module, a mining theme and task module, a data preprocessing module, a mining module, a method slot module, a mode evaluation module, a manual selection module, a training learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each module. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module and a predictive data mining sub-module. The mode evaluation module and the knowledge base module exchange knowledge data through a knowledge representation module, which can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module.

The working principle of the invention is as follows: the user interface module converts raw data, by means of description and visualization, into forms that show the quantity and relational features of the data; the mining theme and task module determines the topic type; the data preprocessing module normalizes, integrates, transforms, reduces and converts the data; the mining module builds a data mining model and ensures the accuracy, understandability and performance of the model; the method slot module provides the data mining methods, wherein the descriptive data mining sub-module performs descriptive data mining, the anomaly detection data mining sub-module performs anomaly detection data mining, and the predictive data mining sub-module performs predictive data mining; the mode evaluation module verifies, evaluates and feeds back data mining problems, and the knowledge representation is stored in the knowledge base module; the manual selection module performs personalized selection according to customer requirements; the training learning module learns the personalized selection modes and methods, which are stored in the knowledge base module and imported into the data mining algorithm library of the mining module, so that the mining synthesizer of the mining module can draw on them the next time mining is performed.

The topic types of the mining theme and task module include, but are not limited to, retention control, risk prediction, profitability analysis, data trend analysis, employee analysis, regional analysis, classification clustering, and visualization studies.

The data preprocessing module comprises a data source sub-module, a data normalization sub-module, a data integration sub-module, a data conversion sub-module, a data reduction sub-module and a Microsoft data conversion service sub-module, which operate in sequence. After data from the data source sub-module enter the data preprocessing module, the data cleaning module cleans the data, and the cleaned data enter the data normalization sub-module. The data cleaning module can also clean the data of each of these sub-modules. The data source sub-module reads the topic data and task data of the mining theme and task module.

The key task of the data integration sub-module is to acquire data, including but not limited to accessing a data warehouse. Methods of accessing data include, but are not limited to: accessing data through a transaction-based relational database or a PC-based database, accessing data through a data conversion tool, accessing data with a query tool, and accessing data from a flat file.

The data reduction sub-module compresses data by aggregation, deletion of redundant features, clustering and the like; its methods include, but are not limited to, data cube aggregation, dimension reduction, data compression, numerosity reduction, discretization, and concept hierarchy generation.

The problems to be solved by the data cleaning module include, but are not limited to, data quality, redundant data, obsolete data, and changes in term definitions; the issues that may need to be handled in a dataset include, but are not limited to, consistency problems, cleanup of invalid data, cleanup of typographical errors, missing values, and data export.
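The cleaning issues listed above (redundant data, invalid data, missing values) can be illustrated with a minimal Python sketch. The function name `clean_records`, the field names, the rule that negative prices are invalid, and the median fill are all assumptions made for illustration; the patent does not specify how the data cleaning module is implemented.

```python
from statistics import median

def clean_records(records, key_field, numeric_field):
    """Drop duplicate rows, mark invalid values as missing, then fill missing values."""
    seen = set()
    deduped = []
    for row in records:
        key = row.get(key_field)
        if key is None or key in seen:  # no key, or a redundant duplicate
            continue
        seen.add(key)
        deduped.append(dict(row))

    # Assumption: a negative value is invalid for this field, so treat it as missing.
    for row in deduped:
        value = row.get(numeric_field)
        if value is not None and value < 0:
            row[numeric_field] = None

    # Fill missing values with the median of the known values (one possible policy).
    known = [row[numeric_field] for row in deduped if row.get(numeric_field) is not None]
    fill = median(known) if known else None
    for row in deduped:
        if row.get(numeric_field) is None:
            row[numeric_field] = fill
    return deduped

raw = [
    {"id": 1, "price": 10.0},
    {"id": 1, "price": 10.0},  # redundant duplicate
    {"id": 2, "price": None},  # missing value
    {"id": 3, "price": -5.0},  # invalid value
    {"id": 4, "price": 14.0},
]
cleaned = clean_records(raw, key_field="id", numeric_field="price")
```

Other fill policies (dropping the row, or using a per-category mean) would fit the same structure; the median is used here only because it is robust to the outliers that cleaning is meant to remove.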

The mining module builds the data mining model, which includes, but is not limited to, ensuring the accuracy, understandability and performance of the model. The accuracy of the model can be tested over time to determine how accurate it is. The understandability of the model concerns whether the model lets us know what effect the inputs have on the result, whether it lets us know why a prediction succeeds or fails, whether it can generate predictions for complex data sets, and whether the results it generates can be checked. The performance of the model concerns how quickly the model can be built and how quickly results can be obtained from it.

The method slot module comprises a descriptive data mining sub-module, an anomaly detection sub-module and a predictive data mining sub-module, wherein the analysis method of the descriptive data mining sub-module comprises but is not limited to association analysis, cluster analysis and sequence analysis, the analysis method of the anomaly detection sub-module comprises but is not limited to anomaly detection analysis, and the analysis method of the predictive data mining sub-module comprises but is not limited to evolution analysis, classification analysis, unstructured data analysis and statistical regression analysis.

The mode evaluation module comprises, but is not limited to, a verification and evaluation submodule and a data mining problem feedback submodule. The verification considerations of the verification and evaluation submodule include the following: evaluating a model on the same data set that was used to build it tends to give overly favorable results, so the model is better evaluated on a different data set; some of a model's predictions may be more accurate than others; and a model built from sample data performs well on that sample. The evaluation methods of the verification and evaluation submodule differ considerably, because many different data mining methods are gathered under the data mining algorithm library; data mining borrows from the field of artificial intelligence, and the wide variety of artificial intelligence techniques is one reason so many different data mining methods exist. The problems handled by the data mining problem feedback submodule include, but are not limited to, problems raised by business users, technical problems, data mining application problems, problems to be considered when implementing data mining projects, and privacy-related problems arising from the impact of data mining on society.

The valuable knowledge that cannot be predicted in advance includes, but is not limited to, other candidate results, the selected marginal rate, and the prediction rationale. Other candidate results: besides knowing what the model predicts, one may also be interested in the other candidate prediction results. The selected marginal rate: how large the difference is between the final prediction result and the other candidate results. The prediction rationale: why the model arrived at such a prediction, which is another thing one may want to know about the prediction process.

The user interface module converts raw data, by description and visualization, into forms including but not limited to rules, tables, charts, images, decision trees and data cubes, in order to show the quantity and relational features of the data.

Compared with the prior art, the data mining system based on the data warehouse has the following advantages:

the data theme confirmation is carried out through the cooperation of the mined theme and task module and the data preprocessing module, and the standardization, integration, conversion, reduction and conversion are carried out on the data;

the method comprises the steps of establishing a data mining model through cooperation of a mining module and a method slot module, and providing a data mining method, wherein a descriptive data mining sub-module performs descriptive method data mining, an anomaly detection data mining sub-module performs anomaly detection method data mining and a predictive data mining sub-module performs predictive method data mining;

the mode evaluation module is matched with the knowledge base module and used for verifying, evaluating and feeding back the data mining problem, the knowledge representation is stored in the knowledge base module, and valuable knowledge which cannot be predicted in advance can be collected and analyzed through the knowledge representation module;

the manual selection module is matched with the training learning module, so that personalized selection is performed according to the requirements of customers, the personalized selection mode and method are learned through the training learning module, and the personalized selection mode and method are stored in the knowledge base module;

the data cleaning module is used for cleaning the data of the data preprocessing module, the data security module is used for monitoring each module, the operation safety of each module is guaranteed, and the system is prevented from being attacked by external viruses.

Drawings

FIG. 1 is a diagram of the relationship between data mining and data warehouse in the present invention.

FIG. 2 is a diagram of the relationship of data mining to a data warehouse and knowledge base in the present invention.

Fig. 3 is an analytical flow diagram of data mining in the present invention.

Fig. 4 is a flow chart of data mining in the present invention.

Fig. 5 is a block diagram of data integration in the present invention.

Fig. 6 is a structural diagram of data reduction in the present invention.

Fig. 7 is a block diagram of data cleaning in the present invention.

Fig. 8 is an analysis diagram of a model of data mining in the present invention.

FIG. 9 is a block diagram of valuable knowledge that cannot be predicted in the present invention.

Fig. 10 is a block diagram of a user interface in the present invention.

Fig. 11 is a block diagram of a supermarket data classification model of example 1 in the invention.

Fig. 12 is a block diagram of a supermarket data star model of example 1 of the invention.

Detailed Description

The following are specific embodiments of the present invention and the technical solutions of the present invention will be further described with reference to the accompanying drawings, but the present invention is not limited to these embodiments.

As shown in FIGS. 1-10, the data mining system based on the data warehouse comprises a user interface module, a mining theme and task module, a data preprocessing module, a mining module, a method slot module, a mode evaluation module, a manual selection module, a training learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each module. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module and a predictive data mining sub-module. The mode evaluation module and the knowledge base module exchange knowledge data through a knowledge representation module, which can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module.

The user interface module converts raw data, by means of description and visualization, into forms that show the quantity and relational features of the data; the mining theme and task module determines the topic type; the data preprocessing module normalizes, integrates, transforms, reduces and converts the data; the mining module builds a data mining model and ensures the accuracy, understandability and performance of the model; the method slot module provides the data mining methods, wherein the descriptive data mining sub-module performs descriptive data mining, the anomaly detection data mining sub-module performs anomaly detection data mining, and the predictive data mining sub-module performs predictive data mining; the mode evaluation module verifies, evaluates and feeds back data mining problems, and the knowledge representation is stored in the knowledge base module; the manual selection module performs personalized selection according to customer requirements; the training learning module learns the personalized selection modes and methods, which are stored in the knowledge base module and imported into the data mining algorithm library of the mining module, so that the mining synthesizer of the mining module can draw on them the next time mining is performed.

The topic types of the mining theme and task module include, but are not limited to, retention control, risk prediction, rate of return analysis, data trend analysis, employee analysis, regional analysis, classification clustering, and visualization studies.

The data preprocessing module comprises a data source sub-module, a data normalization sub-module, a data integration sub-module, a data conversion sub-module, a data reduction sub-module and a Microsoft data conversion service sub-module, which operate in sequence. After data from the data source sub-module enter the data preprocessing module, the data cleaning module cleans the data, and the cleaned data enter the data normalization sub-module. The data cleaning module can also clean the data of each of these sub-modules. The data source sub-module reads the topic data and task data of the mining theme and task module.

The key task of the data integration sub-module is to acquire data, including but not limited to accessing a data warehouse. Methods of accessing data include, but are not limited to: accessing data through a transaction-based relational database or a PC-based database, accessing data through a data conversion tool, accessing data with a query tool, and accessing data from a flat file.

The data reduction sub-module compresses data by aggregation, deletion of redundant features, clustering and the like; its methods include, but are not limited to, data cube aggregation, dimension reduction, data compression, numerosity reduction, discretization, and concept hierarchy generation.
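Two of the reduction methods named above, deletion of redundant features and discretization, can be sketched in Python as follows. The helper names `drop_redundant_features` and `discretize`, the sample columns, and the bin edges are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def drop_redundant_features(rows):
    """Remove columns that are constant or that exactly duplicate an earlier column."""
    columns = list(rows[0].keys())
    kept, seen_profiles = [], set()
    for col in columns:
        profile = tuple(row[col] for row in rows)
        if len(set(profile)) <= 1 or profile in seen_profiles:
            continue  # carries no information, or repeats a column already kept
        seen_profiles.add(profile)
        kept.append(col)
    return [{col: row[col] for col in kept} for row in rows]

def discretize(value, edges, labels):
    """Map a number to the label of the first bin whose upper edge exceeds it.

    Expects len(labels) == len(edges) + 1; the last label is the open-ended top bin.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

rows = [
    {"goods_id": 1, "price": 3.5, "price_copy": 3.5, "currency": "CNY"},
    {"goods_id": 2, "price": 12.0, "price_copy": 12.0, "currency": "CNY"},
    {"goods_id": 3, "price": 48.0, "price_copy": 48.0, "currency": "CNY"},
]
reduced = drop_redundant_features(rows)
for row in reduced:
    row["price_band"] = discretize(row["price"], edges=(5.0, 20.0),
                                   labels=("low", "medium", "high"))
```

Here `price_copy` is dropped as a duplicate of `price`, `currency` is dropped as constant, and each price is replaced for analysis purposes by a coarse band.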

The mining module builds the data mining model, which includes, but is not limited to, ensuring the accuracy, understandability and performance of the model. The accuracy of the model can be tested over time to determine how accurate it is. The understandability of the model concerns whether the model lets us know what effect the inputs have on the result, whether it lets us know why a prediction succeeds or fails, whether it can generate predictions for complex data sets, and whether the results it generates can be checked. The performance of the model concerns how quickly the model can be built and how quickly results can be obtained from it.

The mode evaluation module comprises, but is not limited to, a verification and evaluation submodule and a data mining problem feedback submodule. The verification considerations of the verification and evaluation submodule include the following: evaluating a model on the same data set that was used to build it tends to give overly favorable results, so the model is better evaluated on a different data set; some of a model's predictions may be more accurate than others; and a model built from sample data performs well on that sample. The evaluation methods of the verification and evaluation submodule differ considerably, because many different data mining methods are gathered under the data mining algorithm library; data mining borrows from the field of artificial intelligence, and the wide variety of artificial intelligence techniques is one reason so many different data mining methods exist. The problems handled by the data mining problem feedback submodule include, but are not limited to, problems raised by business users, technical problems, data mining application problems, problems to be considered when implementing data mining projects, and privacy-related problems arising from the impact of data mining on society.

Valuable knowledge that cannot be predicted in advance includes, but is not limited to, other candidate results, the selected marginal rate, and the prediction rationale. Other candidate results: besides knowing what the model predicts, one may also be interested in the other candidate prediction results. The selected marginal rate: how large the difference is between the final prediction result and the other candidate results. The prediction rationale: why the model arrived at such a prediction, which is another thing one may want to know about the prediction process.
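The notions of other candidate results and of the margin between the final prediction and its closest competitor can be made concrete with a small Python sketch. It assumes the model exposes a score per candidate class; the function name `rank_candidates`, the class labels, and the scores are hypothetical.

```python
def rank_candidates(scores):
    """Return (best candidate, other candidates, margin) from a label -> score mapping."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0]
    others = ranked[1:]
    # Margin: gap between the final prediction and the strongest other candidate.
    margin = best[1] - others[0][1] if others else None
    return best, others, margin

scores = {
    "high_sales_high_profit": 0.55,
    "high_sales_low_profit": 0.30,
    "low_sales_low_profit": 0.15,
}
best, others, margin = rank_candidates(scores)
```

A small margin suggests the final prediction was a close call, which is one reason the other candidates may be worth presenting to the user alongside it.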

Example 1: data mining of data warehouse based on supermarket sales management

1. Demand analysis

The data in a data warehouse are subject-oriented, and the most important subjects of the supermarket data warehouse system are commodities and customers. What senior supermarket managers care about most is the sales volume, sales revenue and profit of commodities, and the purchasing behavior and habits of customers. A snowflake model, comprising fact tables and dimension tables, may be employed. The fact table stores the measured values of the facts and the code values of each dimension; the dimension table stores the descriptive information of a dimension, including its hierarchy, member categories, code values and the like.

For the mass of data held by supermarkets, the system mainly starts from commodity sales, inventory and purchasing information, and customer relationship information: (1) For commodity sales, the question is how to maximize profit through the purchasing, storage and sale of commodities. Strengthening the management of each commodity reduces purchasing and management costs and attracts more customers. Most important is sales promotion, where suitable promotion strategies must be applied to suitable customer groups in order to increase sales profit. (2) Inventory has a great influence on supermarket profit. Just-in-time (JIT) techniques are adopted so that the right commodities are delivered to the right customers at the right time and place, and inventory is kept as low as possible without stock-outs in order to reduce cost. Best-selling commodities are often what accelerates an enterprise's cash flow, so purchasing analysis needs to identify the best-selling commodities and purchase them as far as possible. (3) For customer relationships, the main customer groups are effectively segmented in order to understand their circumstances, what they require of the enterprise's sales services, and the profit that each group brings to the enterprise, so that different marketing strategies can be adopted for different customer groups to reasonably guide their consumption.

2. Model construction

The mining theme and task module performs the conceptual model design: the data in a data warehouse are organized by subject, so the subjects are determined first; a subject is a criterion for classifying data at a higher level.

The structure of the supermarket data classification model is shown in FIG. 11.

The present system determines two basic topics: supply and sales. The attributes are as follows: goods (goods number, goods name, model, category, supplier number, unit price, supply amount, supply date); customers (customer number, name, gender, group); suppliers (supplier number, address, contact details, importance); supermarkets (chain supermarket number, address, contact details); sales (sales serial number, commodity number, customer number, purchase price, sales date, sales volume, sales unit price); supply (supply number, supplier number, date, supply quantity, chain supermarket number).

3. Fact table design

The data preprocessing module is used to store the main content of the topics, including business sales data such as cash register transactions and commodity transactions. Most supermarkets have now installed a point-of-sale (POS) system. Each POS receipt is a specific record of one transaction and contains all the information about one purchase by a consumer, so the data are rich, and linking many POS receipts together reveals a great deal of latent information; the best-known example is the beer-and-diapers case in the United States. The system treats the POS transaction list as the main fact table, which includes sales serial number, commodity number, supplier number, commodity unit price, commodity purchase number, commodity profit, purchase quantity, accumulated sales, accumulated profit, transaction time and the like. With the fact table at the center, each dimension is linked to the central fact table in a star arrangement.

4. Dimension table design

In the dimension table design, the attribute values of each dimension are placed in a separate table. The dimensions designed for the system are as follows:

goods (goods number, goods name, category, vendor number, unit price, quantity); customers (customer number, name, gender, group); staff (staff number, staff name, staff level); suppliers (supplier number, supplier name, address, contact details, importance); supermarkets (chain supermarket numbers, addresses, contact ways, manager numbers); promotions (promotion number, promotion name, promotion category, offer category, start date, end date); reverse time dimension (year, month, day); product classification (product classification number, product classification name); sales listings (sales line number, product number, vendor number, customer number, order time, product price, product quantity, discount, sales volume, inventory volume).

The system uses a "star model" to represent the multi-dimensional dataset; the structure of the supermarket data star model is shown in FIG. 12.
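As a rough illustration of the star arrangement described above, the following Python sketch builds a heavily simplified fact table and three dimension tables in an in-memory SQLite database and runs one aggregate query. The table and column names are assumptions, only a few of the listed attributes are kept, and SQLite stands in for the Microsoft SQL Server 2005 environment used in this example purely so that the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_goods    (goods_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, grp TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
-- Central fact table: measures plus one key per dimension.
CREATE TABLE fact_sales (
    sales_id    INTEGER PRIMARY KEY,
    goods_id    INTEGER REFERENCES dim_goods(goods_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    quantity    INTEGER,
    amount      REAL,
    profit      REAL
);
""")
conn.executemany("INSERT INTO dim_goods VALUES (?, ?, ?)",
                 [(1, "beer", "drinks"), (2, "diapers", "baby")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [(1, "A", "member")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)", [(1, 2023, 9, 6)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 6, 30.0, 6.0),
    (2, 2, 1, 1, 1, 20.0, 5.0),
    (3, 1, 1, 1, 2, 10.0, 2.0),
])

# Sales amount and profit per goods category: join the fact table to one dimension.
rows = conn.execute("""
    SELECT g.category, SUM(f.amount), SUM(f.profit)
    FROM fact_sales AS f
    JOIN dim_goods AS g ON g.goods_id = f.goods_id
    GROUP BY g.category
    ORDER BY g.category
""").fetchall()
```

Each query against a star schema follows this shape: the fact table supplies the measures, and one or more dimension tables supply the attributes to group or filter by.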

5. The mining module performs data mining

An OLAP analysis model is established through online analytical processing (OLAP); data are acquired on the OLAP analysis model, OLAP analysis operations are performed, and the OLAP analysis results are displayed. OLAP is very powerful and can present multidimensional data to the user intuitively along any dimension path. Microsoft SQL Server 2005 was used to implement the following functions through operations such as roll-up, drill-down, rotation (pivoting) and slicing:

Sales analysis: managers can query and analyze the sales of commodities, analyze the data from multiple dimensions, and see the operating condition of the supermarket at a glance, so that they can make efficient decisions.

Commodity analysis: the life cycle of a commodity is judged by tracking it. For commodities in the growth period, multiple promotional approaches can be used to open up sales channels; for mature commodities in the stable period, appropriate marketing should be used to maintain market share and a stable growth rate, attracting consumers' purchasing interest as much as possible and developing the market to prolong the stable period; for commodities in the decline period, alternatives should be sought as appropriate.

Supplier analysis: the sales, cost and profit of the same commodity provided by different suppliers are compared and analyzed in order to select the best supplier.
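The OLAP operations named above can be illustrated independently of any particular product. The following Python sketch implements roll-up, drill-down and slicing over a small in-memory list of sales rows; it is not SQL Server code, and the function names `roll_up` and `slice_rows` as well as the sample data are assumptions for illustration only.

```python
from collections import defaultdict

sales = [
    {"year": 2023, "month": 1, "category": "drinks", "store": "S1", "amount": 100.0},
    {"year": 2023, "month": 1, "category": "snacks", "store": "S1", "amount": 40.0},
    {"year": 2023, "month": 2, "category": "drinks", "store": "S2", "amount": 60.0},
    {"year": 2023, "month": 2, "category": "snacks", "store": "S1", "amount": 30.0},
]

def roll_up(rows, dims):
    """Sum the 'amount' measure over the given dimensions.

    Fewer dimensions gives a roll-up; adding dimensions back gives a drill-down.
    """
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row["amount"]
    return dict(cube)

def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dim] == value]

by_year = roll_up(sales, ("year",))                        # roll-up to year level
by_month_category = roll_up(sales, ("month", "category"))  # drill-down
drinks_by_store = roll_up(slice_rows(sales, "category", "drinks"), ("store",))
```

Rotation (pivoting) corresponds to changing the order of the dimensions passed to `roll_up`, which changes which dimension is read as rows and which as columns when the result is displayed.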

The descriptive data mining sub-module mines the correlations hidden between data through association analysis. Using support, confidence and rule constraints, relationships between two or more data items can be discovered. Support, confidence and rule constraints serve as thresholds for mining association rules, so that meaningless association rules can be filtered out. In supermarket applications, association analysis can reveal hidden customer purchasing behavior.
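The role of support and confidence as thresholds can be shown with a minimal Python sketch restricted to rules between pairs of items. The function name `association_rules`, the baskets, and the threshold values are hypothetical; a production system would more likely use an algorithm such as Apriori to handle larger itemsets.

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def count(itemset):
        # Number of baskets that contain every item in the itemset.
        return sum(1 for basket in baskets if itemset <= basket)

    rules = []
    for a, b in combinations(items, 2):
        support = count({a, b}) / n
        if support < min_support:
            continue  # pair is too rare to be meaningful
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count({lhs, rhs}) / count({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

transactions = [
    ["beer", "diapers", "milk"],
    ["beer", "diapers"],
    ["beer", "bread"],
    ["diapers", "milk"],
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
```

Support measures how often the items appear together across all baskets, while confidence measures how often the consequent appears given the antecedent; raising either threshold filters out more rules.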

The predictive data mining sub-module can divide all items in a supermarket into four categories: high sales volume high profit commodity, high sales volume low profit commodity, low sales volume high profit commodity and low sales volume low profit commodity.

The characteristics of each category of commodity in the sample are analyzed and classification rules are established, so that a new commodity can be classified according to its characteristics. In this module, the ID3 decision tree classification method, which is based on information entropy, is used for prediction. A change rule is found in the historical data through testing, a model is established, and the model is used to predict changes in future data. In supermarket management, the sales of the supermarket over a future period can be predicted by mining sales rules, and corresponding marketing measures can be formulated. For example, analysis of the historical sales data may show that customers who buy a camera go on to buy follow-up commodities such as film and batteries. After cameras have been sold on a large scale, the supermarket can therefore consider stocking these follow-up commodities, and can also adopt marketing means such as advertisements, exhibitions and trials to stimulate customers' purchase interest, so that sales and profit increase.
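ID3 chooses the attribute to split on by information gain, that is, the reduction in label entropy obtained by partitioning the samples on that attribute. The sketch below shows only this selection step, not a full tree builder. The attributes `price_band`, `promoted` and `shelf` and the sample rows are hypothetical; only the four category labels follow the text above.

```python
import math
from collections import Counter

# Hypothetical training samples; "category" is the class label.
ROWS = [
    {"price_band": "high", "promoted": "yes", "shelf": "A", "category": "high sales, high profit"},
    {"price_band": "high", "promoted": "no", "shelf": "A", "category": "low sales, high profit"},
    {"price_band": "high", "promoted": "yes", "shelf": "B", "category": "high sales, high profit"},
    {"price_band": "low", "promoted": "yes", "shelf": "B", "category": "high sales, low profit"},
    {"price_band": "low", "promoted": "no", "shelf": "A", "category": "low sales, low profit"},
    {"price_band": "low", "promoted": "no", "shelf": "B", "category": "low sales, low profit"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, label_key="category"):
    """Entropy of the labels minus the weighted entropy after splitting."""
    base = entropy([r[label_key] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label_key] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, label_key="category"):
    """ID3 split choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, label_key))


if __name__ == "__main__":
    for name in ("price_band", "promoted", "shelf"):
        print(name, round(information_gain(ROWS, name), 4))
    print(best_attribute(ROWS, ["shelf", "price_band"]))
```

A complete ID3 implementation would apply `best_attribute` recursively to each partition until the labels in a partition are pure or no attributes remain.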

The anomaly detection data mining sub-module performs anomaly detection to find unusual patterns, i.e., objects in the data set that differ significantly from the other data. The supermarket can use this technique to identify excellent and poor sales personnel so as to improve service quality. Likewise, commodities that sell particularly well or particularly poorly can be found and corresponding measures taken.
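The text does not specify which anomaly detection technique is used. As one simple possibility, the sketch below flags salespeople whose sales deviate from the mean by more than a chosen number of standard deviations (a z-score test). The sales figures, the names and the threshold of 1.5 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical monthly sales per salesperson.
SALES = {
    "A": 100, "B": 105, "C": 98, "D": 102,
    "E": 101, "F": 99, "G": 30, "H": 180,
}


def find_outliers(values_by_name, threshold=1.5):
    """Return {name: z_score} for entries at least `threshold` std devs from the mean."""
    values = list(values_by_name.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # all values identical, nothing stands out
    scores = {name: (v - mu) / sigma for name, v in values_by_name.items()}
    return {name: z for name, z in scores.items() if abs(z) >= threshold}


if __name__ == "__main__":
    for name, z in find_outliers(SALES).items():
        kind = "excellent" if z > 0 else "poor"
        print(name, kind, round(z, 2))
```

A positive score marks an unusually strong performer and a negative score an unusually weak one; the same function can be applied to per-commodity sales.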

The mode evaluation module verifies and evaluates the mining results, feeds back data mining problems, and stores the knowledge representation in the knowledge base module.

The knowledge representation module can collect and analyze valuable knowledge that cannot be predicted in advance, including other candidate results, the selected marginal rate and the explanation of the prediction. Other candidate results: in addition to knowing what the model predicts, the user may be interested in the other candidate prediction results. The selected marginal rate: how large the gap is between the final prediction result and the other candidate results. The explanation of the prediction: another thing the user may want to know about the prediction process is why the model produced such a prediction result.
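One way to read the candidate results and the marginal rate is as a ranked list of class scores together with the gap between the top score and the runner-up. The sketch below follows that reading, which is an interpretation rather than something the text states; the score values and the function names are hypothetical.

```python
# Hypothetical class scores produced by some classifier for one commodity.
SCORES = {
    "high sales, high profit": 0.5,
    "high sales, low profit": 0.25,
    "low sales, low profit": 0.25,
}


def rank_candidates(scores):
    """Return (label, score) pairs sorted from most to least likely."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def prediction_margin(scores):
    """Gap between the best score and the second-best score."""
    ranked = rank_candidates(scores)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate results")
    return ranked[0][1] - ranked[1][1]


if __name__ == "__main__":
    ranked = rank_candidates(SCORES)
    print("final prediction:", ranked[0][0])
    print("other candidates:", [label for label, _ in ranked[1:]])
    print("margin:", prediction_margin(SCORES))
```

A small margin indicates that the final prediction only narrowly beat another candidate, which is a reason to inspect the other candidates before acting on the result.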

The manual selection module is used for making personalized selections according to other requirements of supermarket sales. The mode and method of the personalized selection are learned by the training learning module, stored in the knowledge base module, and imported into the data mining algorithm library of the mining module, so that the next time the mining synthesizer of the mining module performs mining, data mining is carried out in step with the personalized selection mode and method.

The data cleaning module is used for cleaning data quality problems, redundant data, outdated data and the like in the data preprocessing module. The data security module is used for monitoring each module, ensuring the safe operation of each module and preventing the system from being attacked by external viruses (such as viruses created by competitors).
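The cleaning of quality problems, redundant data and outdated data described above can be sketched as a single filtering pass. The record fields `sku`, `sold_on` and `amount`, the cutoff date and the function name `clean_records` are hypothetical; the rules actually applied by the data cleaning module are not specified in the text.

```python
from datetime import date

# Hypothetical raw sales records.
RECORDS = [
    {"sku": "A1", "sold_on": date(2023, 5, 1), "amount": 10.0},
    {"sku": "A1", "sold_on": date(2023, 5, 1), "amount": 10.0},  # redundant copy
    {"sku": "B2", "sold_on": date(2019, 1, 1), "amount": 5.0},   # outdated
    {"sku": "C3", "sold_on": None, "amount": 7.0},               # missing value
    {"sku": "D4", "sold_on": date(2023, 6, 1), "amount": 3.0},
]


def clean_records(records, cutoff):
    """Drop records with missing fields, records older than `cutoff`, and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec.get("sku"), rec.get("sold_on"), rec.get("amount"))
        if None in key:
            continue  # data quality problem: a required field is missing
        if rec["sold_on"] < cutoff:
            continue  # outdated data
        if key in seen:
            continue  # redundant data
        seen.add(key)
        cleaned.append(rec)
    return cleaned


if __name__ == "__main__":
    for rec in clean_records(RECORDS, cutoff=date(2022, 1, 1)):
        print(rec["sku"], rec["sold_on"], rec["amount"])
```

The missing-value check runs first so that the date comparison is never attempted on an empty field.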

The models and the data are described and visualized through the user interface module, which converts the raw data into specific forms that display the quantity and relational features of the data.

In summary, the data theme is confirmed through the cooperation of the mining theme and task module and the data preprocessing module, and the data are normalized, integrated, converted and reduced.

The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims

1. The data mining system based on the data warehouse comprises a user interface module, a mining theme and task module, a data preprocessing module, a mining module, a method slot module, a mode evaluation module, a manual selection module, a training learning module, a knowledge base module, a data cleaning module and a data security module for monitoring each module.

2. The data warehouse-based data mining system of claim 1, wherein the theme types of the mining theme and task module include, but are not limited to, retention control, risk prediction, yield analysis, data trend analysis, employee analysis, regional analysis, classification and clustering, and visualization studies.

3. The data warehouse-based data mining system of claim 1, wherein the data preprocessing module includes a data source sub-module, a data normalization sub-module, a data integration sub-module, a data conversion sub-module, a data reduction sub-module and a Microsoft data conversion service sub-module; the data of the data source sub-module enter the data preprocessing module; the data cleaning module cleans the data after they enter the data normalization sub-module, and can also clean the data of the data source sub-module, the data normalization sub-module, the data integration sub-module, the data conversion sub-module, the data reduction sub-module and the Microsoft data conversion service sub-module at the same time; and the data source sub-module reads the theme data and task data of the mining theme and task module.

4. The data warehouse-based data mining system of claim 3, wherein the key task of the data integration sub-module is to obtain data, including but not limited to by accessing the data warehouse, and the methods of accessing data include, but are not limited to: accessing data through a transaction-based relational database or a PC-based database, accessing data through a data conversion tool, accessing data with a query tool, and accessing data from a flat file; the data reduction sub-module compresses data by aggregation, deletion of redundant features, clustering or the like, using methods including but not limited to data cube aggregation, dimensionality reduction, data compression, numerosity reduction, discretization and concept hierarchy generation.

5. The data warehouse-based data mining system of claim 1 or 4, wherein the problems to be solved by the data cleaning module include, but are not limited to, data quality, redundant data, obsolete data, and changes in term definitions; problems that may arise in a data set include, but are not limited to, consistency problems, cleaning of invalid data, cleaning of typographical errors, missing values, and data export.

6. The data warehouse-based data mining system of claim 1, wherein the mining module can build a data mining model, the requirements on which include, but are not limited to, accuracy of the model, understandability of the model, and performance of the model; the accuracy of the model can be checked over time to determine how accurate the model is; the understandability of the model means whether the model lets us know what effect the inputs have on the result, whether it lets us know why a prediction succeeds or fails, and whether, for a complex data set, the results generated by the model can be examined; the performance of the model refers specifically to the speed at which the model is constructed and the speed at which results are obtained from the model.

7. The data warehouse-based data mining system of claim 1, wherein the method slot module includes a descriptive data mining sub-module, an anomaly detection sub-module, and a predictive data mining sub-module, wherein the analysis method of the descriptive data mining sub-module includes, but is not limited to, association analysis, cluster analysis, and sequence analysis, the analysis method of the anomaly detection sub-module includes, but is not limited to, anomaly detection analysis, and the analysis method of the predictive data mining sub-module includes, but is not limited to, evolution analysis, classification analysis, unstructured data analysis, and statistical regression analysis.

8. The data warehouse-based data mining system of claim 1, wherein the mode evaluation module includes, but is not limited to, a verification and evaluation sub-module and a data mining problem feedback sub-module; regarding the verification method of the verification and evaluation sub-module, evaluating a pattern with the same data set that was used to build the pattern yields better results than evaluating it with a different data set, some predictions of the pattern are more accurate than others, and good results on the sample data are to be expected because the pattern was built on the basis of the sample data; the evaluation methods of the verification and evaluation sub-module differ considerably, because different data mining methods are all gathered under the data mining algorithm library; data mining draws on the field of artificial intelligence, and the variety of artificial intelligence techniques is the reason why many different data mining methods exist; the problems handled by the data mining problem feedback sub-module include, but are not limited to, problems posed by business users, technical problems, data mining application problems, considerations in implementing data mining projects, and privacy-related problems concerning the impact of data mining on society.

9. The data warehouse-based data mining system of claim 1, wherein the valuable knowledge that cannot be predicted in advance includes, but is not limited to, other candidate results, the selected marginal rate and the explanation of the prediction, wherein the other candidate results mean that, in addition to knowing what the model predicts, the user may be interested in the other candidate prediction results; the selected marginal rate means how large the gap is between the final prediction result and the other candidate results; and the explanation of the prediction means that another thing the user may want to know about the prediction process is why the model produced such a prediction result.

10. The data warehouse-based data mining system of claim 1, wherein the user interface module converts raw data, through description and visualization, into forms including but not limited to rules, tables, charts, images, decision trees and data cubes, for displaying the quantity and relational features of the data.