CN104750813A - Data cleaning method based on data reduction model - Google Patents
- Publication number
- CN104750813A (application CN201510143215.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- cleaning method
- regularization
- method based
- internet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data cleaning method based on a data reduction model and relates to data processing technology. The main steps of the method are collecting data, establishing a data reduction model, and cleaning the data, wherein the data reduction module is used to classify and clean the massive collected data. By classifying and cleaning massive Internet data with the data reduction module, the method addresses problems such as incompleteness, noise, and inconsistency in the data, achieves higher cleaning precision while retaining less data, improves data cleaning efficiency, and allows the data to be used effectively.
Description
Technical field
The present invention relates to data processing technology, and specifically to a data cleaning method based on a data reduction model.
Background art
With the development of the Internet, we have now entered the era of big data. Users generate ever more data, and big data companies apply that data ever more widely, yet many problems remain in practical use. At present, data produced on the Internet is largely incomplete, noisy, and inconsistent, which makes it difficult to use for the related analysis work.
Data mining is the process of extracting useful knowledge from the large volumes of data held in databases, data warehouses, or other information repositories. It extracts implicit, valuable, and understandable information from massive data to guide people's activities. The main data mining techniques include association rules, classification rules, cluster analysis, and sequential patterns.
Data cleaning means "washing off" what is "dirty": it is the final procedure for finding and correcting identifiable errors in data files, and it includes checking data consistency and handling invalid and missing values. Because the data in a data warehouse is a subject-oriented collection that is extracted from multiple business systems and includes historical data, some of it is inevitably erroneous and some of it conflicts with other data. Such erroneous or conflicting data is clearly unwanted, so it needs to be cleaned out.
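The three cleaning tasks named above (consistency checks, invalid values, missing values) can be illustrated with a minimal sketch. The record layout (`id`, `price`), the rule that a negative price is invalid, mean imputation, and the keep-first policy for conflicting records are all assumptions made for illustration; the patent does not specify them.

```python
def clean_records(records):
    """Drop invalid values, fill missing values, and resolve conflicting duplicates."""
    valid = []
    for rec in records:
        row = dict(rec)
        row.setdefault("price", None)
        # Invalid value (assumed rule): a negative price cannot be correct.
        if row["price"] is not None and row["price"] < 0:
            continue
        valid.append(row)

    # Missing value: fill a missing price with the mean of the known prices.
    known = [r["price"] for r in valid if r["price"] is not None]
    mean_price = sum(known) / len(known) if known else None
    for row in valid:
        if row["price"] is None:
            row["price"] = mean_price

    # Consistency: when two records share an id, keep the first one seen.
    first_by_id = {}
    for row in valid:
        first_by_id.setdefault(row["id"], row)
    return list(first_by_id.values())


raw = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": None},   # missing value
    {"id": 3, "price": -5.0},   # invalid value
    {"id": 1, "price": 12.0},   # conflicts with the first record for id 1
    {"id": 4, "price": 20.0},
]
cleaned = clean_records(raw)
```

Other policies (for example, keeping the most recent of two conflicting records) would fit the same structure; the choice depends on the business system the data came from.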
Summary of the invention
To address the shortcomings of the prior art, the present invention proposes a data cleaning method based on a data reduction model.
The technical solution adopted by the data cleaning method based on a data reduction model of the present invention to solve the above technical problem is as follows. The main steps of the data cleaning method comprise: collecting data, establishing a data reduction module, and cleaning the data. The data reduction module is used to classify and clean the massive collected data, so that higher cleaning precision is obtained with less data and the data can be used reasonably and effectively.
Preferably, in the data collection step, vertical search engine technology is used to collect structured and unstructured e-commerce data from the Internet; all internal and external data related to the business object is searched, and the data suitable for data mining applications is selected from it.
Preferably, vertical search engine technology is used to build an Internet data collection program, which collects data from the Internet according to configured commodity categories; after collection, the data is stored uniformly in a raw data warehouse.
Preferably, in the step of establishing the data reduction module, programs are written that use data cube aggregation, data compression, numerosity reduction, and discretization techniques; different data reduction models are established for different data sets, and the data reduction module is used to process the data.
Preferably, the program written above is used to clean the data collected from the Internet.
Preferably, technical analysis is performed on the cleaned data.
Compared with the prior art, the data cleaning method based on a data reduction model of the present invention has the following beneficial effects: by applying a data reduction model, the invention classifies and cleans massive Internet data, addresses problems such as incompleteness, noise, and inconsistency in the data, obtains less data, improves data cleaning efficiency, and obtains higher cleaning precision, so that the data can be used effectively.
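Two of the reduction techniques named in this summary, numerosity reduction and data compression, show how "less data" can be obtained. The following is a minimal sketch, not the patent's implementation: the histogram representation, the bin width, and the use of JSON with `zlib` are assumptions chosen for illustration.

```python
import json
import zlib
from collections import Counter


def histogram_reduce(values, bin_width):
    """Numerosity reduction: replace raw values with (bin start, count) pairs."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())


def compress_records(records):
    """Lossless compression of a JSON-serializable list of records."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def decompress_records(blob):
    """Inverse of compress_records."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


prices = [1, 3, 12, 15, 18, 27]
histogram = histogram_reduce(prices, 10)  # six values become three bins

records = [{"category": "phones", "price": p} for p in prices]
blob = compress_records(records)
restored = decompress_records(blob)
```

The histogram is lossy (individual values are no longer recoverable), while the compression step is lossless; which one is appropriate depends on what the later analysis needs.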
Brief description of the drawings
Figure 1 is a schematic diagram of the data cleaning method based on a data reduction model.
Detailed description
To make the object, technical solution, and advantages of the present invention clearer, the data cleaning method based on a data reduction model of the present invention is further described below with reference to a specific embodiment and the accompanying drawing.
The data cleaning method based on a data reduction model of the present invention applies a data reduction model to classify and clean massive data so that the data can be used effectively. Its main steps comprise: collecting data, establishing a data reduction module, and cleaning the data. The data reduction module is used to clean the collected data, so as to obtain less data, improve cleaning efficiency, and obtain higher cleaning precision.
Embodiment:
In the data cleaning method based on a data reduction model of this embodiment, data is collected by using vertical search engine technology to gather structured and unstructured e-commerce data from the Internet; all internal and external data related to the business object is searched, and the data suitable for data mining applications is selected from it.
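The selection and storage side of the collection step can be sketched as follows. The crawling itself is omitted; the sketch assumes records have already been fetched. The category set, the field names, the `raw_items` table, and the use of SQLite as a stand-in for the raw data warehouse are all hypothetical and do not come from the patent.

```python
import sqlite3

# Hypothetical commodity categories configured for collection.
TARGET_CATEGORIES = {"phones", "laptops"}


def store_raw(conn, records, categories=TARGET_CATEGORIES):
    """Keep records in the configured categories and store them unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_items "
        "(source TEXT, category TEXT, title TEXT, price REAL)"
    )
    rows = [
        (r.get("source"), r.get("category"), r.get("title"), r.get("price"))
        for r in records
        if r.get("category") in categories
    ]
    conn.executemany("INSERT INTO raw_items VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
crawled = [
    {"source": "shop-a", "category": "phones", "title": "Phone X", "price": 199.0},
    {"source": "shop-b", "category": "toys", "title": "Robot", "price": 15.0},
    {"source": "shop-a", "category": "laptops", "title": "Laptop Y", "price": None},
]
stored = store_raw(conn, crawled)
```

Records are stored as collected, including missing values, so that cleaning can be rerun later against the unmodified raw data.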
In the step of establishing the data reduction module, programs are written that use techniques such as data cube aggregation, data compression, numerosity reduction, and discretization; different data reduction models are established for different data sets, and the data reduction module is used to process the data.
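As an illustration of two of the techniques listed here, the sketch below computes one cuboid of a data cube by summing a measure over chosen dimensions, and discretizes numeric values into equal-width bins. The function names, the dimension and measure names, and the choice of equal-width binning are assumptions; the patent does not say which variants it uses.

```python
from collections import defaultdict


def cube_aggregate(rows, dims, measure):
    """Sum `measure` over each combination of `dims` values (one cuboid of a data cube)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def discretize_equal_width(values, bins):
    """Map each numeric value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    return [min(int((v - lo) / width), bins - 1) for v in values]


sales = [
    {"category": "phones", "month": "2015-01", "sales": 100.0},
    {"category": "phones", "month": "2015-01", "sales": 50.0},
    {"category": "laptops", "month": "2015-01", "sales": 70.0},
]
by_category_month = cube_aggregate(sales, ["category", "month"], "sales")
by_month = cube_aggregate(sales, ["month"], "sales")
price_bins = discretize_equal_width([0, 5, 10, 15, 20], 4)
```

Aggregating over fewer dimensions yields a smaller, coarser table, which is how cube aggregation reduces data volume before cleaning and analysis.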
The concrete steps of cleaning the data, as shown in Figure 1, are as follows:
Step 1: vertical search engine technology is used to build an Internet data collection program, which collects data from the Internet according to configured commodity categories; after collection, the data is stored uniformly in a raw data warehouse. This raw data warehouse is the foundation of the whole system.
Step 2: programs are written that use techniques such as data cube aggregation, data compression, numerosity reduction, and discretization; different data reduction models are established for different data sets, and the data reduction models are used to process the data.
Step 3: the program written in Step 2 is used to clean the data collected from the Internet, reducing incompleteness, noise, and inconsistency in the data.
Step 4: scientific technical analysis is performed on the cleaned data.
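The four steps above can be read as a simple pipeline. The sketch below is only a schematic under stated assumptions: collection is simulated by filtering an in-memory list, attribute selection stands in for the reduction models, cleaning drops records with a missing or negative price, and the analysis is an average price per category. None of these specific choices come from the patent.

```python
from statistics import mean


def collect(source, categories):
    """Step 1: keep only records in the configured commodity categories."""
    return [r for r in source if r["category"] in categories]


def reduce_data(records):
    """Step 2: a stand-in reduction that keeps only the fields later steps use."""
    return [{"category": r["category"], "price": r["price"]} for r in records]


def clean(records):
    """Step 3: drop records with a missing or negative price."""
    return [r for r in records if r["price"] is not None and r["price"] >= 0]


def analyze(records):
    """Step 4: average price per category."""
    by_category = {}
    for r in records:
        by_category.setdefault(r["category"], []).append(r["price"])
    return {cat: mean(prices) for cat, prices in by_category.items()}


def run_pipeline(source, categories):
    """Run the four steps in order."""
    return analyze(clean(reduce_data(collect(source, categories))))


source = [
    {"category": "phones", "price": 100.0, "url": "a"},
    {"category": "phones", "price": None, "url": "b"},
    {"category": "toys", "price": 5.0, "url": "c"},
    {"category": "laptops", "price": -1.0, "url": "d"},
    {"category": "phones", "price": 200.0, "url": "e"},
]
report = run_pipeline(source, {"phones", "laptops"})
```

Keeping each step as a separate function mirrors the description, where the raw warehouse, the reduction models, and the cleaning program are distinct parts of the system.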
The above embodiment is only a specific example of the present invention. The scope of patent protection of the present invention includes but is not limited to the above embodiment; any suitable change or substitution that conforms to the claims of the present invention and is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.
Claims (6)
1. A data cleaning method based on a data reduction model, characterized in that its main steps comprise: collecting data, establishing a data reduction module, and cleaning the data, wherein the data reduction module is used to classify and clean the massive collected data.
2. The data cleaning method based on a data reduction model according to claim 1, characterized in that, in the data collection step, vertical search engine technology is used to collect structured and unstructured e-commerce data from the Internet; all internal and external data related to the business object is searched, and the data suitable for data mining applications is selected from it.
3. The data cleaning method based on a data reduction model according to claim 2, characterized in that vertical search engine technology is used to build an Internet data collection program, which collects data from the Internet according to configured commodity categories; after collection, the data is stored uniformly in a raw data warehouse.
4. The data cleaning method based on a data reduction model according to any one of claims 1 to 3, characterized in that, in the step of establishing the data reduction module, programs are written that use data cube aggregation, data compression, numerosity reduction, and discretization techniques; different data reduction models are established for different data sets, and the data reduction module is used to process the data.
5. The data cleaning method based on a data reduction model according to claim 4, characterized in that the program written above is used to clean the data collected from the Internet.
6. The data cleaning method based on a data reduction model according to claim 5, characterized in that technical analysis is performed on the cleaned data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510143215.3A CN104750813A (en) | 2015-03-30 | 2015-03-30 | Data cleaning method based on data reduction model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510143215.3A CN104750813A (en) | 2015-03-30 | 2015-03-30 | Data cleaning method based on data reduction model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104750813A (en) | 2015-07-01 |
Family
ID=53590497
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510143215.3A Pending CN104750813A (en) | 2015-03-30 | 2015-03-30 | Data cleaning method based on data reduction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104750813A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228270A (en) * | 2016-07-27 | 2016-12-14 | 广东工业大学 | The energy consumption Forecasting Methodology of the extrusion equipment of a kind of big data-driven and system thereof |
CN106250556A (en) * | 2016-08-17 | 2016-12-21 | 贵州数据宝网络科技有限公司 | Data digging method for big data analysis |
CN106354772A (en) * | 2016-08-23 | 2017-01-25 | 成都卡莱博尔信息技术股份有限公司 | Mass data system with data cleaning function |
CN106484887A (en) * | 2016-10-18 | 2017-03-08 | 安徽天达网络科技有限公司 | A kind of document handling method based on internet |
CN106503114A (en) * | 2016-10-18 | 2017-03-15 | 安徽天达网络科技有限公司 | Commodity resource data obtains system |
CN106649523A (en) * | 2016-10-18 | 2017-05-10 | 安徽天达网络科技有限公司 | Commodity resource data processing method |
CN106649516A (en) * | 2016-10-18 | 2017-05-10 | 安徽天达网络科技有限公司 | A large data processing method for educational resources |
CN108335231A (en) * | 2018-01-29 | 2018-07-27 | 国网福建省电力有限公司 | A kind of power distribution network data diagnosis method of Auto-matching |
CN109143017A (en) * | 2018-07-31 | 2019-01-04 | 成都天衡智造科技有限公司 | A kind of semicon industry production test data processing method |
CN109947751A (en) * | 2018-12-29 | 2019-06-28 | 医渡云(北京)技术有限公司 | A kind of medical data processing method, device, readable medium and electronic equipment |
CN112019869A (en) * | 2020-08-21 | 2020-12-01 | 广州欢网科技有限责任公司 | Live broadcast data processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090327330A1 (en) * | 2008-06-27 | 2009-12-31 | Business Objects, S.A. | Apparatus and method for dynamically materializing a multi-dimensional data stream cube |
CN101901260A (en) * | 2010-07-20 | 2010-12-01 | 北京酷讯科技有限公司 | Method and system for displaying vertical search result in real time |
CN102088459A (en) * | 2010-12-29 | 2011-06-08 | 广东楚天龙智能卡有限公司 | Large-centralized data exchanging and integration platform based on trusted exchange |
CN102915423A (en) * | 2012-09-11 | 2013-02-06 | 中国电力科学研究院 | System and method for filtering electric power business data on basis of rough sets and gene expressions |
Non-Patent Citations (1)
Title |
---|
方洪鹰: ""数据挖掘中数据预处理的方法研究"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228270A (en) * | 2016-07-27 | 2016-12-14 | 广东工业大学 | The energy consumption Forecasting Methodology of the extrusion equipment of a kind of big data-driven and system thereof |
CN106228270B (en) * | 2016-07-27 | 2020-11-10 | 广东工业大学 | Energy consumption prediction method and system for big data driven extrusion equipment |
CN106250556B (en) * | 2016-08-17 | 2019-06-18 | 贵州数据宝网络科技有限公司 | Data digging method for big data analysis |
CN106250556A (en) * | 2016-08-17 | 2016-12-21 | 贵州数据宝网络科技有限公司 | Data digging method for big data analysis |
CN106354772A (en) * | 2016-08-23 | 2017-01-25 | 成都卡莱博尔信息技术股份有限公司 | Mass data system with data cleaning function |
CN106484887A (en) * | 2016-10-18 | 2017-03-08 | 安徽天达网络科技有限公司 | A kind of document handling method based on internet |
CN106649516A (en) * | 2016-10-18 | 2017-05-10 | 安徽天达网络科技有限公司 | A large data processing method for educational resources |
CN106649523A (en) * | 2016-10-18 | 2017-05-10 | 安徽天达网络科技有限公司 | Commodity resource data processing method |
CN106503114A (en) * | 2016-10-18 | 2017-03-15 | 安徽天达网络科技有限公司 | Commodity resource data obtains system |
CN108335231A (en) * | 2018-01-29 | 2018-07-27 | 国网福建省电力有限公司 | A kind of power distribution network data diagnosis method of Auto-matching |
CN109143017A (en) * | 2018-07-31 | 2019-01-04 | 成都天衡智造科技有限公司 | A kind of semicon industry production test data processing method |
CN109143017B (en) * | 2018-07-31 | 2021-03-30 | 成都天衡智造科技有限公司 | Production test data processing method for semiconductor industry |
CN109947751A (en) * | 2018-12-29 | 2019-06-28 | 医渡云(北京)技术有限公司 | A kind of medical data processing method, device, readable medium and electronic equipment |
CN112019869A (en) * | 2020-08-21 | 2020-12-01 | 广州欢网科技有限责任公司 | Live broadcast data processing method and device |
CN112019869B (en) * | 2020-08-21 | 2022-04-22 | 广州欢网科技有限责任公司 | Live broadcast data processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104750813A (en) | Data cleaning method based on data reduction model | |
CN109465676B (en) | Tool life prediction method | |
CN105956015A (en) | Service platform integration method based on big data | |
WO2016101628A1 (en) | Data processing method and device in data modeling | |
CN101806583B (en) | Microscopic image-based fiber fineness measurement method | |
CN106104496A (en) | The abnormality detection not being subjected to supervision for arbitrary sequence | |
CN106709035A (en) | Preprocessing system for electric power multi-dimensional panoramic data | |
CN102915432A (en) | Method and device for extracting vehicle-bone microcomputer image video data | |
CN103455636A (en) | Automatic capturing and intelligent analyzing method based on Internet tax data | |
CN106294390A (en) | A kind of data mining analysis method and system | |
CN110263230A (en) | A kind of data cleaning method and device based on Density Clustering | |
CN112462696A (en) | Intelligent manufacturing workshop digital twin model construction method and system | |
CN106709622A (en) | Database analysis device and database analysis method | |
CN103752533A (en) | Method and device for sorting coal and gangue online through image method | |
CN114596061B (en) | Project data management method and system based on big data | |
CN107784652A (en) | A kind of shaft tower quick determination method based on unmanned plane image | |
CN104516962A (en) | Monitoring method and system for microblogging public opinion | |
CN104850549A (en) | Method for monitoring public opinions on Internet | |
CN111709775A (en) | House property price evaluation method and device, electronic equipment and storage medium | |
CN115249331A (en) | Mine ecological safety identification method based on convolutional neural network model | |
CN202815869U (en) | Vehicle microcomputer image and video data extraction apparatus | |
CN104750812A (en) | Automatic data collecting method based on webpage label analysis | |
Gill et al. | Garbage Classification Utilizing Effective Convolutional Neural Network | |
CN106933990A (en) | A kind of sensing data cleaning method | |
Jiashun | A new trajectory clustering algorithm based on TRACLUS |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150701 |