CN106250556A - Data digging method for big data analysis - Google Patents

Info

Publication number
CN106250556A
Authority
CN
China
Prior art keywords
data
processing terminal
network
decoding
cleansing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610675596.4A
Other languages
Chinese (zh)
Other versions
CN106250556B (en)
Inventor
汤寒林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Data Network Technology Co Ltd
Original Assignee
Guizhou Data Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Data Network Technology Co Ltd
Priority to CN201610675596.4A
Publication of CN106250556A
Application granted
Publication of CN106250556B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data mining method for big data analysis. First, a platform for anomaly detection and elimination in big data cleansing is built, which includes a data source. The data source is connected to a processing terminal through a network, and the processing terminal includes a data acquisition module and a data cleansing module. Then, after the data acquisition module sends a data request to the data source, the data source transmits the data to the processing terminal, and the processing terminal starts the data cleansing module to cleanse the data. Combined with the remaining steps of the method, this achieves the goal of extracting valid data by filtering ("denoising") the data.

Description

Data digging method for big data analysis
Technical field
The present invention relates to the technical field of data cleansing, and in particular to a data mining method for big data analysis.
Background technology
A big data trading platform called Data Treasure has been launched on the market. It is an e-commerce-style big data trading platform that trades commodities such as big data DaaS, PaaS, and SaaS offerings, and its operating model combines self-operated sales with third-party merchants.
Its DaaS data service provides nearly 200 indicator data sets covering 6 themes and 19 industry categories. Its PaaS applications provide government agencies, enterprises and institutions, and individual teams with customized hosted applications and secondary development, reducing the customer's construction cost. Its SaaS products are general-purpose products built on years of accumulated technical capability, industry experience, and solutions, formed by extracting the technical architecture, industry associations, and industry commonalities of each vertical industry. The platform provides a feasible, convenient, secure, worry-free, and efficient business model for product trading.
The Data Treasure big data trading platform aims to become the leading domestic e-commerce trading platform for the big data industry.
Data Treasure is more than a website; it is an e-commerce platform.
The platform can host the big data transactions of other merchants, whose products for sale can include database service interfaces, data applications, and big data products. For customers, the platform carefully selects high-quality vendors and services and takes full responsibility for the quality of third-party products. For merchants, the platform offers a distinctive cooperation model and full-media marketing channels.
Data sources are the foundation of big data analysis and data trading. The data sources on the platform include: industry data of all kinds accumulated over many years; shared data of all kinds obtained through partners (such as Baidu, aggregated data, institute of China, etc.); open data from governments and institutions at home and abroad; and data from cooperation with local governments and operators.
However, not all of the information that the data acquisition module collects from the data source is valuable for big data analysis. Some of the data is not of interest, and other data consists of completely erroneous distractors. The data must therefore be filtered ("denoised") to extract the valid data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data mining method for big data analysis that extracts valid data by filtering ("denoising") the data.
To solve the above technical problem, the technical solution of the present invention is as follows:
A data mining method for big data analysis, specifically:
First, a platform for anomaly detection and elimination in big data cleansing is built, which includes a data source;
The data source is connected to a processing terminal through a network, and the processing terminal includes a data acquisition module and a data cleansing module;
Then, after the data acquisition module sends a data request to the data source, the data source transmits the data to the processing terminal, and the processing terminal starts the data cleansing module to cleanse the data.
Data cleansing thus mainly performs operations such as identification, extraction, and cleaning of the received data.
1) Extraction: because the acquired data may have many structures and types, the extraction process converts this complex data into a single structure, or one that is easy to process, so that it can be analyzed and processed quickly.
2) Cleaning: not all big data is valuable. Some of the data is not of interest, and other data consists of completely erroneous distractors, so the data must be filtered ("denoised") to extract the valid data.
Detailed description of the invention
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment. It should be understood that the specific embodiment described here only explains the invention and does not limit it.
The data mining method for big data analysis is as follows:
First, a platform for anomaly detection and elimination in big data cleansing is built, which includes a data source;
The data source is connected to a processing terminal through a network, and the processing terminal includes a data acquisition module and a data cleansing module;
Then, after the data acquisition module sends a data request to the data source, the data source transmits the data to the processing terminal, and the processing terminal starts the data cleansing module to cleanse the data.
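The request-then-cleanse flow described above can be sketched in Python as follows. This is only an illustrative sketch: the class names, the in-memory `DataSource`, and the `is_valid` rule are assumptions made for the example and do not appear in the patent, which does not specify an implementation.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class DataSource:
    """Hypothetical stand-in for a remote data source reached over the network."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)

    def handle_request(self) -> List[Record]:
        # The data source answers a data request by returning its records.
        return list(self._records)


class DataAcquisitionModule:
    """Sends the data request to the data source."""

    def __init__(self, source: DataSource) -> None:
        self.source = source

    def request_data(self) -> List[Record]:
        return self.source.handle_request()


class DataCleansingModule:
    """Keeps only the records accepted by a caller-supplied validity rule."""

    def __init__(self, is_valid: Callable[[Record], bool]) -> None:
        self.is_valid = is_valid

    def cleanse(self, records: List[Record]) -> List[Record]:
        return [r for r in records if self.is_valid(r)]


class ProcessingTerminal:
    """Holds both modules: acquire first, then start cleansing."""

    def __init__(self, source: DataSource, is_valid: Callable[[Record], bool]) -> None:
        self.acquisition = DataAcquisitionModule(source)
        self.cleansing = DataCleansingModule(is_valid)

    def run(self) -> List[Record]:
        raw = self.acquisition.request_data()
        return self.cleansing.cleanse(raw)


source = DataSource([
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},   # an invalid record to be filtered out
    {"id": 3, "price": 12.0},
])
terminal = ProcessingTerminal(source, is_valid=lambda r: r["price"] is not None)
cleaned = terminal.run()
```

The validity rule is passed in as a function so that the same terminal can be reused with the statistical or duplicate-detection rules described later.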
The data cleansing uses statistical methods to check the numeric attributes of the data: the mean and standard deviation of each field's values are calculated, and the confidence interval of each field is used to identify anomalous fields and records. Data mining methods are also introduced into data cleaning, specifically: clustering methods are used to detect anomalous records, model-based methods are used to find anomalous records that do not conform to existing patterns, and association rule methods are used to find anomalous data in the data set that does not conform to rules with high confidence and support.
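A minimal sketch of the statistical check is shown below. It assumes that the "confidence interval" of a field means the range mean ± z standard deviations, with z = 1.96 chosen for the example; the patent does not state the interval width, and the function and field names are hypothetical.

```python
from statistics import mean, stdev
from typing import Any, Dict, List

Record = Dict[str, Any]


def find_outliers(records: List[Record], field: str, z: float = 1.96) -> List[Record]:
    """Return records whose numeric `field` lies outside mean +/- z * stdev."""
    numeric = [r for r in records if isinstance(r.get(field), (int, float))]
    if len(numeric) < 2:
        return []  # a standard deviation needs at least two values
    values = [r[field] for r in numeric]
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    low, high = mu - z * sigma, mu + z * sigma
    return [r for r in numeric if not (low <= r[field] <= high)]


amounts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
records = [{"id": i + 1, "amount": a} for i, a in enumerate(amounts)]
outliers = find_outliers(records, "amount")
```

Because an extreme value inflates both the mean and the standard deviation, a single pass like this can miss outliers in small samples; robust statistics such as the median and median absolute deviation are a common alternative.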
Duplicate records are also cleaned. Eliminating approximately duplicate records from a data set is currently the most studied topic in the field of data cleansing. To eliminate duplicate records from a data set, the key question is how to judge whether two records are approximate duplicates.
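One way to judge whether two records are approximate duplicates is to compare their fields as strings and average the similarity scores. The sketch below uses Python's standard `difflib.SequenceMatcher` for this; the technique, the 0.9 threshold, and the sample records are assumptions for illustration, not the patent's own method.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def record_similarity(a: Record, b: Record, fields: Sequence[str]) -> float:
    """Average per-field string similarity in the range [0, 1]."""
    scores = []
    for name in fields:
        left = str(a.get(name, "")).strip().lower()
        right = str(b.get(name, "")).strip().lower()
        scores.append(SequenceMatcher(None, left, right).ratio())
    return sum(scores) / len(scores)


def drop_near_duplicates(records: List[Record], fields: Sequence[str],
                         threshold: float = 0.9) -> List[Record]:
    """Keep the first record of every group of approximate duplicates."""
    kept: List[Record] = []
    for rec in records:
        if all(record_similarity(rec, k, fields) < threshold for k in kept):
            kept.append(rec)
    return kept


rows = [
    {"name": "Acme Data Co", "city": "Guiyang"},
    {"name": "Acme Data Co.", "city": "Guiyang"},   # differs only by a trailing period
    {"name": "Borealis Trading Ltd", "city": "Shanghai"},
]
unique = drop_near_duplicates(rows, fields=("name", "city"))
```

This pairwise comparison is quadratic in the number of records, so on large data sets it is usually combined with sorting or blocking so that only nearby candidates are compared.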
In data warehouse applications, data cleansing must first consider data integration, which mainly maps the structures and data in the data sources onto the target structure and domain. Many researchers have done extensive work in this area.
Many data cleansing schemes and algorithms target specific application problems and apply only to a narrow scope. General, application-independent algorithms and schemes are few.
Data cleansing mainly performs operations such as identification, extraction, and cleaning of the received data.
1) Extraction: because the acquired data may have many structures and types, the extraction process converts this complex data into a single structure, or one that is easy to process, so that it can be analyzed and processed quickly.
2) Cleaning: not all big data is valuable. Some of the data is not of interest, and other data consists of completely erroneous distractors, so the data must be filtered ("denoised") to extract the valid data.
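The extraction step can be illustrated by converting payloads that arrive in different formats into one uniform record structure. The sketch below handles JSON text and CSV text with the standard library; the two input formats and the `id`, `name`, and `amount` fields are assumptions for the example, since the patent does not name concrete formats.

```python
import csv
import io
import json
from typing import Any, Dict, List


def extract(raw: str) -> List[Dict[str, Any]]:
    """Convert a JSON or CSV payload into a list of uniformly typed records."""
    text = raw.strip()
    if text.startswith("{") or text.startswith("["):
        parsed = json.loads(text)
        rows = parsed if isinstance(parsed, list) else [parsed]
    else:
        # Otherwise treat the payload as CSV with a header row.
        rows = list(csv.DictReader(io.StringIO(text)))
    return [
        {
            "id": int(row["id"]),
            "name": str(row["name"]).strip(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


json_payload = '[{"id": 1, "name": "alpha", "amount": 3.5}]'
csv_payload = "id,name,amount\n2, beta ,4\n"
combined = extract(json_payload) + extract(csv_payload)
```

Once every source is reduced to the same structure, the cleaning step can apply a single set of rules regardless of where a record came from.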
The data source transmits the data to the processing terminal by sending it over the network. Before the data is sent over the network, it usually has to be encoded. However, the codes produced by existing encoding methods are easily intercepted by outside parties during transmission and easily decoded, so the content of the transmitted data is leaked and losses are incurred.
The following method of the present invention addresses this problem: encoded data that is easily intercepted from outside and easily decoded during transmission, which leaks the content of the transmitted data and causes losses.
The data source transmits the data to the processing terminal by sending it over the network, and the method of sending the data to the processing terminal over the network comprises the following steps:
Step 1-1: the processing terminal receives an instruction from the network requesting that a link be established;
Step 1-2: after the link is established, the processing terminal obtains from the network a confirmation instruction that has undergone a first encoding, and decodes it to obtain the decoded confirmation instruction;
Step 1-3: determine whether the decoded confirmation instruction meets a first set requirement; if so, perform step 1-4;
Step 1-4: obtain the label, assigned by the network, that corresponds to the confirmation instruction;
Step 1-5: determine whether the label and the decoded confirmation instruction meet a second set requirement; if so, perform step 1-6;
Step 1-6: receive the data encoded by the network, and decode the encoded data to obtain the corresponding decoded data.
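Steps 1-1 to 1-6 can be sketched as a small simulation. The patent does not specify the encoding, the first and second set requirements, or how the label is formed, so everything concrete below is a hypothetical placeholder: Base64 stands in for the encoding (it is reversible but provides no secrecy), the first requirement is equality with an expected confirmation value, and the second requirement is that the label equals an HMAC-SHA256 of the decoded confirmation under a pre-shared key.

```python
import base64
import hashlib
import hmac
from typing import Optional

SHARED_KEY = b"demo-key"             # hypothetical pre-shared secret
EXPECTED_CONFIRMATION = b"CONFIRM"   # hypothetical "first set requirement"


def encode(payload: bytes) -> bytes:
    # Placeholder encoding only; Base64 is NOT encryption.
    return base64.b64encode(payload)


def decode(payload: bytes) -> bytes:
    return base64.b64decode(payload)


def make_label(confirmation: bytes) -> str:
    # Hypothetical label: a keyed hash tying the label to the confirmation.
    return hmac.new(SHARED_KEY, confirmation, hashlib.sha256).hexdigest()


class Network:
    """Simulated network-side peer."""

    def __init__(self, data: bytes, confirmation: bytes = EXPECTED_CONFIRMATION) -> None:
        self._data = data
        self._confirmation = confirmation

    def request_link(self) -> str:
        return "ESTABLISH_LINK"

    def send_confirmation(self) -> bytes:
        return encode(self._confirmation)

    def send_label(self) -> str:
        return make_label(self._confirmation)

    def send_data(self) -> bytes:
        return encode(self._data)


def receive(network: Network) -> Optional[bytes]:
    """Processing-terminal side; returns decoded data or None if a check fails."""
    if network.request_link() != "ESTABLISH_LINK":             # step 1-1
        return None
    confirmation = decode(network.send_confirmation())         # step 1-2
    if confirmation != EXPECTED_CONFIRMATION:                  # step 1-3
        return None
    label = network.send_label()                               # step 1-4
    if not hmac.compare_digest(label, make_label(confirmation)):  # step 1-5
        return None
    return decode(network.send_data())                         # step 1-6


accepted = receive(Network(b"payload"))
rejected = receive(Network(b"payload", confirmation=b"WRONG"))
```

The data is only requested after both checks pass, which mirrors the ordering of the steps; a real deployment would replace the placeholder encoding with an authenticated encryption scheme.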
In summary, the method includes: the processing terminal receives an instruction from the network requesting that a link be established; after the link is established, it receives the confirmation instruction that has undergone a first encoding by the network and decodes it to obtain the decoded confirmation instruction; it determines whether the decoded confirmation instruction meets the first set requirement and, if so, obtains the label, assigned by the network, that corresponds to the confirmation instruction; it determines whether the label and the decoded confirmation instruction meet the second set requirement and, if so, receives the data encoded by the network and decodes it to obtain the corresponding decoded data. This achieves secure transmission of all data.
Taking the above preferred embodiment of the present invention as a guide, those skilled in the art can, based on the above description, make various changes and modifications without departing from the technical idea of the invention. The technical scope of the invention is not limited to the content of the description and must be determined according to the claims.

Claims (3)

1. A data mining method for big data analysis, characterized as follows:
First, a platform for anomaly detection and elimination in big data cleansing is built, which includes a data source;
The data source is connected to a processing terminal through a network, and the processing terminal includes a data acquisition module and a data cleansing module;
Then, after the data acquisition module sends a data request to the data source, the data source transmits the data to the processing terminal, and the processing terminal starts the data cleansing module to cleanse the data.
2. The data mining method for big data analysis according to claim 1, characterized in that the data cleansing uses statistical methods to check the numeric attributes of the data: the mean and standard deviation of each field's values are calculated, and the confidence interval of each field is used to identify anomalous fields and records; data mining methods are introduced into data cleaning, specifically: clustering methods are used to detect anomalous records, model-based methods are used to find anomalous records that do not conform to existing patterns, and association rule methods are used to find anomalous data in the data set that does not conform to rules with high confidence and support;
Duplicate records are also cleaned.
3. The data mining method for big data analysis according to claim 1, characterized in that
the data source transmits the data to the processing terminal by sending it over the network, and the method of sending the data to the processing terminal over the network comprises the following steps:
Step 1-1: the processing terminal receives an instruction from the network requesting that a link be established;
Step 1-2: after the link is established, the processing terminal obtains from the network a confirmation instruction that has undergone a first encoding, and decodes it to obtain the decoded confirmation instruction;
Step 1-3: determine whether the decoded confirmation instruction meets a first set requirement; if so, perform step 1-4;
Step 1-4: obtain the label, assigned by the network, that corresponds to the confirmation instruction;
Step 1-5: determine whether the label and the decoded confirmation instruction meet a second set requirement; if so, perform step 1-6;
Step 1-6: receive the data encoded by the network, and decode the encoded data to obtain the corresponding decoded data.
CN201610675596.4A 2016-08-17 2016-08-17 Data digging method for big data analysis Active CN106250556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610675596.4A CN106250556B (en) 2016-08-17 2016-08-17 Data digging method for big data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610675596.4A CN106250556B (en) 2016-08-17 2016-08-17 Data digging method for big data analysis

Publications (2)

Publication Number Publication Date
CN106250556A true CN106250556A (en) 2016-12-21
CN106250556B CN106250556B (en) 2019-06-18

Family

ID=57593128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610675596.4A Active CN106250556B (en) 2016-08-17 2016-08-17 Data digging method for big data analysis

Country Status (1)

Country Link
CN (1) CN106250556B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1874336A (en) * 2005-12-31 2006-12-06 华为技术有限公司 Method and device for treating data stream
CN104092663A (en) * 2013-07-24 2014-10-08 牟大同 Encryption communication method and encryption communication system
CN104111996A (en) * 2014-07-07 2014-10-22 山大地纬软件股份有限公司 Health insurance outpatient clinic big data extraction system and method based on hadoop platform
CN105354198A (en) * 2014-08-19 2016-02-24 中国移动通信集团湖北有限公司 Data processing method and apparatus
CN104750813A (en) * 2015-03-30 2015-07-01 浪潮集团有限公司 Data cleaning method based on data reduction model

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679087A (en) * 2017-09-04 2018-02-09 浙江聚邦科技有限公司 A kind of growth information gathering mobile terminal microfluidic platform towards medium-sized and small enterprises
CN107908744A (en) * 2017-11-16 2018-04-13 河南中医药大学 A kind of method of abnormality detection and elimination for big data cleaning
CN107908744B (en) * 2017-11-16 2021-05-18 河南中医药大学 Anomaly detection and elimination method for big data cleaning
CN110008208A (en) * 2019-04-04 2019-07-12 北京易华录信息技术股份有限公司 A kind of data administering method and system
CN110850297A (en) * 2019-09-23 2020-02-28 广东毓秀科技有限公司 Method for predicting SOH of rail-traffic lithium battery through big data
CN111858570A (en) * 2020-07-06 2020-10-30 中国科学院上海有机化学研究所 CCS data standardization method, database construction method and database system

Also Published As

Publication number Publication date
CN106250556B (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN106250556A (en) Data digging method for big data analysis
CN109165234B (en) Robot abnormity analysis method and device
CN105824837A (en) Log treatment method and device
CN109242460B (en) Payment system based on multiple payment channels and account checking method thereof
CN101651561B (en) Network topology analytical method and system based on rule engine
US20170278102A1 (en) Immunisation method for user behaviour model detection in electronic transaction process
CN104732425A (en) E-commerce platform customer behavior analytical method based on big data
CN106125680A (en) Industrial stokehold data safety processing method based on industry internet and device
CN106570119A (en) Device for quickly obtaining product information and method for obtaining product information
CN104883269A (en) Method and apparatus of treating AC logs
CN106792876A (en) End to end network perception evaluating method and system
CN110113421A (en) A kind of big data information processing system based on Internet of Things
CN103810085A (en) Method and device for performing module testing through data comparison
CN107341591B (en) Intelligent statistical analysis system and method for substation warning information
CN107391695A (en) A kind of information extracting method based on big data
CN205540715U (en) Protocol conversion system based on FIX
Kim et al. COVID-19 variant surveillance in the Republic of Korea
BR112014010487B1 (en) METHOD FOR NOTIFYING INFORMATION FROM A VIRTUAL MICRODIARIES CUSTOMER, DEVICE FOR NOTIFYING INFORMATION FROM A VIRTUAL MICRODIARIES CUSTOMER, AND SERVER
CN211481289U (en) Detection authentication information processing system
CN111865689B (en) Alarm voltage drop method based on index set tree
CN111708791B (en) Automatic data updating system based on block chain
EP4331488A1 (en) Method and system for generating 2d representation of electrocardiogram (ecg) signals
CN105447050A (en) Customer classification processing method and device
CN202383677U (en) Intelligent interaction platform
CN114218208A (en) Network data acquisition, storage and processing method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Data mining methods for big data analysis

Granted publication date: 20190618

Pledgee: Industrial Bank Co.,Ltd. Shanghai People's Square Branch

Pledgor: GUIZHOU CHINADATAPAY NETWORK TECHNOLOGY CO.,LTD.

Registration number: Y2024310000370