CN110096498A - Data cleaning method and device (一种数据清洗方法及装置) - Google Patents
- Publication number
- CN110096498A (application number CN201910242397.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- cleaning
- characterization factor
- rule
- data cleansing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
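The abstract, description and claims of CN110096498A are not reproduced on this page, so the method based on characterization factors and rules that the keywords refer to cannot be reconstructed here. What follows is only a minimal, generic sketch of the three operations that CPC group G06F16/215 names: removing invalid entries, correcting typographical errors, and de-duplication. It is not the method claimed in this application. The function `cleanse`, its parameters, and the lookup-table approach to typo correction are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical record type: one row of string fields.
Record = Dict[str, str]


def cleanse(records: Iterable[Record],
            is_valid: Callable[[Record], bool],
            corrections: Dict[str, str],
            key_fields: Tuple[str, ...]) -> List[Record]:
    """Generic G06F16/215-style cleansing sketch (not the patented method).

    1. Drop records rejected by the caller-supplied validity rule.
    2. Replace field values found in a lookup table of known misspellings.
    3. Keep only the first record for each combination of key fields.
    """
    seen: Set[Tuple[str, ...]] = set()
    cleaned: List[Record] = []
    for record in records:
        # Step 1: removing invalid entries.
        if not is_valid(record):
            continue
        # Step 2: correcting typographical errors via the lookup table;
        # values not in the table are kept unchanged.
        fixed = {field: corrections.get(value, value)
                 for field, value in record.items()}
        # Step 3: de-duplication on the chosen key fields, applied after
        # correction so that a misspelled copy matches its corrected twin.
        key = tuple(fixed.get(field, "") for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"id": "1", "city": "Hangzhuo"},   # typo, corrected to "Hangzhou"
        {"id": "1", "city": "Hangzhou"},   # duplicate once corrected
        {"id": "2", "city": ""},           # invalid: empty city
        {"id": "3", "city": "Shanghai"},
    ]
    print(cleanse(rows,
                  is_valid=lambda r: r.get("city", "") != "",
                  corrections={"Hangzhuo": "Hangzhou"},
                  key_fields=("id", "city")))
```

Correction runs before de-duplication so that records differing only by a known typo collapse to one. A rule-driven system would typically replace the single `is_valid` callable and the static lookup table with configurable rule sets.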
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242397.8A CN110096498A (zh) | 2019-03-28 | 2019-03-28 | Data cleaning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242397.8A CN110096498A (zh) | 2019-03-28 | 2019-03-28 | Data cleaning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110096498A (zh) | 2019-08-06 |
Family ID: 67444059
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242397.8A Pending CN110096498A (zh) | 2019-03-28 | 2019-03-28 | Data cleaning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096498A (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
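CN110597814A (zh) * | 2019-09-16 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Method and device for serializing and deserializing structured data |
CN110727668A (zh) * | 2019-09-30 | 2020-01-24 | 北京百度网讯科技有限公司 | Data cleaning method and device |
CN112217667A (zh) * | 2020-09-29 | 2021-01-12 | 苏州迈科网络安全技术股份有限公司 | Cleaning system and cleaning method for terminal-model feature data |
CN112579586A (zh) * | 2020-12-23 | 2021-03-30 | 平安普惠企业管理有限公司 | Data processing method, apparatus, device and storage medium |
CN112711578A (zh) * | 2020-12-30 | 2021-04-27 | 陈静 | Big data denoising method for cloud computing services, and cloud computing financial server |
CN112860676A (zh) * | 2021-02-06 | 2021-05-28 | 高云 | Data cleaning method applied to big data mining and business analysis, and cloud server |
CN113094031A (zh) * | 2021-03-16 | 2021-07-09 | 上海晓途网络科技有限公司 | Factor generation method and apparatus, computer device and storage medium |
CN113297479A (zh) * | 2021-04-29 | 2021-08-24 | 上海淇玥信息技术有限公司 | User profile generation method and apparatus, and electronic device |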
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101872449A (zh) * | 2010-06-25 | 2010-10-27 | 南京联创科技集团股份有限公司 | Customer information screening method |
US20140279972A1 (en) * | 2013-03-15 | 2014-09-18 | Teradata Us, Inc. | Cleansing and standardizing data |
CN106570005A (zh) * | 2015-10-08 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Method and device for cleaning up a database |
CN107239581A (zh) * | 2017-07-07 | 2017-10-10 | 小草数语(北京)科技有限公司 | Data cleaning method and device |
CN108959620A (zh) * | 2018-07-18 | 2018-12-07 | 上海汉得信息技术股份有限公司 | Data cleaning method and equipment |
- 2019-03-28: application CN201910242397.8A filed in CN, published as CN110096498A (zh); status: active, Pending
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
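CN110597814B (zh) * | 2019-09-16 | 2021-12-28 | 腾讯科技(深圳)有限公司 | Method and device for serializing and deserializing structured data |
CN110597814A (zh) * | 2019-09-16 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Method and device for serializing and deserializing structured data |
CN110727668A (zh) * | 2019-09-30 | 2020-01-24 | 北京百度网讯科技有限公司 | Data cleaning method and device |
CN110727668B (zh) * | 2019-09-30 | 2022-03-01 | 北京百度网讯科技有限公司 | Data cleaning method and device |
CN112217667A (zh) * | 2020-09-29 | 2021-01-12 | 苏州迈科网络安全技术股份有限公司 | Cleaning system and cleaning method for terminal-model feature data |
CN112579586A (zh) * | 2020-12-23 | 2021-03-30 | 平安普惠企业管理有限公司 | Data processing method, apparatus, device and storage medium |
CN112711578B (zh) * | 2020-12-30 | 2021-09-21 | 深圳市全景网络有限公司 | Big data denoising method for cloud computing services, and cloud computing financial server |
CN112711578A (zh) * | 2020-12-30 | 2021-04-27 | 陈静 | Big data denoising method for cloud computing services, and cloud computing financial server |
CN112860676A (zh) * | 2021-02-06 | 2021-05-28 | 高云 | Data cleaning method applied to big data mining and business analysis, and cloud server |
CN113094031A (zh) * | 2021-03-16 | 2021-07-09 | 上海晓途网络科技有限公司 | Factor generation method and apparatus, computer device and storage medium |
CN113094031B (zh) * | 2021-03-16 | 2024-02-20 | 上海晓途网络科技有限公司 | Factor generation method and apparatus, computer device and storage medium |
CN113297479A (zh) * | 2021-04-29 | 2021-08-24 | 上海淇玥信息技术有限公司 | User profile generation method and apparatus, and electronic device |
CN113297479B (zh) * | 2021-04-29 | 2024-08-20 | 上海淇玥信息技术有限公司 | User profile generation method and apparatus, and electronic device |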
Similar Documents
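Publication | Title |
---|---|
CN110096498A (zh) | Data cleaning method and device |
WO2019192261A1 (zh) | Payment method recommendation method, apparatus and device |
CN107894953A (zh) | Method and device for generating bank application test data |
CN108667867A (zh) | Data storage method and device |
CN108984658A (zh) | Intelligent question-answering data processing method and device |
CN109933834A (zh) | Model creation method and device for time-series data prediction |
CN109472609A (zh) | Method and device for determining risk-control reasons |
CN109615081A (zh) | Model prediction system and method |
CN108683692A (zh) | Service request processing method and device |
CN110134668A (zh) | Data migration method, apparatus and device applied to blockchain |
CN109583890A (zh) | Method, apparatus and device for identifying abnormal transaction objects |
CN109816057A (zh) | Library book borrowing management method, system, electronic device and storage medium |
CN110457288A (zh) | Data model construction method, apparatus, device and computer-readable storage medium |
CN109614414A (zh) | Method and device for determining user information |
CN110032358A (zh) | Application program generation method, apparatus, device and system |
CN108900619A (zh) | Unique visitor counting method and device |
CN110020427A (zh) | Policy determination method and device |
CN110033304A (zh) | Information processing method, apparatus and device |
CN109344173B (zh) | Data management method and device, and data structure |
CN110264232A (zh) | Data processing method and device for delayed coupon redemption |
CN110262998A (zh) | Reconciliation data processing method and device |
CN109003090A (zh) | Risk control method and device |
CN110264213A (zh) | Information processing method, apparatus and device |
CN101650723A (zh) | Method for setting a tariff template tree in a billing and accounting engine |
CN109583473A (zh) | Method and device for generating feature data |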
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20200924. Applicant before: Alibaba Group Holding Ltd. (Greater Cayman, British Cayman Islands). Applicant after: Advanced innovation technology Co.,Ltd. (Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands) |
| TA01 | Transfer of patent application right | Effective date of registration: 20200924. Applicant before: Advanced innovation technology Co.,Ltd. Applicant after: Innovative advanced technology Co.,Ltd. (both at Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands) |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190806 |