CN113704240A - Data deduplication method - Google Patents
Data deduplication method
- Publication number
- CN113704240A
- Authority
- CN
- China
- Prior art keywords
- target
- index file
- file block
- value
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111115705.4A CN113704240B (zh) | 2021-09-23 | 2021-09-23 | Data deduplication method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113704240A (zh) | 2021-11-26 |
CN113704240B (zh) | 2025-02-11 |
Family
ID=78661637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111115705.4A Active CN113704240B (zh) | Data deduplication method | 2021-09-23 | 2021-09-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704240B (zh) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100211616A1 (en) * | 2009-02-16 | 2010-08-19 | Rajesh Khandelwal | Performance by Avoiding Disk I/O for Deduplicated File Blocks |
CN102591855A (zh) * | 2012-01-13 | 2012-07-18 | 广州从兴电子开发有限公司 | Data identification method and system |
CN102682085A (zh) * | 2012-04-18 | 2012-09-19 | 北京十分科技有限公司 | Web page deduplication method |
CN103136243A (zh) * | 2011-11-29 | 2013-06-05 | 中国电信股份有限公司 | File system deduplication method and device based on cloud storage |
CN103455631A (zh) * | 2013-09-22 | 2013-12-18 | 广州中国科学院软件应用技术研究所 | Data processing method, device, and system |
CN110704433A (zh) * | 2019-09-23 | 2020-01-17 | 北京优炫软件股份有限公司 | BRIN index construction method for column-stored data, data retrieval method, and device |
CN112328435A (zh) * | 2020-12-07 | 2021-02-05 | 武汉绿色网络信息服务有限责任公司 | Method, apparatus, device, and storage medium for backing up and restoring target data |
Also Published As
Publication number | Publication date |
---|---|
CN113704240B (zh) | 2025-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
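CN108319654B (zh) | | Computing system, hot and cold data separation method and device, and computer-readable storage medium |
TWI518530B (zh) | | Repeated data processing methods, devices and systems |
CN110888837B (zh) | | Method and device for merging small files in object storage |
CN102999433B (zh) | | Method and system for deleting redundant data on virtual disks |
US10776345B2 (en) | | Efficiently updating a secondary index associated with a log-structured merge-tree database |
WO2020057272A1 (zh) | | Index data storage and retrieval method, device, and storage medium |
CN109726177A (zh) | | HBase-based partition indexing method for massive files |
CN110727404A (zh) | | Storage-side data deduplication method, device, and storage medium |
CN105511812A (zh) | | Big data optimization method and device for a storage system |
CN111782707A (zh) | | Data query method and system |
CN114416670B (zh) | | Index creation method and device for network disk documents, network disk, and storage medium |
CN108958653A (zh) | | Space reclamation method, system, and related device based on underlying aggregated files |
EP3343395B1 (en) | | Data storage method and apparatus for mobile terminal |
CN112732191A (zh) | | Method, system, device, and medium for merging data based on a log-structured merge tree |
CN109189343B (zh) | | Metadata flush-to-disk method, apparatus, device, and computer-readable storage medium |
CN110990394B (zh) | | Row count statistics method, device, and storage medium for distributed column-oriented database tables |
CN117369731B (zh) | | Data reduction processing method, apparatus, device, and medium |
CN113704240A (zh) | | Data deduplication method |
CN107315806B (zh) | | Embedded storage method and device based on a file system |
CN118312483A (zh) | | File retrieval method and device |
KR100859710B1 (ko) | | Method for searching, storing, and deleting data using a data structure for performing searches on data |
WO2017014744A1 (en) | | Processing time-varying data using a graph data structure |
CN111045994A (zh) | | File classification and retrieval method and system based on a KV database |
CN107943832A (zh) | | Service processing method and device |
CN113760876B (zh) | | Data filtering method and device |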
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2021-12-08. Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040. Applicant after: Tianyi Digital Life Technology Co., Ltd. Address before: 1/F and 2/F, East Garden, Huatian International Plaza, 211 Longkou Middle Road, Tianhe District, Guangzhou, Guangdong 510000. Applicant before: Century Dragon Information Network Co., Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 2024-03-25. Address after: Unit 1, Building 1, China Telecom Zhejiang Innovation Park, No. 8 Xiqin Street, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100. Applicant after: Tianyi Shilian Technology Co., Ltd. Country or region after: China. Address before: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040. Applicant before: Tianyi Digital Life Technology Co., Ltd. Country or region before: China. |
GR01 | Patent grant | |