CN113704240A - A data deduplication method (一种数据去重的方法) - Google Patents
- Publication number: CN113704240A
- Application number: CN202111115705.4A
- Authority: CN (China)
- Prior art keywords: target, index file, file block, value, data
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Concepts (machine-extracted)
Concept ID | Name | Category | Query match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 63 |
230000001174 | ascending effect | Effects | 0.000 | claims, description | 5 |
238000000605 | extraction | Methods | 0.000 | claims, description | 5 |
230000008569 | process | Effects | 0.000 | description | 27 |
238000004891 | communication | Methods | 0.000 | description | 7 |
230000009471 | action | Effects | 0.000 | description | 4 |
238000010586 | diagram | Methods | 0.000 | description | 4 |
238000004364 | calculation method | Methods | 0.000 | description | 2 |
238000007405 | data analysis | Methods | 0.000 | description | 2 |
238000005516 | engineering process | Methods | 0.000 | description | 2 |
238000003672 | processing method | Methods | 0.000 | description | 2 |
230000005540 | biological transmission | Effects | 0.000 | description | 1 |
230000008859 | change | Effects | 0.000 | description | 1 |
238000013523 | data management | Methods | 0.000 | description | 1 |
230000006870 | function | Effects | 0.000 | description | 1 |
238000012986 | modification | Methods | 0.000 | description | 1 |
230000004048 | modification | Effects | 0.000 | description | 1 |
230000000750 | progressive effect | Effects | 0.000 | description | 1 |
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111115705.4A CN113704240A (zh) | 2021-09-23 | 2021-09-23 | A data deduplication method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111115705.4A CN113704240A (zh) | 2021-09-23 | 2021-09-23 | A data deduplication method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113704240A (zh) | 2021-11-26 |
Family ID: 78661637
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202111115705.4A | Pending | CN113704240A (zh) | 2021-09-23 | 2021-09-23 | A data deduplication method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704240A (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100211616A1 (en) * | 2009-02-16 | 2010-08-19 | Rajesh Khandelwal | Performance by Avoiding Disk I/O for Deduplicated File Blocks |
CN102591855A (zh) * | 2012-01-13 | 2012-07-18 | 广州从兴电子开发有限公司 | Data identification method and system |
CN102682085A (zh) * | 2012-04-18 | 2012-09-19 | 北京十分科技有限公司 | Method for web page deduplication |
CN103455631A (zh) * | 2013-09-22 | 2013-12-18 | 广州中国科学院软件应用技术研究所 | Data processing method, apparatus and system |
CN110704433A (zh) * | 2019-09-23 | 2020-01-17 | 北京优炫软件股份有限公司 | BRIN index construction method for column-stored data, and data retrieval method and apparatus |
CN112328435A (zh) * | 2020-12-07 | 2021-02-05 | 武汉绿色网络信息服务有限责任公司 | Method, apparatus, device and storage medium for backing up and restoring target data |
Timeline
- 2021-09-23: application CN202111115705.4A filed in China (CN); published as CN113704240A, status Pending
Similar Documents
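Publication | Title |
---|---|
CN108319654B (zh) | Computing system, hot/cold data separation method and apparatus, and computer-readable storage medium |
CN106874348B (zh) | File storage and indexing method and apparatus, and method for reading files |
CN110888837B (zh) | Method and apparatus for merging small files in object storage |
JP5531583B2 (ja) | Log output device, log output method, and log output program |
CN109241003B (zh) | File management method and apparatus |
CN106980680B (zh) | Data storage method and storage device |
CN112732191B (zh) | Method, system, device and medium for merging data based on a log-structured merge tree |
CN111782707A (zh) | Data query method and system |
EP3422205A1 (en) | Database-archiving method and apparatus that generate index information, and method and apparatus for searching archived database comprising index information |
CN114416670B (zh) | Index creation method and apparatus for cloud-drive documents, cloud drive, and storage medium |
JP6507657B2 (ja) | Similarity determination device, similarity determination method, and similarity determination program |
CN109496292A (zh) | Disk management method, disk management apparatus, and electronic device |
CN116010362A (zh) | Method, apparatus and system for file storage and file reading |
CN117369731B (zh) | Data reduction processing method, apparatus, device and medium |
CN109189343B (zh) | Method, apparatus, device and computer-readable storage medium for flushing metadata to disk |
CN114995759A (zh) | Bucket sharding processing method, apparatus, device and medium |
CN109213972B (zh) | Method, apparatus, device and computer storage medium for determining document similarity |
CN107315806B (zh) | File-system-based embedded storage method and apparatus |
CN113704240A (zh) | A data deduplication method |
CN111045994A (zh) | File classification and retrieval method and system based on a KV database |
CN107943849B (zh) | Video file retrieval method and apparatus |
CN110990394B (zh) | Method, apparatus and storage medium for counting rows in distributed column-oriented database tables |
CN112925746A (zh) | File archiving method and apparatus |
CN111143288A (zh) | Data storage method, system and related apparatus |
CN118233454B (zh) | File batch upload method and medium |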
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2021-12-08. Applicant after: Tianyi Digital Life Technology Co.,Ltd. (Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040); applicant before: Century Dragon Information Network Co.,Ltd. (1/F and 2/F, East Garden, Huatian International Plaza, 211 Longkou Middle Road, Tianhe District, Guangzhou, Guangdong 510000) |
TA01 | Transfer of patent application right | Effective date of registration: 2024-03-25. Applicant after: Tianyi Shilian Technology Co.,Ltd. (Unit 1, Building 1, China Telecom Zhejiang Innovation Park, No. 8 Xiqin Street, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100; China); applicant before: Tianyi Digital Life Technology Co.,Ltd. (Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200040; China) |