CN115292248B - Data cleaning method, system and equipment based on multiple data versions - Google Patents

Data cleaning method, system and equipment based on multiple data versions

Publication number
CN115292248B
Authority
CN
China
Prior art keywords
version, preset, effective, files, file
Prior art date
2022-09-30
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211204960.0A
Other languages
Chinese (zh)
Other versions
CN115292248A
Inventor
王敏
张雷
李本学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongfu Safety Technology Co Ltd
Original Assignee
Zhongfu Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-09-30
Filing date
2022-09-30
Publication date
2023-01-03
Application filed by Zhongfu Safety Technology Co Ltd
Priority to CN202211204960.0A
Publication of CN115292248A
Application granted
Publication of CN115292248B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/219 Managing data history or versioning
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Abstract

The application discloses a data cleaning method, system, and device based on multiple data versions. It belongs to the technical field of data cleaning and addresses the low efficiency with which existing approaches obtain the valid version metadata and the total number of data blocks. The method comprises: when version metadata is generated, adding the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file; acquiring a valid version time threshold and locating the index information corresponding to that threshold; determining the valid version metadata; determining the valid file data blocks; creating, according to the valid file data blocks and a preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traversing the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid. The method improves the efficiency of obtaining the valid version metadata and the total number of data blocks.

Description

Data cleaning method, system and equipment based on multiple data versions
Technical Field
The present application relates to the field of data cleaning, and in particular to a data cleaning method, system, and device based on multiple data versions.
Background
Data cleaning is the process of correcting or deleting inaccurate data records in a database or data table. It includes identifying and replacing incomplete, inaccurate, irrelevant, or otherwise problematic data and records.
At present, data cleaning mainly proceeds as follows. The data in a storage server is divided into three parts: version metadata, file metadata, and file block data, related as follows:
I. File block data is obtained by splitting file data into blocks of a preset size, each block carrying a block data identifier;
II. File metadata stores the list of block data identifiers that make up a file;
III. Version metadata stores a set of file metadata update records.
A cleaning device scans the entire disk space to obtain the total number of data blocks and then constructs a Bloom filter, which is used to precisely clean the invalid version metadata whose version number is below the valid-version-number threshold.
However, when data is cleaned with the existing cleaning device, the entire disk must be scanned, the metadata of every version must be parsed, and each must be compared with the valid-version-number threshold to obtain the valid version metadata, which is inefficient. In addition, before the Bloom filter can be constructed, the cleaning device must scan the data of the whole disk to obtain the total number of data blocks, which is also inefficient.
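The dependence on a full-disk count follows from how a single, fixed-size Bloom filter is normally dimensioned. Using the standard textbook sizing rule (not stated in this document), with $N$ the number of elements to be inserted and $p$ the target false-positive rate, the bit-array length $m$ and the number of hash functions $k$ are chosen as:

```latex
m = -\frac{N \ln p}{(\ln 2)^2}, \qquad
k = \frac{m}{N}\,\ln 2, \qquad
p \approx \left(1 - e^{-kN/m}\right)^{k}
```

If more than $N$ elements are inserted into a filter sized this way, the false-positive rate rises above $p$, which is why the prior-art device must know the total number of data blocks before it can build the filter.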
Disclosure of Invention
In view of the above deficiencies of the prior art, the present invention provides a data cleaning method, system, and device based on multiple data versions to solve the technical problems described above.
In a first aspect, the present application provides a data cleaning method based on multiple data versions, including: when version metadata is generated, adding the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file, where the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period and the index information within the same preset batch file is arranged in order of update time; acquiring a valid version time threshold and locating the index information corresponding to that threshold within a preset batch file of the preset index file; determining that the version metadata corresponding to all index information after the update time point of the located index information is valid version metadata; determining valid file data blocks based on the valid version metadata; creating, according to the valid file data blocks and a preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traversing the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
Further, the method comprises: creating preset batch files in the preset index file at a preset batch interval, where the filename of each preset batch file carries time information.
Further, acquiring the valid version time threshold and locating the corresponding index information within a preset batch file of the preset index file specifically includes: acquiring the valid version time threshold through a preset acquisition interface; selecting, from the plurality of preset batch files, a batch file to be examined based on the time information carried by the preset batch files and the valid version time threshold; and determining the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined.
Further, creating the plurality of Bloom filters organized as a dynamic linked list according to the valid file data blocks and the preset Bloom filter size specifically includes: initializing a dynamic linked list of Bloom filters, presetting the size of each Bloom filter to n and the number of hash functions to k, and pre-creating a Bloom filter node at the head of the dynamic linked list; and, after the valid file data blocks are scanned according to the valid version metadata, writing the IDs of the valid file data blocks into the Bloom filter and counting them, and, when the count exceeds n, constructing a new Bloom filter and appending it to the dynamic linked list.
In a second aspect, the present application provides a data cleaning system based on multiple data versions, the system comprising: an adding module, configured to add, when version metadata is generated, the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file, where the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period and the index information within the same preset batch file is arranged in order of update time; a determining module, configured to acquire a valid version time threshold, locate the index information corresponding to that threshold within a preset batch file of the preset index file, and determine that the version metadata corresponding to all index information after the update time point of the located index information is valid version metadata; and a traversal module, configured to determine valid file data blocks based on the valid version metadata, create, according to the valid file data blocks and a preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list, and traverse the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
Further, the determining module comprises a determining unit, configured to: acquire the valid version time threshold through a preset acquisition interface; select, from the plurality of preset batch files, a batch file to be examined based on the time information carried by the preset batch files and the valid version time threshold; and determine the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined.
Further, the traversal module comprises an adding unit, configured to: initialize a dynamic linked list of Bloom filters, presetting the size of each Bloom filter to n and the number of hash functions to k, and pre-create a Bloom filter node at the head of the dynamic linked list; and, after the valid file data blocks are scanned according to the valid version metadata, write the IDs of the valid file data blocks into the Bloom filter and count them, and, when the count exceeds n, construct a new Bloom filter and append it to the dynamic linked list.
In a third aspect, the present application provides a data cleaning device based on multiple data versions, the device comprising: a processor; and a memory storing executable code which, when executed, causes the processor to perform any of the data cleaning methods based on multiple data versions described above.
As will be appreciated by those skilled in the art, the present invention has at least the following beneficial effects:
(1) Index entries for the version metadata are written to an index file in order of update time, and the index file is divided into batch files by time. When version metadata is searched, the relevant batch file is first located according to the valid version time threshold and then parsed, and the version metadata closest to the threshold is located by update time. This improves the efficiency of locating valid versions.
(2) The Bloom filters are organized as a dynamic linked list: while the valid file data blocks are being collected, the number of Bloom filters grows with the number of valid file data blocks, and the values are stored across the linked nodes. When file data blocks are checked, queries can be executed on the multiple Bloom filter nodes in parallel. This eliminates one full scan of the disk and improves the efficiency of validity checking.
Drawings
Some embodiments of the present disclosure are described below with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a data cleaning method based on multiple data versions according to an embodiment of the present application.
Fig. 2 is a schematic diagram of an internal structure of index information provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of an internal structure of a preset index file according to an embodiment of the present application.
Fig. 4 is a schematic diagram of the internal structure of a plurality of Bloom filters organized as a dynamic linked list according to an embodiment of the present application.
Fig. 5 is a schematic diagram of an internal structure of a data cleansing system based on multiple data versions according to an embodiment of the present application.
Fig. 6 is a schematic diagram of an internal structure of a data cleansing device based on multiple data versions according to an embodiment of the present application.
Detailed Description
It should be understood by those skilled in the art that the embodiments described below are only preferred embodiments of the present disclosure, and do not mean that the present disclosure can be implemented only by the preferred embodiments, which are merely for explaining the technical principles of the present disclosure and are not intended to limit the scope of the present disclosure. All other embodiments that can be derived by one of ordinary skill in the art from the preferred embodiments provided by the disclosure without undue experimentation will still fall within the scope of the disclosure.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides a data cleaning method based on multiple data versions. As shown in Fig. 1, the method mainly includes the following steps:
Step 110: when version metadata is generated, add the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of the preset index file.
Note that the index information includes a version metadata ID and an update time (see Fig. 2). The preset index file is divided into a plurality of preset batch files according to the preset segmentation time period, and the index information within the same preset batch file is arranged in order of update time (see Fig. 3). Consequently, once any index entry has been located precisely, the storage area holding the index of valid version metadata and the storage area holding the index of invalid version metadata can be found quickly.
To find the metadata of any version quickly among multiple batch files, a marker can be preset in the filenames to distinguish the files, so that the relevant preset batch file is located first and the version metadata is then located precisely within it. As an example, the present application may create preset batch files in the preset index file at a preset batch interval, with the filename of each preset batch file carrying time information.
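The batch layout described above can be sketched as follows. This is a minimal in-memory illustration, not the applicant's implementation: the one-hour `BATCH_INTERVAL`, the `index_%Y%m%d%H%M%S.idx` filename pattern, and the `IndexEntry`/`IndexFile` names are all hypothetical, and the sketch assumes version metadata arrives in update-time order so that appending keeps each batch sorted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical parameters: each batch file covers one hour, and its
# filename encodes the start time of the window it covers.
BATCH_INTERVAL = timedelta(hours=1)
NAME_FORMAT = "index_%Y%m%d%H%M%S.idx"


@dataclass
class IndexEntry:
    version_id: str       # version metadata ID
    update_time: datetime  # update time of the version metadata


@dataclass
class IndexFile:
    """In-memory stand-in for the preset index file: batch filename -> entries."""
    epoch: datetime
    batches: dict[str, list[IndexEntry]] = field(default_factory=dict)

    def batch_name(self, t: datetime) -> str:
        # Align t down to the start of its batch window and encode that time in the name.
        slot = (t - self.epoch) // BATCH_INTERVAL
        start = self.epoch + slot * BATCH_INTERVAL
        return start.strftime(NAME_FORMAT)

    def add(self, version_id: str, update_time: datetime) -> str:
        """Record index information when a version's metadata is generated."""
        name = self.batch_name(update_time)
        entries = self.batches.setdefault(name, [])
        # Assumes metadata is generated in update-time order, so appending keeps the batch sorted.
        entries.append(IndexEntry(version_id, update_time))
        return name
```

Because the window start is encoded in each filename, the batch covering a given instant can be picked from a directory listing without opening any file.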
Step 120: acquire a valid version time threshold, locate the index information corresponding to that threshold within a preset batch file of the preset index file, and determine that the version metadata corresponding to all index information after the update time point of the located index information is valid version metadata.
It should be noted that, as described in step 110, the index information within the same preset batch file is arranged in order of update time. Therefore, all index information after the update time point of the located index information corresponds to valid version metadata and can be identified without further comparison.
Acquiring the valid version time threshold and locating the corresponding index information within a preset batch file of the preset index file may specifically be: acquiring the valid version time threshold through a preset acquisition interface; selecting, from the plurality of preset batch files, a batch file to be examined based on the time information carried by the preset batch files and the valid version time threshold; and determining the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined.
Step 130: determine the valid file data blocks based on the valid version metadata; create, according to the valid file data blocks and the preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traverse the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
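Following the three-part data layout described in the Background (version metadata records file metadata updates; file metadata lists block identifiers), deriving the valid file data blocks from the valid version metadata can be sketched as below. The `VersionMetadata` and `FileMetadata` classes and the `valid_block_ids` function are hypothetical stand-ins. The generator yields IDs lazily and does not de-duplicate blocks shared between versions, which only spends some Bloom filter capacity on repeats.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class FileMetadata:
    file_id: str
    block_ids: list[str]  # identifiers of the fixed-size blocks the file was split into


@dataclass
class VersionMetadata:
    version_id: str
    files: list[FileMetadata]  # file metadata updates recorded by this version


def valid_block_ids(valid_versions: Iterable[VersionMetadata]) -> Iterator[str]:
    """Yield every block ID referenced by a valid version (duplicates included)."""
    for version in valid_versions:
        for fm in version.files:
            yield from fm.block_ids
```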
Creating the plurality of Bloom filters organized as a dynamic linked list according to the valid file data blocks and the preset Bloom filter size may specifically be: initializing a dynamic linked list of Bloom filters, presetting the size of each Bloom filter to n and the number of hash functions to k, and pre-creating a Bloom filter node at the head of the dynamic linked list; and, after the valid file data blocks are scanned according to the valid version metadata, writing the IDs of the valid file data blocks into the Bloom filter and counting them, and, when the count exceeds n, constructing a new Bloom filter and appending it to the dynamic linked list (see Fig. 4).
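A minimal sketch of the dynamic linked list of Bloom filters, under stated assumptions: each node sizes its bit array at a hypothetical 10 bits per element, the k bit positions are derived from one SHA-256 digest by double hashing (a common technique, not one the patent specifies), and "count greater than n" is read as "start a new node before the (n+1)-th ID", so no node holds more than n IDs. `BloomNode` and `BloomChain` are illustrative names.

```python
import hashlib


class BloomNode:
    """One fixed-capacity Bloom filter; `next` links to the following node."""

    def __init__(self, n: int, k: int, bits_per_item: int = 10):
        self.k = k
        self.m = n * bits_per_item               # bit-array length (hypothetical sizing)
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0                           # number of IDs written to this node
        self.next = None

    def _positions(self, item: str) -> list[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, from one SHA-256 digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class BloomChain:
    """Dynamic linked list of Bloom filters with a pre-created head node."""

    def __init__(self, n: int, k: int):
        self.n, self.k = n, k
        self.head = self.tail = BloomNode(n, k)

    def add(self, block_id: str) -> None:
        # Append a fresh node once the tail already holds n IDs.
        if self.tail.count >= self.n:
            node = BloomNode(self.n, self.k)
            self.tail.next = node
            self.tail = node
        self.tail.add(block_id)

    def nodes(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def __contains__(self, block_id: str) -> bool:
        return any(block_id in node for node in self.nodes())
```

Because a node is appended only when the tail fills up, the total number of valid blocks never has to be known in advance, which is what removes the full-disk counting scan.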
The disk data blocks are traversed concurrently against the plurality of Bloom filters: if any Bloom filter reports a hit for a file data block, that block is valid data; otherwise, it is invalid data.
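The validity check can be sketched as follows, assuming a thread pool is an acceptable stand-in for the concurrent querying described above. `classify_blocks` is a hypothetical helper that accepts any membership structures supporting Python's `in` operator; plain sets stand in for Bloom filter nodes in the test. A Bloom filter never yields a false negative, so a valid block is never classified as invalid; a false positive only means an invalid block survives this cleaning pass.

```python
import operator
from collections.abc import Iterable, Sequence
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def classify_blocks(disk_block_ids: Iterable[str], filters: Sequence,
                    workers: int = 4) -> tuple[list[str], list[str]]:
    """Split disk block IDs into (valid, invalid).

    A block is valid if any filter node reports it as present. `filters`
    must be a re-iterable sequence because it is queried once per block.
    """
    valid: list[str] = []
    invalid: list[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block_id in disk_block_ids:
            # One membership query per node; operator.contains(f, b) evaluates `b in f`.
            hits = pool.map(operator.contains, filters, repeat(block_id))
            (valid if any(hits) else invalid).append(block_id)
    return valid, invalid
```

In CPython the global interpreter lock limits the speed-up of pure-Python hashing across threads; a production system would more likely query the nodes from separate processes or native code.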
Based on the above description, storing the version metadata index in preset, time-segmented batch files reduces the time needed to locate valid version metadata and the time spent parsing version metadata files, improving the retrieval efficiency of valid version metadata. Using Bloom filters organized as a dynamic linked list removes the full-disk scan previously needed to obtain the total number of file data blocks and allows data block validity checks to run concurrently, improving the efficiency of cleaning redundant data.
In addition, Fig. 5 shows a data cleaning system based on multiple data versions according to an embodiment of the present application. As shown in Fig. 5, the system mainly includes:
an adding module 210, configured to add, when version metadata is generated, the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file, where the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period and the index information within the same preset batch file is arranged in order of update time;
a determining module 220, configured to acquire a valid version time threshold, locate the index information corresponding to that threshold within a preset batch file of the preset index file, and determine that the version metadata corresponding to all index information after the update time point of the located index information is valid version metadata;
the determining module 220 further comprises a determining unit 221, configured to: acquire the valid version time threshold through a preset acquisition interface; select, from the plurality of preset batch files, a batch file to be examined based on the time information carried by the preset batch files and the valid version time threshold; and determine the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined;
a traversal module 230, configured to determine valid file data blocks based on the valid version metadata, create, according to the valid file data blocks and the preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list, and traverse the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
The traversal module 230 comprises an adding unit 231, configured to: initialize a dynamic linked list of Bloom filters, presetting the size of each Bloom filter to n and the number of hash functions to k, and pre-create a Bloom filter node at the head of the dynamic linked list; and, after the valid file data blocks are scanned according to the valid version metadata, write the IDs of the valid file data blocks into the Bloom filter and count them, and, when the count exceeds n, construct a new Bloom filter and append it to the dynamic linked list.
In addition, an embodiment of the present application provides a data cleaning device based on multiple data versions, shown in Fig. 6, on which executable instructions are stored; when executed, these instructions implement the data cleaning method based on multiple data versions described above. Specifically, the server sends an execution instruction to the memory through the bus, and upon receiving it, the memory sends an execution signal to the processor through the bus to activate the processor.
It should be noted that the processor is configured to: when version metadata is generated, add the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of the preset index file, where the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period and the index information within the same preset batch file is arranged in order of update time; acquire a valid version time threshold and locate the index information corresponding to that threshold within a preset batch file of the preset index file; determine that the version metadata corresponding to all index information after the update time point of the located index information is valid version metadata; determine valid file data blocks based on the valid version metadata; create, according to the valid file data blocks and the preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traverse the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
The technical solutions of the present disclosure have been described above in connection with the foregoing embodiments; however, those skilled in the art will readily understand that the scope of the present disclosure is not limited to these specific embodiments. The technical solutions in the above embodiments can be split and combined, and those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the technical principles of the present disclosure; any such changes, equivalents, or improvements made within the technical concept and/or technical principles of the present disclosure fall within its protection scope.

Claims (8)

1. A data cleaning method based on multiple data versions, the method comprising:
when version metadata is generated, adding the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file, wherein the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period, and the index information within the same preset batch file is arranged in order of update time;
acquiring a valid version time threshold to locate the index information corresponding to the valid version time threshold within a preset batch file of the preset index file, and determining that the version metadata corresponding to the index information after the update time point of the located index information is valid version metadata;
determining valid file data blocks based on the valid version metadata; creating, according to the valid file data blocks and a preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traversing the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
2. The data cleaning method based on multiple data versions according to claim 1, further comprising:
creating preset batch files in the preset index file at a preset batch interval, wherein the filename of each preset batch file carries time information.
3. The data cleaning method based on multiple data versions according to claim 2, wherein acquiring the valid version time threshold to locate the index information corresponding to the valid version time threshold within a preset batch file of the preset index file specifically comprises:
acquiring the valid version time threshold through a preset acquisition interface;
selecting a batch file to be examined from the plurality of preset batch files based on the time information carried by the preset batch files and the valid version time threshold; and
determining the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined.
4. The data cleaning method based on multiple data versions according to claim 1, wherein creating the plurality of Bloom filters organized as a dynamic linked list according to the valid file data blocks and the preset Bloom filter size specifically comprises:
initializing a dynamic linked list of Bloom filters, presetting the size of each Bloom filter to n and the number of hash functions to k, and pre-creating a Bloom filter node at the head of the dynamic linked list; and
after the valid file data blocks are scanned according to the valid version metadata, writing the IDs of the valid file data blocks into the Bloom filter and counting them, and, when the count exceeds n, constructing a new Bloom filter and appending it to the dynamic linked list.
5. A data cleaning system based on multiple data versions, the system comprising:
an adding module, configured to add, when version metadata is generated, the version metadata ID and the update time of the version metadata, as version metadata index information, to a preset batch file of a preset index file, wherein the preset index file is divided into a plurality of preset batch files according to a preset segmentation time period, and the index information within the same preset batch file is arranged in order of update time;
a determining module, configured to acquire a valid version time threshold to locate the index information corresponding to the valid version time threshold within a preset batch file of the preset index file, and to determine that the version metadata corresponding to the index information after the update time point of the located index information is valid version metadata;
a traversal module, configured to determine valid file data blocks based on the valid version metadata; create, according to the valid file data blocks and a preset Bloom filter size, a plurality of Bloom filters organized as a dynamic linked list; and traverse the disk data blocks against the plurality of Bloom filters concurrently to determine whether each disk data block is valid.
6. The data cleaning system based on multiple data versions according to claim 5, wherein the determining module further comprises a determining unit configured to:
acquire the valid version time threshold through a preset acquisition interface; select a batch file to be examined from the plurality of preset batch files based on the time information carried by the preset batch files and the valid version time threshold; and determine the version metadata corresponding to the valid version time threshold based on the update time of each version metadata entry in the batch file to be examined.
7. The multiple data version-based data cleansing system of claim 5, wherein the traversal module comprises an add unit;
the adding unit is configured to initialize a bloom filter with a dynamic linked list structure, with the bloom filter size preset to n and the number of HASH functions preset to k, and with a bloom filter node pre-established at the head of the dynamic linked list; and, after scanning the valid file data blocks according to the valid version metadata, to write the ID of each valid file data block into the bloom filter and count the writes, and when the count exceeds n, to construct a new bloom filter and append it to the dynamic linked list.
8. A data cleaning device based on multiple data versions, the device comprising:
a processor;
and a memory having executable code stored thereon which, when executed, causes the processor to perform the data cleaning method based on multiple data versions according to any one of claims 1 to 4.
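The index lookup described in claims 5 and 6 can be illustrated with a short Python sketch. It is not the patentee's implementation. The following are all assumptions made for illustration:

* the names (VersionIndex, BatchFile, add, valid_versions);
* integer timestamps;
* using each batch file's segment start as its "carried time information";
* counting an entry exactly at the threshold as valid.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchFile:
    """Hypothetical preset batch file: index entries kept in update-time order."""
    start_time: int                                  # time information carried by the batch
    times: List[int] = field(default_factory=list)   # update times, ascending
    ids: List[str] = field(default_factory=list)     # version metadata IDs, same order


class VersionIndex:
    """Hypothetical preset index file split into batch files by a fixed segmentation period."""

    def __init__(self, period: int) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.batches: List[BatchFile] = []  # ascending start_time

    def add(self, version_id: str, update_time: int) -> None:
        """Record (ID, update time) when version metadata is generated.

        Assumes metadata arrives in non-decreasing update-time order, so
        appending keeps every batch file sorted.
        """
        if self.batches and update_time < self.batches[-1].times[-1]:
            raise ValueError("update times must be non-decreasing")
        start = update_time - update_time % self.period  # segment this time falls in
        if not self.batches or self.batches[-1].start_time != start:
            self.batches.append(BatchFile(start))
        self.batches[-1].times.append(update_time)
        self.batches[-1].ids.append(version_id)

    def valid_versions(self, threshold: int) -> List[str]:
        """Return IDs of version metadata updated at or after the threshold."""
        if not self.batches:
            return []
        starts = [b.start_time for b in self.batches]
        # Batch file to check: the last one whose segment starts at or before the threshold.
        first = max(bisect_right(starts, threshold) - 1, 0)
        batch = self.batches[first]
        pos = bisect_left(batch.times, threshold)  # first entry at or after the threshold
        result = batch.ids[pos:]                   # slice copy; batch itself is untouched
        for later in self.batches[first + 1:]:     # later batch files are valid in full
            result.extend(later.ids)
        return result
```

Because entries are appended in update-time order, both steps are binary searches: choosing the batch file to check, and locating the threshold inside it. Every later batch file is then valid as a whole and need not be scanned entry by entry.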
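The dynamic linked list of bloom filters in claims 5 and 7 can likewise be sketched in Python. This is again a hypothetical illustration. The following are assumptions and do not come from the claims:

* the names (BloomNode, DynamicBloomList, find_invalid_blocks);
* the bit-array size m;
* the SHA-256 double-hashing scheme used to derive the k bit positions;
* reading "when the count exceeds n" as "start a new filter once the current one already holds n IDs".

```python
import hashlib
from typing import Iterator, List, Optional


class BloomNode:
    """One fixed-size bloom filter node in the dynamic linked list (hypothetical layout)."""

    def __init__(self, k: int, m: int) -> None:
        self.k, self.m = k, m
        self.bits = bytearray((m + 7) // 8)   # m-bit array, initially all zero
        self.count = 0                        # number of IDs written into this filter
        self.next: Optional["BloomNode"] = None

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing (an assumed scheme): derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class DynamicBloomList:
    """Linked list of bloom filters; each node accepts at most n IDs."""

    def __init__(self, n: int, k: int, m: int) -> None:
        if n < 1 or k < 1 or m < 1:
            raise ValueError("n, k and m must be positive")
        self.n, self.k, self.m = n, k, m
        self.head = BloomNode(k, m)  # node pre-established at the head of the list
        self.tail = self.head

    def add(self, block_id: str) -> None:
        if self.tail.count >= self.n:  # current filter is full: append a new one
            self.tail.next = BloomNode(self.k, self.m)
            self.tail = self.tail.next
        self.tail.add(block_id)

    def might_contain(self, block_id: str) -> bool:
        node = self.head
        while node is not None:  # a block is valid if any filter may contain it
            if node.might_contain(block_id):
                return True
            node = node.next
        return False

    def filter_count(self) -> int:
        count, node = 0, self.head
        while node is not None:
            count, node = count + 1, node.next
        return count


def find_invalid_blocks(valid_ids: List[str], disk_ids: List[str],
                        n: int = 1000, k: int = 4, m: int = 10_000) -> List[str]:
    """Traverse disk block IDs and return those found in no filter (cleaning candidates)."""
    filters = DynamicBloomList(n, k, m)
    for block_id in valid_ids:
        filters.add(block_id)
    return [b for b in disk_ids if not filters.might_contain(b)]
```

A bloom filter has no false negatives, so a valid block is never reported as invalid. A false positive only means that some invalid block survives a cleaning pass, which is the safe direction for a cleaner. Chaining fixed-size filters keeps each filter's false-positive rate bounded as the number of valid blocks grows, at the cost of checking every filter in the list on each lookup.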
CN202211204960.0A 2022-09-30 2022-09-30 Data cleaning method, system and equipment based on multiple data versions Active CN115292248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211204960.0A CN115292248B (en) 2022-09-30 2022-09-30 Data cleaning method, system and equipment based on multiple data versions

Publications (2)

Publication Number Publication Date
CN115292248A (en) 2022-11-04
CN115292248B (en) 2023-01-03

Family

ID=83833850

Country Status (1)

Country Link
CN (1) CN115292248B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178677A (en) * 2007-11-09 2008-05-14 Institute of Computing Technology, Chinese Academy of Sciences Computer system for protecting software and method for protecting software
CN103116615A (en) * 2013-01-28 2013-05-22 Yuan Huaqiang Data index method and server based edition vector
CN105320654A (en) * 2014-05-28 2016-02-10 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Dynamic bloom filter and element operating method based on same

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8095509B2 (en) * 2007-08-11 2012-01-10 Novell, Inc. Techniques for retaining security restrictions with file versioning
US11874811B2 (en) * 2018-12-28 2024-01-16 Teradata Us, Inc. Control versioning of temporal tables to reduce data redundancy

Non-Patent Citations (3)

Title
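VBTree: forward secure conjunctive queries over encrypted data for cloud computing; Wu, ZQ et al.; VLDB Journal; 2019-02-28; pp. 25-46 *
A fast redundant data block discovery algorithm based on Bloom filtering; Zhou Bin; Journal of South-Central University for Nationalities (Natural Science Edition); 2016-09-15 (No. 03); pp. 130-134 *
A fine-grained and efficient multi-version file system; Xiang Xiaojia et al.; Journal of Software; 2009-03-15 (No. 03); pp. 754-765 *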

Similar Documents

Publication Publication Date Title
Wang et al. Spatial online sampling and aggregation
US6119124A (en) Method for clustering closely resembling data objects
US8924365B2 (en) System and method for range search over distributive storage systems
JP6996812B2 (en) How to process data blocks in a distributed database, programs, and devices
US7945543B2 (en) Method and system for deferred maintenance of database indexes
KR20020028208A (en) Real-time database object statistics collection
CN110019384B (en) Method for acquiring blood edge data, method and device for providing blood edge data
CN109918386B (en) Data recovery method and device and computer readable storage medium
CN111125298A (en) Method, equipment and storage medium for reconstructing NTFS file directory tree
US20080005077A1 (en) Encoded version columns optimized for current version access
CN107301186B (en) Invalid data identification method and device
CN115292248B (en) Data cleaning method, system and equipment based on multiple data versions
CN114116795A (en) Data storage and query method, device, storage medium and electronic equipment
CN107004036B (en) Method and system for searching logs containing a large number of entries
CN106959960B (en) Data acquisition method and device
CN109271097A (en) Data processing method, data processing equipment and server
CN102591941B (en) Analysis method and analysis device for SQLite idle struct nodes
CN107590233B (en) File management method and device
CN106776704B (en) Statistical information collection method and device
JP2006228116A (en) Web page link determination method and web page link determination device
CN109446022B (en) Method and device for detecting abnormal overflow page of database and storage medium
CN111597212B (en) Data retrieval method and device
CN114692595B (en) Repeated conflict scheme detection method based on text matching
CN112559195B (en) Database deadlock detection method and device, test terminal and medium
CN111290803B (en) Data preloading method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant