CN111970381B - File deduplication, addition and uploading method, system, equipment and storage medium - Google Patents
- Publication number
- CN111970381B
- Authority
- CN
- China
- Prior art keywords
- file
- data
- additional
- fingerprint information
- strip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/561—Adding application-functional data or data for application control, e.g. adding metadata
Abstract
The invention discloses a file deduplication append-upload method, system, device and storage medium. The method comprises the following steps: when the server receives an upload request for a repeat append to a file, acquiring the metadata of the existing appended file and performing parameter verification; reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object; calculating fingerprint information of each created stripe and generating a file organization record file for the appended data; and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data. The system comprises a request receiving unit, a stripe creating unit, a fingerprint checking unit and a garbage cleaning unit. Because stripes are not merged and aligned when data is appended, newly appended data does not interfere with existing data, and the processing performance of the storage cluster can be improved.
Description
Technical Field
The invention relates to the field of object storage, and in particular to a file deduplication append-upload method, system, device and storage medium.
Background
With the development of informatization, more and more individual users choose to move data originally stored on their own digital devices to cloud storage, so that family members can share the data with one another. Likewise, the development of technologies such as Industry 4.0, intelligent manufacturing, enterprise cloud, big data, e-government, the NASA satellite center and large radio telescopes leads more and more enterprises and government bodies to place their data in storage clusters for centralized management.
A distributed object storage system is distributed storage oriented to unstructured data. More and more service scenarios now rely on distributed object storage. In video surveillance and live video streaming, for example, video data is generated continuously in real time, and large system logs are likewise produced continuously. As the data volume in the storage cluster grows, file read and write requests consume a large amount of disk Input/Output (IO), and the IO throughput of the cluster becomes a limit.
When data deduplication is used, a stripe object on the storage cluster may be associated with several different files. When an append fails, the garbage data it generated must be reclaimed, so with existing deduplication techniques a failed append can affect the original file and degrade the processing performance of the storage cluster.
Disclosure of Invention
To solve the above technical problem, the invention provides a file deduplication append-upload method, system, device and storage medium, which prevent a failed file append from affecting the original file and improve the processing performance of the storage cluster.
To achieve this purpose, the invention adopts the following technical solution:
A file deduplication append-upload method comprises the following steps:
when the server receives an upload request for a repeat append to a file, acquiring the metadata of the existing appended file and performing parameter verification;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
calculating fingerprint information of each created stripe and generating a file organization record file for the appended data;
and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data.
Furthermore, the existing appended file is locked while its metadata is being acquired, so that multiple upload requests cannot append to the file at the same time.
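The locking rule can be illustrated with a minimal sketch. The text does not say how the lock is implemented; the in-process registry below, and the names `try_lock_for_append` and `unlock_after_append`, are assumptions made only for illustration. A real storage cluster would presumably need a lock that is visible to every server node.

```python
import threading
from collections import defaultdict

# Hypothetical registry of per-file locks, keyed by (bucket, file name).
_registry_guard = threading.Lock()
_file_locks = defaultdict(threading.Lock)


def try_lock_for_append(bucket: str, name: str) -> bool:
    """Return True if this request obtained the append lock, else False."""
    with _registry_guard:
        lock = _file_locks[(bucket, name)]
    # Non-blocking acquire: a second concurrent append is rejected, not queued.
    return lock.acquire(blocking=False)


def unlock_after_append(bucket: str, name: str) -> None:
    """Release the append lock once the request has finished or failed."""
    _file_locks[(bucket, name)].release()
```

Rejecting the second request instead of making it wait is one possible policy; the text only requires that two requests never append to the same file simultaneously.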
Further, calculating the fingerprint information of the created stripe and generating the file organization record file of the appended data comprises:
calculating fingerprint information from the content of the created stripe;
querying the reference count of the corresponding stripe by the fingerprint information;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk under the fingerprint information and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
Further, when the append fails, cleaning up garbage data according to the file organization record file of the appended data comprises:
when the append fails, obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing it by 1; and if the reference count of the stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously.
Further, the method also comprises the following steps:
when the data upload is finished, generating the file organization record file of the new file from the file organization record file of the existing file and that of the appended data;
calculating the fingerprint information and append count of the new file from the etag of the existing file and the fingerprint information of the appended data stripes; generating the etag of the new file from the fingerprint information and append count of the new file;
creating a logical head object based on the etag of the new file, and saving the metadata of the new file;
and establishing an index relation between the file name and the etag of the new file.
Further, the method also comprises:
when the server receives the first append upload request for a deduplicated file, creating metadata that marks the file as an appended file, and setting the append count of the file to 1.
The invention also provides a file deduplication append-upload system, which comprises:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of each created stripe and generating a file organization record file for the appended data;
and a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails.
Further, the fingerprint checking unit calculates fingerprint information from the content of the created stripe;
queries the reference count of the corresponding stripe by the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk under the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
The invention also provides a file deduplication append-upload device, which comprises:
a memory for storing a computer program;
and a processor for implementing the steps of the file deduplication append-upload method when executing the computer program.
The invention further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the file deduplication append-upload method described above.
The beneficial effects of the invention are as follows:
The invention provides a file deduplication append-upload method, system, device and storage medium. Building on deduplication in a distributed object storage system, stripes are not merged and aligned when data is appended, so newly appended data does not interfere with existing data. In a deduplication scenario where a data stripe is referenced by multiple files, this prevents a failure during the append process from affecting the original file, realizes append for deduplicated files, and improves the processing performance of the business logic of the object storage cluster.
Drawings
FIG. 1 is a flow chart of a file deduplication append-upload method in an embodiment of the present invention;
FIG. 2 is a flow chart of the specific process of calculating fingerprint information of a created stripe and generating a file organization record file of the appended data in an embodiment of the present invention;
FIG. 3 is a flow chart of the handling of an append failure in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file deduplication append-upload system in an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in FIG. 1, an embodiment of the present invention discloses a file deduplication append-upload method, including:
when the server receives an upload request for a repeat append to a file, acquiring the metadata of the existing appended file and performing parameter verification;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
calculating fingerprint information of each created stripe, and generating a file organization record file for the appended data;
and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data.
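The stripe-creation rule in the second step can be sketched as follows. The stripe size and the name `create_stripes` are assumptions chosen for illustration; the text does not state a stripe size. The point of the sketch is only that appended data starts a fresh stripe and never tops up a partially filled last stripe of the existing object.

```python
from typing import List

STRIPE_SIZE = 4  # bytes; deliberately tiny, an assumed value for illustration


def create_stripes(data: bytes, stripe_size: int = STRIPE_SIZE) -> List[bytes]:
    """Cut appended data into stripes, counting from offset 0 of the new data.

    The existing object's last stripe is never read or rewritten here,
    even if it is shorter than stripe_size.
    """
    return [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]


existing = [b"AAAA", b"BB"]           # the last existing stripe is partial
appended = create_stripes(b"CCCCCC")  # two new stripes: 4 bytes and 2 bytes
layout = existing + appended          # existing stripes are left untouched
```

Because `b"BB"` is kept as it is, rolling back a failed append only has to drop the new stripes; no existing stripe, which other files may also reference, needs to be restored.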
Specifically, when the server receives an upload request from a user, it first parses the relevant parameters (e.g., user information, bucket, name of the uploaded file, append information, append position) and determines the type of the upload request:
1) determine whether the request is an append write request; if not, handle it as an ordinary file upload;
2) if the request is an append write request, determine whether the deduplication function is enabled; if it is not enabled, handle the request as an ordinary file append upload;
3) if the deduplication function is enabled, determine whether this is the first append upload request; if so, create metadata that marks the file as an appended file and set the append count of the file to 1;
4) if the request is a repeat append upload request, acquire the metadata of the existing file and verify the parameters. The metadata includes information such as the file organization record file of the existing file and the size of the existing file; the parameter verification includes checking the append position of the file, among other things. At the same time, the existing appended file is locked so that multiple upload requests cannot append to it simultaneously.
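The four-way decision above can be sketched as a small routing function. The names `Route`, `classify` and `check_append_position` are hypothetical. Two details are assumptions not stated in the text: that a first append is recognized by the file not yet being marked as an appended file, and that a valid append position equals the current size of the existing file.

```python
from enum import Enum


class Route(Enum):
    NORMAL_UPLOAD = "ordinary file upload"
    PLAIN_APPEND = "ordinary append upload, no deduplication"
    FIRST_DEDUP_APPEND = "first deduplicated append"
    REPEAT_DEDUP_APPEND = "repeat deduplicated append"


def classify(is_append: bool, dedup_enabled: bool, marked_appended: bool) -> Route:
    """Mirror decision steps 1) to 4) in order."""
    if not is_append:
        return Route.NORMAL_UPLOAD
    if not dedup_enabled:
        return Route.PLAIN_APPEND
    if not marked_appended:  # assumed test for "first append"
        return Route.FIRST_DEDUP_APPEND
    return Route.REPEAT_DEDUP_APPEND


def check_append_position(position: int, existing_size: int) -> bool:
    """Assumed rule: new data must start exactly at the end of the file."""
    return position == existing_size
```

For example, `classify(True, True, True)` selects the repeat-append path, which is the only one that goes on to acquire the existing metadata and the lock.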
The specific process of calculating the fingerprint information of the created stripe and generating the file organization record file of the appended data is shown in FIG. 2, and includes:
calculating fingerprint information from the content of the created stripe;
querying the reference count of the corresponding stripe by the fingerprint information; specifically, the query is made in the storage pool according to the storage policy;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk with the fingerprint information as its unique identifier (oid), and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
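The reference-counting steps of FIG. 2 can be sketched with in-memory dictionaries standing in for the storage pool. The dictionaries, the name `record_stripe`, and the choice of SHA-256 as the stripe fingerprint are assumptions for illustration; the text does not name the hash used for stripes.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the storage pool.
stripe_store: Dict[str, bytes] = {}  # oid -> stripe content "on disk"
ref_counts: Dict[str, int] = {}      # oid -> number of references


def record_stripe(stripe: bytes, record_file: List[Tuple[str, int]]) -> str:
    """Deduplicate one stripe and log it in the appended data's record file."""
    oid = hashlib.sha256(stripe).hexdigest()  # fingerprint doubles as the oid
    if oid in ref_counts:
        ref_counts[oid] += 1        # content already stored: only count it
    else:
        stripe_store[oid] = stripe  # new content: write it under its oid
        ref_counts[oid] = 1
    record_file.append((oid, len(stripe)))  # fingerprint plus stripe size
    return oid
```

The record file receives one entry per stripe even when the content was already stored, which is what later lets a failed append undo exactly the references it took.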
At the end of the data upload, the file organization record file of the existing file and that of the appended data are merged to generate the file organization record file of the new file, and the file organization record file of the existing file is deleted;
the fingerprint information and append count of the new file, which may be in md5 format, are calculated from the etag of the existing file and the fingerprint information of the appended data stripes; the etag of the new file is generated from the fingerprint information and append count of the new file;
a logical head object is created based on the etag of the new file, and the metadata of the new file is saved;
and finally, an index relation between the file name and the etag of the new file is established, which completes the deduplicated file append upload.
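The completion stage can be sketched as below. The text mentions an md5 format but does not fix how the etag is laid out, so the `<md5 hex>-<append count>` layout, the function name `finish_append`, and the three dictionaries standing in for record files, head objects and the name index are all assumptions.

```python
import hashlib
from typing import Dict, List, Tuple

RecordFile = List[Tuple[str, int]]  # (stripe fingerprint, stripe size) entries


def finish_append(name: str, old_etag: str, new_part: RecordFile, old_count: int,
                  record_files: Dict[str, RecordFile], heads: Dict[str, dict],
                  index: Dict[str, str]) -> str:
    """Merge record files, derive the new etag, write head object and index."""
    merged = record_files[old_etag] + new_part
    # New fingerprint: old etag chained with the appended stripes' fingerprints.
    digest = hashlib.md5(old_etag.encode("utf-8"))
    for fingerprint, _size in new_part:
        digest.update(fingerprint.encode("utf-8"))
    count = old_count + 1
    etag = f"{digest.hexdigest()}-{count}"  # assumed etag layout
    record_files[etag] = merged
    del record_files[old_etag]              # the old record file is deleted
    heads[etag] = {"size": sum(size for _, size in merged), "append_count": count}
    index[name] = etag                      # file name -> new etag
    return etag
```

Chaining from the old etag means the existing stripes do not have to be re-read to compute the new fingerprint, which fits the goal of leaving existing data untouched.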
When the append fails, the handling flow is as shown in FIG. 3; garbage data is cleaned up according to the file organization record file of the appended data, including:
obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing it by 1;
if the reference count of the stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously. Because there is no merging and alignment, newly appended data does not interfere with existing data, either when it is added or when it is deleted.
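The failure path of FIG. 3 can be sketched as follows. The names `roll_back_append` and `drain_gc_queue` are hypothetical, and the asynchronous worker is reduced to a plain function that a background task would call. What the worker does with the reference count of a queued stripe is not stated in the text; removing it together with the content is an assumption.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def roll_back_append(record_file: List[Tuple[str, int]],
                     ref_counts: Dict[str, int],
                     gc_queue: Deque[str]) -> None:
    """Undo the references taken by a failed append."""
    for oid, _size in record_file:
        count = ref_counts.get(oid, 0)
        if count > 1:
            ref_counts[oid] = count - 1  # still used elsewhere: just decrement
        elif count == 1:
            gc_queue.append(oid)         # last reference: hand over to GC


def drain_gc_queue(gc_queue: Deque[str], ref_counts: Dict[str, int],
                   stripe_store: Dict[str, bytes]) -> None:
    """Stand-in for the asynchronous deleter (assumed behavior)."""
    while gc_queue:
        oid = gc_queue.popleft()
        ref_counts.pop(oid, None)
        stripe_store.pop(oid, None)
```

The sketch ignores the case where another file takes a reference to a stripe after it has been queued but before it is deleted; a real system would have to guard against that.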
As shown in FIG. 4, an embodiment of the present invention further provides a file deduplication append-upload system, including:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of each created stripe and generating a file organization record file for the appended data;
and a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails.
The fingerprint checking unit calculates fingerprint information from the content of the created stripe;
queries the reference count of the corresponding stripe by the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk under the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
An embodiment of the invention also provides a file deduplication append-upload device, which comprises:
a memory for storing a computer program;
and a processor for implementing the steps of the file deduplication append-upload method when executing the computer program.
An embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the file deduplication append-upload method described above.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description. It is neither necessary nor possible to enumerate all embodiments here. Any modification or change that a person skilled in the art can make on the basis of the technical solution of the invention without creative effort remains within the protection scope of the invention.
Claims (8)
1. A file deduplication append-upload method, characterized by comprising the following steps:
when the server receives an upload request for a repeat append to a file, acquiring the metadata of the existing appended file and performing parameter verification;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
calculating fingerprint information of each created stripe and generating a file organization record file for the appended data;
when the append fails, cleaning up garbage data according to the file organization record file of the appended data;
wherein calculating the fingerprint information of the created stripe and generating the file organization record file of the appended data comprises:
calculating fingerprint information from the content of the created stripe;
querying the reference count of the corresponding stripe by the fingerprint information;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk under the fingerprint information and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
2. The file deduplication append-upload method of claim 1, wherein the existing appended file is locked while its metadata is being acquired, so that multiple upload requests cannot append to the file at the same time.
3. The file deduplication append-upload method of claim 1, wherein, when the append fails, cleaning up garbage data according to the file organization record file of the appended data comprises:
when the append fails, obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing it by 1; and if the reference count of the stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously.
4. The file deduplication append-upload method of claim 1, further comprising:
when the data upload is finished, generating the file organization record file of the new file from the file organization record file of the existing file and that of the appended data;
calculating the fingerprint information and append count of the new file from the etag of the existing file and the fingerprint information of the appended data stripes; generating the etag of the new file from the fingerprint information and append count of the new file;
creating a logical head object based on the etag of the new file, and saving the metadata of the new file;
and establishing an index relation between the file name and the etag of the new file.
5. The file deduplication append-upload method of claim 1, further comprising:
when the server receives the first append upload request for a deduplicated file, creating metadata that marks the file as an appended file, and setting the append count of the file to 1.
6. A file deduplication append-upload system, characterized by comprising:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill up the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of each created stripe and generating a file organization record file for the appended data;
a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails;
wherein the fingerprint checking unit calculates fingerprint information from the content of the created stripe;
queries the reference count of the corresponding stripe by the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk under the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
7. A file deduplication append-upload device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the file deduplication append-upload method according to any one of claims 1 to 5 when executing the computer program.
8. A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the file deduplication append-upload method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010925859.9A CN111970381B (en) | 2020-09-06 | 2020-09-06 | File deduplication, addition and uploading method, system, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010925859.9A CN111970381B (en) | 2020-09-06 | 2020-09-06 | File deduplication, addition and uploading method, system, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111970381A (en) | 2020-11-20 |
CN111970381B (en) | 2022-06-21 |
Family
ID=73392276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010925859.9A Active CN111970381B (en) | 2020-09-06 | 2020-09-06 | File deduplication, addition and uploading method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111970381B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110381128A (en) * | 2019-07-08 | 2019-10-25 | 紫光云技术有限公司 | Upload method and cloud storage model suitable for streaming media files |
CN110399348A (en) * | 2019-07-19 | 2019-11-01 | 苏州浪潮智能科技有限公司 | File deduplication method, apparatus, system and computer-readable storage medium |
CN110505314A (en) * | 2019-09-26 | 2019-11-26 | 浪潮电子信息产业股份有限公司 | Method for processing concurrent append upload requests |
CN111177088A (en) * | 2019-12-29 | 2020-05-19 | 北京浪潮数据技术有限公司 | Data deduplication method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111970381A (en) | 2020-11-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6778795B2 (en) | Methods, devices and systems for storing data | |
US9971796B2 (en) | Object storage using multiple dimensions of object information | |
US11093387B1 (en) | Garbage collection based on transmission object models | |
US8214377B2 (en) | Method, system, and program for managing groups of objects when there are different group types | |
CN109522283B (en) | Method and system for deleting repeated data | |
CN102339321A (en) | Network file system with version control and method using same | |
CN103714123A (en) | Methods for deleting duplicated data and controlling reassembly versions of cloud storage segmented objects of enterprise | |
CN103605585A (en) | Intelligent backup method based on data discovery | |
CN111651127A (en) | Monitoring data storage method and device based on shingled magnetic recording disk | |
CN111930716A (en) | Database capacity expansion method, device and system | |
US7895247B2 (en) | Tracking space usage in a database | |
CN111177143A (en) | Key value data storage method and device, storage medium and electronic equipment | |
CN107506466B (en) | Small file storage method and system | |
US11169977B2 (en) | System and method for removal of data and metadata using an enumerator | |
RU2665272C1 (en) | Method and apparatus for restoring deduplicated data | |
CN107169126A (en) | A kind of log processing method and relevant device | |
CN109144403B (en) | Method and equipment for switching cloud disk modes | |
CN113553325A (en) | Synchronization method and system for aggregation objects in object storage system | |
CN111970381B (en) | File deduplication, addition and uploading method, system, equipment and storage medium | |
CN102195936A (en) | Method and system for storing multimedia file and method and system for reading multimedia file | |
US20210097026A1 (en) | System and method for managing data using an enumerator | |
CN115114370B (en) | Master-slave database synchronization method and device, electronic equipment and storage medium | |
CN109213444A (en) | File memory method and device, storage medium, terminal | |
CN115756955A (en) | Data backup and data recovery method and device and computer equipment | |
CN109241011B (en) | Virtual machine file processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||