CN111970381B - File deduplication, addition and uploading method, system, equipment and storage medium - Google Patents

Publication number
CN111970381B
Authority
CN
China
Prior art keywords
file
data
additional
fingerprint information
strip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010925859.9A
Other languages
Chinese (zh)
Other versions
CN111970381A (en)
Inventor
李治鹏
陶桐桐
胡永刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-09-06
Filing date
2020-09-06
Publication date
2022-06-21
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010925859.9A
Publication of CN111970381A
Application granted
Publication of CN111970381B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/561 Adding application-functional data or data for application control, e.g. adding metadata

Abstract

The invention discloses a file deduplication append upload method, system, device and storage medium. The method comprises the following steps: when the server receives a non-first (repeat) append upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters; reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object; calculating fingerprint information of the created stripes and generating a file organization record file for the appended data; and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data. The system comprises a request receiving unit, a stripe creating unit, a fingerprint checking unit and a garbage cleaning unit. Because stripes are not merged and aligned when data is appended, newly appended data cannot interfere with existing data, and the processing performance of the storage cluster can be improved.

Description

File deduplication append upload method, system, device and storage medium
Technical Field
The invention relates to the field of object storage, and in particular to a file deduplication append upload method, system, device and storage medium.
Background
With the development of informatization, more and more individual users choose to move data originally stored on their own digital devices to cloud storage so that family members can share it with one another. At the same time, the development of technologies and applications such as Industry 4.0, intelligent manufacturing, enterprise cloud, big data, e-government, the NASA satellite center and large radio telescopes leads more and more enterprises and government units to place their data in storage clusters for centralized management.
A distributed object storage system is distributed storage oriented to unstructured data. More and more service scenarios now rely on distributed object storage. In fields such as video surveillance and live video streaming, for example, video data is generated continuously in real time, and large system logs are generated continuously as well. The data volume in the storage cluster keeps growing, file operations such as read and write requests consume a large amount of disk input/output (IO), and the IO throughput of the cluster is limited.
Because data deduplication is used, a striped object on the storage cluster may be associated with several different files. When an append fails, the garbage data it generated must be reclaimed, so with existing deduplication technology a failed append can affect the original file and the processing performance of the storage cluster.
Disclosure of Invention
To solve the above technical problem, the invention provides a file deduplication append upload method, system, device and storage medium, which prevent a failed append from affecting the original file and improve the processing performance of the storage cluster.
To achieve this purpose, the invention adopts the following technical solution:
A file deduplication append upload method comprises the following steps:
when the server receives a non-first (repeat) append upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
calculating fingerprint information of the created stripes and generating a file organization record file for the appended data;
and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data.
Further, when the metadata of the existing appended file is acquired, the existing appended file is locked so that multiple upload requests cannot append to the file at the same time.
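The locking step can be illustrated with a minimal sketch. The class name `AppendLocks`, the file-key format, and the choice to reject (rather than queue) a second concurrent request are all assumptions made for illustration; the text only states that a file must not be appended to by several upload requests at once.

```python
import threading


class AppendLocks:
    """Hypothetical per-file lock table: one append request per file at a time."""

    def __init__(self) -> None:
        self._guard = threading.Lock()   # protects the set below
        self._held = set()               # keys of files currently being appended

    def try_acquire(self, file_key: str) -> bool:
        """Return True if the caller now holds the append lock for file_key."""
        with self._guard:
            if file_key in self._held:
                return False             # another append is in progress
            self._held.add(file_key)
            return True

    def release(self, file_key: str) -> None:
        """Release the append lock; safe to call even if it is not held."""
        with self._guard:
            self._held.discard(file_key)
```

A server built along these lines would call `try_acquire` while reading the existing file's metadata and `release` once the append has either completed or been rolled back.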
Further, calculating the fingerprint information of the created stripes and generating the file organization record file of the appended data comprises:
calculating fingerprint information from the content of each created stripe;
querying the reference count of the corresponding stripe according to the fingerprint information;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk using the fingerprint information and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
Further, cleaning up garbage data according to the file organization record file of the appended data when the append fails comprises:
when the append fails, obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing the reference count by 1; and if the reference count of a stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously.
Further, the method also comprises the following steps:
when the data upload finishes, generating the file organization record file of the new file from the file organization record file of the existing file and that of the appended data;
calculating the fingerprint information and the append count of the new file from the etag of the existing file and the fingerprint information of the appended data stripes, and generating the etag of the new file from the fingerprint information and the append count of the new file;
establishing a logical head object based on the etag of the new file, and storing the metadata of the new file;
and establishing an index relationship between the file name and the etag of the new file.
Further, the method also comprises:
when the server receives the first append upload request for a deduplicated file, creating metadata that marks the file as an appended file, and setting the append count of the file to 1.
The invention also provides a file deduplication append upload system, which comprises:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of the created stripes and generating a file organization record file for the appended data;
and a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails.
Further, the fingerprint checking unit calculates fingerprint information from the content of each created stripe;
queries the reference count of the corresponding stripe according to the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk using the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
The invention also provides a file deduplication append upload device, which comprises:
a memory for storing a computer program;
and a processor for implementing the steps of the file deduplication append upload method when executing the computer program.
The invention further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the file deduplication append upload method described above.
The invention has the following beneficial effects:
The invention provides a file deduplication append upload method, system, device and storage medium. Building on deduplication in a distributed object storage system, stripes are not merged and aligned when data is appended, so newly appended data cannot interfere with existing data. Even though a data stripe may be referenced by several files in a deduplication scenario, a failure during the append process does not affect the original file. The invention thus implements deduplicated file append and improves the processing performance of the business logic of the object storage cluster.
Drawings
FIG. 1 is a flowchart of a file deduplication append upload method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the specific process of calculating fingerprint information of the created stripes and generating a file organization record file of the appended data according to an embodiment of the present invention;
FIG. 3 is a flowchart of the processing performed when an append fails according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file deduplication append upload system according to an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in FIG. 1, an embodiment of the present invention discloses a file deduplication append upload method, including:
when the server receives a non-first (repeat) append upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
calculating fingerprint information of the created stripes, and generating a file organization record file for the appended data;
and, when the append fails, cleaning up garbage data according to the file organization record file of the appended data.
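The stripe-creation step, in which the last stripe of the existing object is left untouched, can be sketched roughly as follows. The names `Stripe`, `STRIPE_SIZE` and `create_append_stripes`, and the 4-byte stripe size, are hypothetical and chosen only to keep the illustration small.

```python
from dataclasses import dataclass
from typing import List

STRIPE_SIZE = 4  # hypothetical stripe size in bytes, tiny for illustration


@dataclass
class Stripe:
    data: bytes


def create_append_stripes(appended: bytes, stripe_size: int = STRIPE_SIZE) -> List[Stripe]:
    """Cut only the appended bytes into new stripes.

    The existing object's last stripe is never reopened or topped up,
    so it may stay shorter than stripe_size.
    """
    return [Stripe(appended[i:i + stripe_size])
            for i in range(0, len(appended), stripe_size)]


# Existing object whose last stripe is only half full.
existing = [Stripe(b"AAAA"), Stripe(b"BB")]
new_stripes = create_append_stripes(b"CCCCCD")
layout = existing + new_stripes  # the short stripe b"BB" stays as it was
```

Because the appended bytes never flow into `b"BB"`, discarding `new_stripes` after a failed append leaves the existing stripes byte-for-byte unchanged.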
Specifically, when receiving an upload request from a user, the server first parses the relevant parameters in it (e.g., user information, bucket, name of the uploaded file, append information, append position, etc.) and determines the type of the upload request:
1) judging whether the request is an append write request; if not, processing it as an ordinary file upload;
2) if the request is an append write request, judging whether the deduplication function is enabled; if the deduplication function is not enabled, processing it as an ordinary file append upload;
3) if the deduplication function is enabled, judging whether this is the first append upload request; if it is the first append upload request, creating metadata that marks the file as an appended file, and setting the append count of the file to 1;
4) if the request is a non-first (repeat) append upload request, acquiring the metadata of the existing file and verifying the parameters. The metadata includes information such as the file organization record file and the size of the existing file; parameter verification includes checking the append position of the file, among other things. At the same time, the existing appended file is locked so that multiple upload requests cannot append to it simultaneously.
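The four-way decision above can be sketched as a small dispatch function. Everything named here (`UploadRequest`, `FileMeta`, `classify`, the returned labels) is hypothetical. Two details are assumptions rather than statements from the text: a missing metadata record is taken to mean "first append", and the position check is taken to mean that the append position must equal the current file size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadRequest:
    is_append: bool
    position: int = 0        # offset at which the client wants to append


@dataclass
class FileMeta:
    size: int                # current size of the existing file
    append_count: int        # how many appends have been applied so far


def classify(req: UploadRequest, dedup_enabled: bool, meta: Optional[FileMeta]) -> str:
    """Return which processing path an upload request should take."""
    if not req.is_append:
        return "ordinary_upload"            # step 1
    if not dedup_enabled:
        return "ordinary_append"            # step 2
    if meta is None:
        return "first_dedup_append"         # step 3: caller sets append_count = 1
    if req.position != meta.size:           # step 4: assumed position check
        raise ValueError("append position does not match current file size")
    return "repeat_dedup_append"
```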
The specific process of calculating the fingerprint information of the created stripes and generating the file organization record file of the appended data is shown in FIG. 2, and includes:
calculating fingerprint information from the content of each created stripe;
querying the reference count of the corresponding stripe according to the fingerprint information; specifically, the query is made in the storage pool selected by the storage policy;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk with the fingerprint information as its unique identifier (oid), and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
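A minimal in-memory sketch of this fingerprint and reference-count flow is given below. `StripeStore`, `put` and `append_stripes` are hypothetical names, the dictionaries stand in for the storage pool, and SHA-256 is an assumed fingerprint function, since the text does not name one for stripes.

```python
import hashlib
from typing import Dict, List, Tuple


class StripeStore:
    """Toy stand-in for a storage pool keyed by stripe fingerprint (oid)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # oid -> stripe content
        self.refcount: Dict[str, int] = {}    # oid -> number of references

    def put(self, stripe: bytes) -> str:
        oid = hashlib.sha256(stripe).hexdigest()   # assumed fingerprint function
        if oid in self.refcount:
            self.refcount[oid] += 1                # duplicate: only bump the count
        else:
            self.objects[oid] = stripe             # new content: persist it
            self.refcount[oid] = 1
        return oid


def append_stripes(store: StripeStore, stripes: List[bytes]) -> List[Tuple[str, int]]:
    """Store each stripe and return the record of (fingerprint, size) pairs."""
    return [(store.put(s), len(s)) for s in stripes]
```

The returned list plays the role of the file organization record file for the appended data: it is enough both to read the appended bytes back in order and to undo the reference-count changes if the append later fails.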
At the stage where the data upload finishes, merging the file organization record file of the existing file with that of the appended data to generate the file organization record file of the new file, and deleting the file organization record file of the existing file;
calculating the fingerprint information and the append count of the new file from the etag of the existing file and the fingerprint information of the appended data stripes, where md5 format may be used; generating the etag of the new file from the fingerprint information and the append count of the new file;
establishing a logical head object based on the etag of the new file, and storing the metadata of the new file;
and finally, establishing an index relationship between the file name and the etag of the new file, which completes the file deduplication append upload.
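The completion stage can be sketched as below. The etag layout `<md5 hex>-<append count>`, the way the md5 input is assembled, and the names `finalize_append`, `heads` and `index` are illustrative assumptions; the text only says that the new etag is derived from the new file's fingerprint information and append count. Dropping the old head entry stands in for deleting the existing file's record file.

```python
import hashlib
from typing import Dict, List, Tuple

Record = List[Tuple[str, int]]   # (stripe fingerprint, stripe size) pairs


def finalize_append(name: str, old_etag: str, old_record: Record,
                    append_record: Record, append_count: int,
                    heads: Dict[str, dict], index: Dict[str, str]) -> str:
    """Merge the records, derive a new etag, and repoint the file name at it."""
    new_record = old_record + append_record        # existing stripes come first

    # Assumed scheme: md5 over the old etag followed by the new stripe fingerprints.
    h = hashlib.md5(old_etag.encode())
    for oid, _size in append_record:
        h.update(oid.encode())

    count = append_count + 1
    new_etag = f"{h.hexdigest()}-{count}"

    heads.pop(old_etag, None)                      # retire the old record
    heads[new_etag] = {                            # logical head object with metadata
        "record": new_record,
        "size": sum(size for _oid, size in new_record),
        "append_count": count,
    }
    index[name] = new_etag                         # file name -> etag index
    return new_etag
```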
When the append fails, the processing flow is as shown in FIG. 3, and garbage data is cleaned up according to the file organization record file of the appended data, including:
obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing the reference count by 1;
if the reference count of a stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously. Because stripes are never merged and aligned, newly appended data does not interfere with existing data when it is added or deleted.
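The failure path can be sketched as follows. The function names `rollback_append` and `drain_gc` are hypothetical, plain dictionaries stand in for the storage pool, and a `deque` drained by a separate call stands in for the asynchronous garbage collector.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def rollback_append(append_record: List[Tuple[str, int]],
                    refcount: Dict[str, int],
                    gc_queue: Deque[str]) -> None:
    """Undo the reference-count changes made by a failed append."""
    for oid, _size in append_record:
        if refcount[oid] > 1:
            refcount[oid] -= 1        # stripe is still used by other files
        else:
            gc_queue.append(oid)      # last reference: hand over to garbage collection


def drain_gc(gc_queue: Deque[str],
             refcount: Dict[str, int],
             objects: Dict[str, bytes]) -> None:
    """Stand-in for the asynchronous worker that deletes queued stripes."""
    while gc_queue:
        oid = gc_queue.popleft()
        refcount.pop(oid, None)
        objects.pop(oid, None)
```

Only stripes listed in the appended data's record are visited, so stripes belonging to the existing file are never touched by the rollback.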
As shown in FIG. 4, an embodiment of the present invention further provides a file deduplication append upload system, including:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of the created stripes and generating a file organization record file for the appended data;
and a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails.
The fingerprint checking unit calculates fingerprint information from the content of each created stripe;
queries the reference count of the corresponding stripe according to the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk using the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
An embodiment of the invention also provides a file deduplication append upload device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the file deduplication append upload method when executing the computer program.
An embodiment of the present invention further provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the file deduplication append upload method described above are implemented.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description. It is neither necessary nor possible to enumerate all embodiments here. Any modification or variation that a person skilled in the art can make on the basis of the technical solution of the invention without creative effort remains within the protection scope of the invention.

Claims (8)

1. A file deduplication append upload method, characterized by comprising the following steps:
when the server receives a non-first (repeat) append upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
reading data according to the upload request, and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
calculating fingerprint information of the created stripes and generating a file organization record file for the appended data;
when the append fails, cleaning up garbage data according to the file organization record file of the appended data;
wherein calculating the fingerprint information of the created stripes and generating the file organization record file of the appended data comprises:
calculating fingerprint information from the content of each created stripe;
querying the reference count of the corresponding stripe according to the fingerprint information;
if the reference count exists, incrementing it by 1; if the reference count does not exist, writing the stripe to disk using the fingerprint information and setting the corresponding reference count to 1;
and adding the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
2. The file deduplication append upload method of claim 1, wherein, when the metadata of the existing appended file is acquired, the existing appended file is locked so that multiple upload requests cannot append to the file at the same time.
3. The file deduplication append upload method of claim 1, wherein cleaning up garbage data according to the file organization record file of the appended data when the append fails comprises:
when the append fails, obtaining the stripes from the file organization record file of the appended data;
if the reference count of a stripe is greater than 1, decrementing the reference count by 1; and if the reference count of a stripe is 1, adding the stripe to a garbage collection queue so that the garbage data is deleted asynchronously.
4. The file deduplication append upload method of claim 1, further comprising:
when the data upload finishes, generating the file organization record file of the new file from the file organization record file of the existing file and that of the appended data;
calculating the fingerprint information and the append count of the new file from the etag of the existing file and the fingerprint information of the appended data stripes; generating the etag of the new file from the fingerprint information and the append count of the new file;
establishing a logical head object based on the etag of the new file, and saving the metadata of the new file;
and establishing an index relationship between the file name and the etag of the new file.
5. The file deduplication append upload method of claim 1, further comprising:
when the server receives the first append upload request for a deduplicated file, creating metadata that marks the file as an appended file, and setting the append count of the file to 1.
6. A file deduplication append upload system, characterized by comprising:
a request receiving unit, used for receiving an upload request for a file, acquiring the metadata of the existing appended file and verifying the request parameters;
a stripe creating unit, used for reading data according to the upload request and creating stripes from the data, where the created stripes are not used to fill the last stripe of the existing object;
a fingerprint checking unit, used for calculating fingerprint information of the created stripes and generating a file organization record file for the appended data;
a garbage cleaning unit, used for cleaning up garbage data according to the file organization record file of the appended data when the append fails;
wherein the fingerprint checking unit calculates fingerprint information from the content of each created stripe;
queries the reference count of the corresponding stripe according to the fingerprint information;
if the reference count exists, increments it by 1; if the reference count does not exist, writes the stripe to disk using the fingerprint information and sets the corresponding reference count to 1;
and adds the fingerprint information and the corresponding stripe size to the file organization record file of the appended data.
7. A file deduplication append upload device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the file deduplication append upload method according to any one of claims 1 to 5 when executing the computer program.
8. A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the file deduplication append upload method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010925859.9A CN111970381B (en) 2020-09-06 2020-09-06 File deduplication, addition and uploading method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010925859.9A CN111970381B (en) 2020-09-06 2020-09-06 File deduplication, addition and uploading method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111970381A CN111970381A (en) 2020-11-20
CN111970381B true CN111970381B (en) 2022-06-21

Family

ID=73392276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925859.9A Active CN111970381B (en) 2020-09-06 2020-09-06 File deduplication, addition and uploading method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111970381B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110381128A (en) * 2019-07-08 2019-10-25 紫光云技术有限公司 A kind of method for uploading and cloud storage model suitable for files in stream media
CN110399348A (en) * 2019-07-19 2019-11-01 苏州浪潮智能科技有限公司 File deletes method, apparatus, system and computer readable storage medium again
CN110505314A (en) * 2019-09-26 2019-11-26 浪潮电子信息产业股份有限公司 A kind of processing method concurrently adding upload request
CN111177088A (en) * 2019-12-29 2020-05-19 北京浪潮数据技术有限公司 Data deduplication method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111970381A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
JP6778795B2 (en) Methods, devices and systems for storing data
US9971796B2 (en) Object storage using multiple dimensions of object information
US11093387B1 (en) Garbage collection based on transmission object models
US8214377B2 (en) Method, system, and program for managing groups of objects when there are different group types
CN109522283B (en) Method and system for deleting repeated data
CN102339321A (en) Network file system with version control and method using same
CN103714123A (en) Methods for deleting duplicated data and controlling reassembly versions of cloud storage segmented objects of enterprise
CN103605585A (en) Intelligent backup method based on data discovery
CN111651127A (en) Monitoring data storage method and device based on shingled magnetic recording disk
CN111930716A (en) Database capacity expansion method, device and system
US7895247B2 (en) Tracking space usage in a database
CN111177143A (en) Key value data storage method and device, storage medium and electronic equipment
CN107506466B (en) Small file storage method and system
US11169977B2 (en) System and method for removal of data and metadata using an enumerator
RU2665272C1 (en) Method and apparatus for restoring deduplicated data
CN107169126A (en) A kind of log processing method and relevant device
CN109144403B (en) Method and equipment for switching cloud disk modes
CN113553325A (en) Synchronization method and system for aggregation objects in object storage system
CN111970381B (en) File deduplication, addition and uploading method, system, equipment and storage medium
CN102195936A (en) Method and system for storing multimedia file and method and system for reading multimedia file
US20210097026A1 (en) System and method for managing data using an enumerator
CN115114370B (en) Master-slave database synchronization method and device, electronic equipment and storage medium
CN109213444A (en) File memory method and device, storage medium, terminal
CN115756955A (en) Data backup and data recovery method and device and computer equipment
CN109241011B (en) Virtual machine file processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant