WO2018214905A1 - Data storage method, apparatus, medium and device - Google Patents
- Publication number: WO2018214905A1 (PCT/CN2018/087991)
- Authority: WO (WIPO, PCT)
- Prior art keywords: file, identifier, uploaded, information, file identifier
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1004—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/63—Routing a service request depending on the request content or context
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3236—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
- H04L9/3239—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
Definitions
- This document relates to, but is not limited to, the field of cloud storage technologies, and in particular to a data storage method, apparatus, medium, and device.
- the erasure coding algorithm is used to encode the file content so that, when the file content is damaged, the encoded content can be used to recover it, thereby reducing the storage space that replicas of cold data occupy.
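The document leaves the "preset algorithm" for the check block unspecified; deployed erasure codes are commonly Reed-Solomon codes, which tolerate several lost blocks. As a minimal stand-in under that caveat, the sketch below uses single XOR parity, the simplest erasure code: k data blocks plus one check block, able to rebuild any one lost block. The function names are hypothetical and this is not the patent's algorithm.

```python
from functools import reduce
from typing import List, Optional


def split_blocks(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data blocks, zero-padding the tail."""
    size = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_check_block(blocks: List[bytes]) -> bytes:
    """Check (parity) block: byte-wise XOR of all data blocks."""
    return reduce(xor_bytes, blocks)


def recover(blocks: List[Optional[bytes]], check: bytes) -> List[bytes]:
    """Rebuild at most one missing data block, passed in as None."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one loss")
    restored = list(blocks)
    if missing:
        present = [b for b in blocks if b is not None]
        # parity XOR (all surviving blocks) == the missing block
        restored[missing[0]] = reduce(xor_bytes, present, check)
    return restored
```

For example, `split_blocks(b"cold data to protect", 4)` yields four 5-byte blocks; dropping any one of them and calling `recover` restores the original bytes. Four data blocks plus one check block occupy 1.25 times the original size, compared with 3 times for three full replicas.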
- embodiments of the present invention provide a data storage method, apparatus, medium, and device.
- Step 1: group the file storage areas and set the storage capacity of each group;
- Step 2: receive a file upload request sent by the client, where the file upload request includes the file identifier of the file to be uploaded and a check block of the file calculated according to a preset algorithm; and determine the information of the file to be uploaded, which includes the file identifier of the file to be uploaded, the original data of the file to be uploaded, and the check block;
- Step 3: determine the target group, including: determining the file identifier range of each group, and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range covers the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
- Step 4: store the information of the file to be uploaded in the target group.
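Steps 1 to 4 can be sketched as a tiny in-memory model. This is a minimal sketch, not the patented implementation: the names `FileInfo`, `Group`, `create_groups`, and `handle_upload` are hypothetical, storage is assumed to be measured as bytes of original data plus check block, and file identifiers are compared as strings. Only the basic Step 3 rule is modelled (the range covers the identifier and enough storage remains), with empty and single-file groups treated as having the maximum range.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileInfo:
    """Information of a file to be uploaded (Step 2)."""
    file_id: str   # file identifier sent by the client
    data: bytes    # original data
    check: bytes   # check block computed by the client

    @property
    def size(self) -> int:
        # Assumption: occupied storage = original data + check block.
        return len(self.data) + len(self.check)


@dataclass
class Group:
    """One group of the file storage area with a fixed capacity (Step 1)."""
    capacity: int
    files: Dict[str, FileInfo] = field(default_factory=dict)

    @property
    def remaining(self) -> int:
        return self.capacity - sum(f.size for f in self.files.values())

    def covers(self, file_id: str) -> bool:
        # Groups with no file or a single file have the maximum range.
        if len(self.files) <= 1:
            return True
        ids = sorted(self.files)
        return ids[0] <= file_id <= ids[-1]


def create_groups(count: int, capacity: int) -> List[Group]:
    """Step 1: split the storage area into groups of equal capacity."""
    return [Group(capacity) for _ in range(count)]


def handle_upload(groups: List[Group], file_id: str, data: bytes,
                  check: bytes) -> Optional[Group]:
    info = FileInfo(file_id, data, check)             # Step 2
    for group in groups:                              # Step 3
        if group.covers(file_id) and group.remaining >= info.size:
            group.files[file_id] = info               # Step 4
            return group
    return None  # no target group under the basic rule
```

With `groups = create_groups(2, capacity=100)`, uploading 40 bytes of data plus a 10-byte check block under `"abc123"` lands in the first group and leaves it 50 units of remaining storage.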
- determining the file identifier range of each group includes: determining the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, or of a group that stores only one file identifier, is the maximum range.
- step 3 further includes: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it but that group's remaining storage is less than the storage occupied by the file information to be uploaded, taking as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the file information to be uploaded; or taking a group that stores no file information as the target group.
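The Step 3 rules above (maximum range, basic rule, and fallback) can be sketched as one selection function. This is a hedged reading of the text rather than the patented implementation: `GroupState`, `choose_target`, and the returned action strings are hypothetical, and because the text only says "or", the preference for a non-empty group over an empty one in the fallback is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GroupState:
    ids: List[str]   # file identifiers stored in the group
    remaining: int   # remaining storage amount


def id_range(ids: List[str]) -> Optional[Tuple[str, str]]:
    """Coverage of the group's identifiers; None stands for MAX."""
    if len(ids) <= 1:          # no file, or only one file identifier
        return None
    return min(ids), max(ids)


def covers_strictly(group: GroupState, file_id: str) -> bool:
    """True if a non-MAX range covers file_id."""
    r = id_range(group.ids)
    return r is not None and r[0] <= file_id <= r[1]


def choose_target(groups: List[GroupState], file_id: str,
                  size: int) -> Tuple[Optional[int], str]:
    covering = [i for i, g in enumerate(groups) if covers_strictly(g, file_id)]
    for i in covering:                       # basic rule
        if groups[i].remaining >= size:
            return i, "store"
    if len(covering) >= 2:                   # several covering groups, all full
        return None, "regroup"
    # No covering group, or exactly one that lacks space.
    # Assumed preference: a group already holding files, then an empty group.
    for want_files in (True, False):
        for i, g in enumerate(groups):
            if bool(g.ids) == want_files and g.remaining >= size:
                return i, "store"
    return None, "no space"
```

The `"regroup"` result marks the case where several overlapping groups cover the identifier but none has room, which the text handles by creating new groups.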
- step 3 further includes: if the file identifier ranges of multiple groups cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of these groups is less than the storage occupied by the information to be uploaded, creating the same number of new groups; sorting the files stored in the original groups together with the file to be uploaded by file identifier; determining the group corresponding to the position of the file to be uploaded and taking it as the target group; storing each file from the original groups in the new groups; and deleting the original groups.
- step 4 further includes: when the file information in the multiple groups and the file information of the file to be uploaded are stored together into multiple target groups, storing the file information sequentially into the target groups in the preset file identifier order, and deleting the file information stored in the original groups; the target group for any file information that cannot be stored in the target groups is re-determined by the method of determining the target group in step 3.
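The regrouping described in the two bullets above can be sketched as follows, assuming new groups share one capacity and files are represented only by identifier and occupied storage. The function name `regroup` and the `Files` alias are hypothetical. Filling the new groups sequentially in ascending identifier order is what gives them non-overlapping ranges; files that do not fit are returned as leftovers to be placed again with the Step 3 rules.

```python
from typing import Dict, List, Tuple

Files = Dict[str, int]  # file identifier -> storage amount it occupies


def regroup(old_groups: List[Files], new_id: str, new_size: int,
            capacity: int) -> Tuple[List[Files], Files]:
    """Redistribute full, overlapping groups plus the file to be uploaded.

    Creates as many new groups as there are old groups, fills them
    sequentially in ascending identifier order, clears the old groups, and
    returns (new_groups, leftover). Leftover files did not fit.
    """
    pending: Files = {new_id: new_size}
    for group in old_groups:
        pending.update(group)
    new_groups: List[Files] = [{} for _ in old_groups]
    leftover: Files = {}
    slot, used = 0, 0
    for file_id in sorted(pending):
        size = pending[file_id]
        # Move on to the next new group once the current one cannot fit it.
        while slot < len(new_groups) and used + size > capacity:
            slot, used = slot + 1, 0
        if slot == len(new_groups):
            leftover[file_id] = size
            continue
        new_groups[slot][file_id] = size
        used += size
    for group in old_groups:
        group.clear()  # models deleting the original groups
    return new_groups, leftover
```

With hypothetical sizes, two groups `{"abc000": 60, "abc300": 30}` and `{"abc100": 50, "abc400": 40}` of capacity 150 each have 60 units left, too little for a 100-unit `abc123`. Regrouping yields `[{"abc000": 60, "abc100": 50}, {"abc123": 100, "abc300": 30}]`, so the second new group is the target group, and `abc400` is left over for re-determination.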
- the method further includes: periodically checking, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, using the check block to recover the original data of the damaged file according to a preset algorithm.
- checking, according to the preset time period, whether there is a damaged file in the file storage area includes: at each preset time period, calculating the file identifier of every file in the file storage area from its original data; if the calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
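One pass of the periodic check can be sketched as below, assuming the store maps each SHA1-hex file identifier to its original data. The names `find_damaged` and `check_and_repair` are hypothetical, and the `repair` callback is a placeholder for rebuilding the data from the check block with the preset algorithm; scheduling the pass once per preset time period is not shown.

```python
import hashlib
from typing import Callable, Dict, List


def find_damaged(store: Dict[str, bytes]) -> List[str]:
    """Recompute each file identifier from its stored original data and
    return the identifiers that no longer match."""
    return [file_id for file_id, data in store.items()
            if hashlib.sha1(data).hexdigest() != file_id]


def check_and_repair(store: Dict[str, bytes],
                     repair: Callable[[str], bytes]) -> List[str]:
    """One pass of the periodic check; returns the damaged identifiers."""
    damaged = find_damaged(store)
    for file_id in damaged:
        restored = repair(file_id)
        # Keep the rebuilt data only if it hashes back to the identifier.
        if hashlib.sha1(restored).hexdigest() == file_id:
            store[file_id] = restored
    return damaged
```

Because the identifier is itself the hash of the original data, no separate checksum has to be stored to detect damage.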
- the method further includes: receiving a file download request sent by the client, where the file download request includes the file identifier of the file to be downloaded; searching, in the groups whose file identifier range covers the file identifier of the file to be downloaded, for a file identifier identical to that of the file to be downloaded; if no identical file identifier is found there, searching the remaining groups for it; and sending the original data of the file corresponding to that file identifier to the client.
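The download lookup order (covering groups first, then the remaining groups) can be sketched as follows, assuming each group is a dictionary from file identifier to original data. The `Group` alias and the functions `covers` and `download` are hypothetical names.

```python
from typing import Dict, List, Optional

Group = Dict[str, bytes]  # file identifier -> original data


def covers(group: Group, file_id: str) -> bool:
    """Range check; a group with 0 or 1 file has the maximum range."""
    if len(group) <= 1:
        return True
    return min(group) <= file_id <= max(group)


def download(groups: List[Group], file_id: str) -> Optional[bytes]:
    """Search the covering groups first, then the remaining groups."""
    first = [g for g in groups if covers(g, file_id)]
    rest = [g for g in groups if not covers(g, file_id)]
    for group in first + rest:
        if file_id in group:
            return group[file_id]  # original data to send to the client
    return None
```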
- the method further includes: receiving a file deletion request sent by the client, where the file deletion request includes the file identifier of the file to be deleted; searching, in the groups whose file identifier range covers the file identifier of the file to be deleted, for a file identifier identical to that of the file to be deleted; if no identical file identifier is found there, searching the remaining groups for it; deleting the file information corresponding to that file identifier; and recording the attributes of the deleted file.
- the file attributes include the file identifier and, without being limited thereto, at least one of the following: file name, file deletion time, file upload time, and file storage address.
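Deletion with a record of the deleted file's attributes can be sketched as below. The record types, their field names, and the `"g0/1"`-style storage address are hypothetical; the search order mirrors the download lookup, and the deletion time is taken from `time.time()`.

```python
import time
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredFile:
    data: bytes
    name: str
    upload_time: float
    address: str  # storage address; format is hypothetical


@dataclass
class DeletedFileRecord:
    """Attributes recorded for a deleted file."""
    file_id: str
    name: str
    upload_time: float
    delete_time: float
    address: str


def in_range(group: Dict[str, StoredFile], file_id: str) -> bool:
    if len(group) <= 1:  # maximum range
        return True
    return min(group) <= file_id <= max(group)


def delete(groups: List[Dict[str, StoredFile]], file_id: str,
           log: List[DeletedFileRecord]) -> bool:
    """Delete the file, searching covering groups first, and record it."""
    ordered = ([g for g in groups if in_range(g, file_id)]
               + [g for g in groups if not in_range(g, file_id)])
    for group in ordered:
        if file_id in group:
            f = group.pop(file_id)
            log.append(DeletedFileRecord(file_id, f.name, f.upload_time,
                                         time.time(), f.address))
            return True
    return False
```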
- the file identifier is a unique identifier of the file, obtained by computing over the original data of the file with a preset encryption algorithm.
- a grouping module configured to group file storage areas and set a storage capacity of each group
- the upload request receiving module is configured to receive a file uploading request sent by the client, where the file uploading request includes a file identifier of the file to be uploaded and a check block of the file to be uploaded calculated according to a preset algorithm;
- a file information determining module configured to determine information about the file to be uploaded, including a file identifier of the file to be uploaded, original data of the file to be uploaded, and the check block;
- the target group determining module is configured to determine the target group, including: determining the file identifier range of each group, and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range covers the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the file information to be uploaded;
- the upload file management module is configured to store the information of the file to be uploaded to the target group.
- the above device also has the following features:
- the target group determining module includes a file identifier range determining unit configured to determine the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, or of a group that stores only one file identifier, is the maximum range.
- the above device also has the following features:
- the target group determining module is further configured to: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it but that group's remaining storage is less than the storage occupied by the file information to be uploaded, take as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the file information to be uploaded; or take a group that stores no file information as the target group.
- the above device also has the following features:
- the target group determining module is further configured to: if the file identifier ranges of multiple groups cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of these groups is less than the storage occupied by the information to be uploaded, create the same number of new groups; sort the files stored in the original groups together with the file to be uploaded by file identifier; determine the group corresponding to the position of the file to be uploaded and take it as the target group; store each file from the original groups in the new groups; and delete the original groups.
- the above device also has the following features:
- the upload file management module is further configured to: when the file information in the multiple groups and the file information of the file to be uploaded are stored together into multiple target groups, store the file information sequentially into the target groups in the preset file identifier order, and delete the file information stored in the original groups; the target group determining module re-determines a target group for any file information that cannot be stored in the target groups.
- the above device also has the following features:
- the device further includes a checking module configured to periodically check, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, to use the check block to recover the original data of the damaged file according to a preset algorithm.
- the above device also has the following features:
- the checking module includes a calculating unit configured to calculate, at each preset time period, the file identifier of every file in the file storage area from its original data, and to determine that a file is damaged if the calculated file identifier is inconsistent with the stored file identifier.
- the above device also has the following features:
- the device further includes a download management module configured to receive a file download request sent by the client, where the file download request includes the file identifier of the file to be downloaded; to search, in the groups whose file identifier range covers the file identifier of the file to be downloaded, for a file identifier identical to that of the file to be downloaded, and, if none is found there, to search the remaining groups for it; and to send the original data of the file corresponding to that file identifier to the client.
- the above device also has the following features:
- the device further includes a deletion management module configured to receive a file deletion request sent by the client, where the file deletion request includes the file identifier of the file to be deleted; to search, in the groups whose file identifier range covers the file identifier of the file to be deleted, for a file identifier identical to that of the file to be deleted, and, if none is found there, to search the remaining groups for it; to delete the file information corresponding to that file identifier; and to record the attributes of the deleted file, where the file attributes include the file identifier and, without being limited thereto, at least one of: file name, file deletion time, file upload time, and file storage address.
- the above device also has the following features:
- the file identifier is a unique identifier of the file, obtained by computing over the original data of the file with a preset encryption algorithm.
- the computer readable storage medium provided by the embodiment of the present invention stores a computer program, and when the program is executed by the processor, the steps of the foregoing method are implemented.
- a computer device provided by an embodiment of the present invention includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and the processor implements the steps of the foregoing method when the program is executed.
- the embodiments of the invention can make file management more systematic and improve storage speed.
- FIG. 1 is a flowchart of the data storage method provided in Embodiment 1;
- FIG. 2 is a schematic structural diagram of the data storage device provided in Embodiment 2.
- FIG. 1 is a flowchart of the data storage method according to the first embodiment of the present invention. The method is typically applied to a server used for storage. Referring to FIG. 1, the method includes:
- Step 101: group the file storage areas and set the storage capacity of each group;
- Step 102: receive a file upload request sent by the client, where the file upload request includes the file identifier of the file to be uploaded and a check block of the file calculated according to a preset algorithm; and determine the information of the file to be uploaded, including the file identifier of the file to be uploaded, the original data of the file to be uploaded, and the check block;
- Step 103: determine the target group, including: determining the file identifier range of each group, and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range covers the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the file information to be uploaded;
- Step 104: store the information of the file to be uploaded in the target group.
- the file identifier is a unique identifier of the file, obtained by computing over the original data of the file with a preset encryption algorithm;
- for example, the SHA1 value obtained by applying the Secure Hash Algorithm (SHA1) to the raw data content of the file is used as the file's unique identifier.
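Computing a SHA1 file identifier is a one-liner with Python's standard `hashlib`; the function name `file_identifier` is hypothetical. One property matters for the range checks in Step 103: identifiers rendered as fixed-length lowercase hexadecimal strings sort lexicographically in the same order as their numeric values, so ranges can be compared as plain strings.

```python
import hashlib


def file_identifier(data: bytes) -> str:
    """SHA1 of the raw file data, as 40 lowercase hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


# file_identifier(b"hello") -> "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
```

Because practical SHA-1 collisions were published in 2017, a deployment that needs collision resistance could swap in `hashlib.sha256` without changing the rest of the scheme.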
- determining the file identifier range of each group includes: determining the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, or of a group that stores only one file identifier, is the maximum range.
- for example, if the smallest file identifier in a group is aaa and the largest is bbb, the file identifier range of the group is aaa to bbb; the file identifier range of a group that stores no file information, or only one file identifier, is marked as MAX.
- the foregoing step 103 further includes: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it but that group's remaining storage is less than the storage occupied by the file information to be uploaded, taking as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the file information to be uploaded; or taking a group that stores no file information as the target group.
- the foregoing step 103 further includes: if the file identifier ranges of multiple groups cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of these groups is less than the storage occupied by the file information to be uploaded, creating the same number of new groups; sorting the files stored in the original groups together with the file to be uploaded by file identifier and determining the group corresponding to the position of the file to be uploaded; taking this group as the target group, storing each file from the original groups in the new groups, and deleting the original groups.
- the above step 104 further includes: when the file information in the multiple groups is stored together with the file information of the file to be uploaded into multiple target groups, storing the file information sequentially into the target groups in the preset file identifier order and deleting the file information stored in the original groups; the target group for any file information that cannot be stored in the target groups is re-determined by the method of determining the target group in step 103.
- for example, the file identifier of the file to be uploaded is abc123 and its file information occupies 100 MB; the file identifier range of the first group is abc000 to abc300, and that of the second group is abc100 to abc400. If the ranges of the two groups overlap and both cover the file identifier of the file to be uploaded, and the remaining storage of each of the two groups is less than 100 MB, two groups are selected from the groups that store no file information as the target groups.
- with the preset file identifier order being ascending, the file information in the first group, the file information in the second group, and the file information of the file to be uploaded are stored into the two target groups in ascending order of file identifier.
- the file identifier ranges of the two target groups do not overlap. If the two target groups cannot accommodate all the file information to be stored, step 103 is used to re-determine a target group for the remaining file information that was not stored in the two target groups.
- when determining the target group, the groups may be queried in a round-robin manner, or selected according to preset group numbers, where the numbering order of the groups represents their priority when files are stored; when multiple groups satisfy the target-group condition, the file is stored in the group with the highest priority.
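The two query orders can be sketched side by side. This is illustrative only: the names `pick_by_priority` and `RoundRobin` are hypothetical, `is_eligible` stands in for the target-group condition of Step 103, and reading "numbering order" as "smaller number means higher priority" is an assumption, since the text does not state the direction.

```python
from typing import Callable, List, Optional

Eligible = Callable[[int], bool]  # does group n satisfy the target condition?


def pick_by_priority(numbers: List[int], is_eligible: Eligible) -> Optional[int]:
    """Preset numbering: the eligible group with the smallest number wins
    (assumed: smaller number = higher priority)."""
    for n in sorted(numbers):
        if is_eligible(n):
            return n
    return None


class RoundRobin:
    """Query groups in turn, resuming after the last group chosen."""

    def __init__(self, numbers: List[int]) -> None:
        self.numbers = list(numbers)
        self.start = 0

    def pick(self, is_eligible: Eligible) -> Optional[int]:
        count = len(self.numbers)
        for k in range(count):
            idx = (self.start + k) % count
            if is_eligible(self.numbers[idx]):
                self.start = (idx + 1) % count
                return self.numbers[idx]
        return None
```

Priority selection keeps filling the same high-priority groups first, while round-robin spreads successive uploads across groups.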
- the data storage method further includes: periodically checking, according to a preset time period (the value of the time period may be a fixed system default or a manually set variable value), whether there is a damaged file in the file storage area, and if there is, using the check block to recover the original data of the damaged file according to a preset algorithm.
- the foregoing checking, according to a preset time period, whether there is a damaged file in the file storage area includes: at each preset time period, calculating the file identifier of every file in the file storage area from its original data; if the calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
- the data storage method further includes: receiving a file download request sent by the client, where the file download request includes the file identifier of the file to be downloaded; searching, in the groups whose file identifier range covers the file identifier of the file to be downloaded, for a file identifier identical to that of the file to be downloaded; if no identical file identifier is found there, searching the remaining groups for it; and sending the original data of the file corresponding to that file identifier to the client.
- the data storage method further includes: receiving a file deletion request sent by the client, where the file deletion request includes the file identifier of the file to be deleted; searching, in the groups whose file identifier range covers the file identifier of the file to be deleted, for a file identifier identical to that of the file to be deleted; if no identical file identifier is found there, searching the remaining groups for it; deleting the file information corresponding to that file identifier; and recording the attributes of the deleted file.
- the file attributes include the file identifier and, without being limited thereto, at least one of the following: file name, file deletion time, file upload time, and file storage address.
- FIG. 2 is a block diagram showing the structure of a data storage device according to a second embodiment of the present invention.
- the apparatus includes:
- the grouping module 201 is configured to group the file storage areas and set the storage capacity of each group;
- the upload request receiving module 202 is configured to receive a file upload request sent by the client, where the file upload request includes a file identifier of the file to be uploaded and a check block of the file to be uploaded calculated according to a preset algorithm;
- the file information determining module 203 is configured to determine information of the file to be uploaded, including the file identifier of the file to be uploaded, the original data of the file to be uploaded, and a check block.
- the target group determining module 204 is configured to determine the target group, including: determining the file identifier range of each group, and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range covers the file identifier of the file to be uploaded and whose remaining storage is greater than the storage occupied by the file information to be uploaded;
- the upload file management module 205 is configured to store information of the file to be uploaded to the target group.
- the target group determining module 204 includes a file identifier range determining unit 2041, configured to determine the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, or of a group that stores only one file identifier, is the maximum range.
- the target group determining module 204 is further configured to: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it but that group's remaining storage is less than the storage occupied by the file information to be uploaded, take as the target group a group that already stores file information and whose remaining storage is greater than the storage occupied by the file information to be uploaded; or take a group that stores no file information as the target group.
- the target group determining module 204 is further configured to: if the file identifier ranges of multiple groups cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of these groups is less than the storage occupied by the file information to be uploaded, create the same number of new groups; sort the files stored in the original groups together with the file to be uploaded by file identifier and determine the group corresponding to the position of the file to be uploaded; take this group as the target group, store each file from the original groups in the new groups, and delete the original groups.
- the upload file management module 205 is further configured to: when the file information in the multiple groups and the file information of the file to be uploaded are stored into multiple target groups, store the file information sequentially into the target groups in the preset file identifier order, and delete the file information stored in the original groups; when the total storage of the target groups is less than the total storage occupied by the file information to be stored, the target group determining module 204 re-determines a target group for the file information that cannot be stored in the target groups.
- The device further includes a checking module 206, configured to periodically check, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, to recover the original data of the damaged file from the check block according to a preset algorithm.
- The checking module 206 includes a calculating unit 2061, configured to periodically recalculate, according to a preset time period, the file identifier of every file in the file storage area from its original data; if a calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
- The device further includes a download management module 207, configured to receive a file download request sent by the client, the request including the file identifier of the file to be downloaded. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found there, it searches the remaining groups. It then sends the original data of the file whose identifier matches to the client.
- The device further includes a deletion management module 208, configured to receive a file deletion request sent by the client, the request including the file identifier of the file to be deleted. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found there, it searches the remaining groups. It then deletes the file information corresponding to the matching identifier and records the attributes of the deleted file.
- The file attributes include the file identifier and, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.
- The file identifier is a unique identifier of a file, calculated from the file's original data using a preset cryptographic algorithm.
- The data storage method and device provided by the embodiments of the present invention manage the file storage space in groups. When a file must be located among the groups, its location is determined from the groups' file identifier ranges. Before a file is uploaded to the storage space, the method and device can eliminate the overlapping parts of multiple groups' file identifier ranges. This narrows the search scope and improves search efficiency.
- Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
- Communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
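Disclosed herein are a data storage method and device. The method includes the following steps:
- dividing a file storage area into groups and setting the storage capacity of each group;
- receiving a file upload request sent by a client, the request including the file identifier of the file to be uploaded and a check block of that file calculated according to a preset algorithm;
- determining the information of the file to be uploaded, including its file identifier, its original data, and the check block;
- determining a target group. This includes determining the file identifier range of each group. Then, based on the file identifier of the file to be uploaded and each group's range, the target group is a group whose range contains that identifier and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
- storing the information of the file to be uploaded in the target group.

This disclosure can make file management more systematic and increase storage speed.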
Description
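This application claims priority to Chinese Patent Application No. 201710386030.4, filed with the China Patent Office on May 26, 2017 and entitled "Data storage method and device", the entire contents of which are incorporated herein by reference.
This disclosure relates to, but is not limited to, the field of cloud storage technology, and in particular to a data storage method and device, medium, and equipment.
Existing distributed data storage systems generally use a fixed number of data replicas (for example, 2 or 3 replicas). Their main drawback is that infrequently accessed cold data in the data storage area occupies the same multi-replica storage space as other data. As a result, little space can be freed for frequently accessed hot data on the data storage nodes, and the effective utilization of storage space is low.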
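To save the storage space occupied by cold data, the prior art encodes file content with an erasure coding algorithm. When the file content is damaged, it can be recovered from this encoding, which reduces the storage space that cold data would otherwise occupy in replicas.
However, when a user downloads a file, the stored files have to be searched one by one in the storage space. This ties up system resources for a long time and interferes with normal system operation.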
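Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
To solve the above technical problem, embodiments of the present invention provide a data storage method and device, medium, and equipment.
The data storage method provided by an embodiment of the present invention includes:
Step 1: divide the file storage area into groups and set the storage capacity of each group;
Step 2: receive a file upload request sent by a client, the request including the file identifier of the file to be uploaded and a check block of that file calculated according to a preset algorithm; determine the information of the file to be uploaded, including its file identifier, its original data, and the check block;
Step 3: determine a target group, including: determine the file identifier range of each group and, according to the file identifier of the file to be uploaded and the groups' file identifier ranges, take as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
Step 4: store the information of the file to be uploaded in the target group.
The above method further has the following features: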
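Determining the file identifier range of each group includes: determining the range covered by all file identifiers contained in each group. The file identifier range of a group that stores no information, and of a group that stores only one file identifier, is the maximum range.
The above method further has the following features: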
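Step 3 further includes: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it and that group's remaining storage is less than the storage occupied by the information of the file to be uploaded, take as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
if the remaining storage of every group that already stores file information is less than the storage occupied by the information of the file to be uploaded, take a group that stores no file information as the target group.
The above method further has the following features: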
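Step 3 further includes the following case: the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of those groups is less than the storage occupied by the information of the file to be uploaded. In that case, create as many new groups as there are such groups. Sort the files already stored in the original groups, together with the file to be uploaded, by file identifier. Determine the group corresponding to the position of the file to be uploaded and take that group as the target group. Store each file already stored in the original groups into the new groups, and delete the original groups.
The above method further has the following features: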
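Step 4 further includes: when the file information in multiple groups is stored together with the information of the file to be uploaded into multiple target groups, store the file information destined for those target groups sequentially according to the preset file identifier sort order, and delete the file information stored in the original groups;
when the total storage of the target groups is less than the total storage occupied by the file information to be stored, use the target-group determination method of step 3 to determine new target groups for the file information that cannot be stored in the target groups.
The above method further has the following features: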
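The method further includes: periodically checking, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, recovering the original data of the damaged file using the check block according to a preset algorithm.
The above method further has the following features:
Periodically checking, according to the preset time period, whether there is a damaged file in the file storage area includes: periodically calculating, according to the preset time period, the file identifier of every file in the file storage area from its original data; if a calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
The above method further has the following features: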
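The method further includes: receiving a file download request sent by the client, the request including the file identifier of the file to be downloaded. The groups whose file identifier range covers that identifier are searched for an identical file identifier. If none is found, the remaining groups are searched for an identical file identifier. The original data of the file corresponding to the matching identifier is sent to the client.
The above method further has the following features: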
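The method further includes: receiving a file deletion request sent by the client, the request including the file identifier of the file to be deleted. The groups whose file identifier range covers that identifier are searched for an identical file identifier. If none is found, the remaining groups are searched for an identical file identifier. The file information corresponding to the matching identifier is deleted, and the attributes of the deleted file are recorded. The file attributes include the file identifier and, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.
The above method further has the following features:
The file identifier is a unique identifier of the file, calculated from the file's original data using a preset cryptographic algorithm.
The data storage device provided by an embodiment of the present invention includes: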
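a grouping module, configured to divide the file storage area into groups and set the storage capacity of each group;
an upload request receiving module, configured to receive a file upload request sent by a client, the request including the file identifier of the file to be uploaded and a check block of that file calculated according to a preset algorithm;
a file information determining module, configured to determine the information of the file to be uploaded, including its file identifier, its original data, and the check block;
a target group determining module, configured to determine a target group, including: determining the file identifier range of each group and, according to the file identifier of the file to be uploaded and the groups' file identifier ranges, taking as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
an upload file management module, configured to store the information of the file to be uploaded in the target group.
The above device further has the following features:
The target group determining module includes a file identifier range determining unit, configured to determine the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, and of a group that stores only one file identifier, is the maximum range.
The above device further has the following features: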
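The target group determining module is further configured to: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it and that group's remaining storage is less than the storage occupied by the information of the file to be uploaded, take as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
if the remaining storage of every group that already stores file information is less than the storage occupied by the information of the file to be uploaded, take a group that stores no file information as the target group.
The above device further has the following features: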
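The target group determining module is further configured to handle the following case: the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of those groups is less than the storage occupied by the information of the file to be uploaded. In that case, the module creates as many new groups as there are such groups. It sorts the files already stored in the original groups, together with the file to be uploaded, by file identifier. It determines the group corresponding to the position of the file to be uploaded and takes that group as the target group. It then stores each file already stored in the original groups into the new groups and deletes the original groups.
The above device further has the following features: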
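The upload file management module is further configured to: when the file information in multiple groups is stored together with the information of the file to be uploaded into multiple target groups, store the file information destined for those target groups sequentially according to the preset file identifier sort order, and delete the file information stored in the original groups;
when the total storage of the target groups is less than the total storage occupied by the file information to be stored, the target group determining module determines new target groups for the file information that cannot be stored in the target groups.
The above device further has the following features: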
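The device further includes a checking module, configured to periodically check, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, to recover the original data of the damaged file using the check block according to a preset algorithm.
The above device further has the following features:
The checking module includes a calculating unit, configured to periodically calculate, according to the preset time period, the file identifier of every file in the file storage area from its original data; if a calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
The above device further has the following features: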
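The device further includes a download management module, configured to receive a file download request sent by the client, the request including the file identifier of the file to be downloaded. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found, it searches the remaining groups. It then sends the original data of the file corresponding to the matching identifier to the client.
The above device further has the following features: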
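The device further includes a deletion management module, configured to receive a file deletion request sent by the client, the request including the file identifier of the file to be deleted. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found, it searches the remaining groups. It then deletes the file information corresponding to the matching identifier and records the attributes of the deleted file. The file attributes include the file identifier and, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.
The above device further has the following features:
The file identifier is a unique identifier of the file, calculated from the file's original data using a preset cryptographic algorithm.
A computer-readable storage medium provided by an embodiment of the present invention stores a computer program which, when executed by a processor, implements the steps of the above method.
A computer device provided by an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the above method are implemented.
The embodiments of the present invention can make file management more systematic and increase storage speed.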
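The drawings described here provide a further understanding of the embodiments of the present invention and form part of this application. The illustrative embodiments and their descriptions serve to explain the embodiments of the present invention and do not unduly limit them. In the drawings:
Fig. 1 is a flowchart of the data storage method provided in Embodiment 1;
Fig. 2 is a schematic structural diagram of the data storage device provided in Embodiment 2.
The embodiments of the present invention are further described below with reference to the drawings and specific implementations.
Exemplary embodiments of the data storage method and device of the present invention are described in detail below with reference to the drawings.
Embodiment 1
Fig. 1 is a flowchart of the data storage method according to Embodiment 1 of the present invention. The method is typically applied to a server used for storage. Referring to Fig. 1, the method includes: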
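Step 101: divide the file storage area into groups and set the storage capacity of each group;
Step 102: receive a file upload request sent by a client, the request including the file identifier of the file to be uploaded and a check block of that file calculated according to a preset algorithm; determine the information of the file to be uploaded, including its file identifier, its original data, and the check block;
Step 103: determine a target group, including: determine the file identifier range of each group and, according to the file identifier of the file to be uploaded and the groups' file identifier ranges, take as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
Step 104: store the information of the file to be uploaded in the target group.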
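In the above data storage method, the file identifier is a unique identifier of the file, calculated from the file's original data using a preset cryptographic algorithm. For example, the SHA1 value computed over the file's original data with the Secure Hash Algorithm (SHA1) serves as the file's unique identifier.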
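The following is a minimal sketch of the identifier computation. It assumes the SHA1 example given above and uses Python's standard `hashlib`; the function name `file_identifier` is hypothetical. Identical content always yields the same 40-character hexadecimal identifier.

```python
import hashlib


def file_identifier(raw_data: bytes) -> str:
    """Return the SHA1 hex digest of a file's original data (hypothetical helper)."""
    return hashlib.sha1(raw_data).hexdigest()


if __name__ == "__main__":
    print(file_identifier(b"hello"))
```

SHA1 is a cryptographic hash rather than an encryption algorithm, and it is no longer considered collision resistant. A deployment concerned about deliberately crafted collisions could swap in `hashlib.sha256` without changing the rest of the scheme.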
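In step 103 above, determining the file identifier range of each group includes: determining the range covered by all file identifiers contained in each group. The file identifier range of a group that stores no information, and of a group that stores only one file identifier, is the maximum range.
For example, suppose the smallest file identifier stored in a group is aaa and the largest is bbb. The group's file identifier range is then aaa~bbb. The file identifier range of a group that stores no file information, or only one file identifier, is marked MAX.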
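The range rule can be sketched as follows. Each group is assumed to be a hypothetical `dict` mapping file identifier to size, and the string `"MAX"` stands for the maximum range. SHA1 hex digests all have the same length, so plain string comparison orders them the same way as their numeric values.

```python
from typing import Dict, Tuple, Union

MAX = "MAX"  # hypothetical marker for the "maximum range"


def id_range(files: Dict[str, int]) -> Union[str, Tuple[str, str]]:
    """Return (smallest_id, largest_id) for one group, or MAX if it holds 0 or 1 files.

    `files` maps file identifier -> size for a single group.
    """
    if len(files) <= 1:
        return MAX
    ids = sorted(files)
    return ids[0], ids[-1]


def covers(files: Dict[str, int], file_id: str) -> bool:
    """True if the group's range is not MAX and contains file_id."""
    r = id_range(files)
    return r != MAX and r[0] <= file_id <= r[1]
```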
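Step 103 above further includes: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it and that group's remaining storage is less than the storage occupied by the information of the file to be uploaded, take as the target group a group that already stores file information and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded;
if the remaining storage of every group that already stores file information is less than the storage occupied by the information of the file to be uploaded, take a group that stores no file information as the target group.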
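The selection order in step 103 (the main rule, then the two fallbacks) can be sketched as follows. This sketch makes several assumptions:

- the `Group` class is hypothetical;
- a smaller group number means higher priority;
- groups with the maximum range are not counted as covering;
- the empty-group fallback also checks capacity, which the text does not spell out.

When several covering groups are all too full, the function only reports `"merge"`, because the text handles that case with a separate rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Group:
    number: int     # hypothetical: smaller number = higher priority
    capacity: int   # storage capacity set in step 101
    files: Dict[str, int] = field(default_factory=dict)  # file_id -> size

    @property
    def remaining(self) -> int:
        return self.capacity - sum(self.files.values())

    def covers(self, file_id: str) -> bool:
        # Groups with 0 or 1 files have the MAX range and are not counted here.
        if len(self.files) <= 1:
            return False
        ids = sorted(self.files)
        return ids[0] <= file_id <= ids[-1]


def choose_target(groups: List[Group], file_id: str,
                  size: int) -> Tuple[str, Union[Group, List[Group], None]]:
    """Return ("store", group), ("merge", covering_groups) or ("full", None)."""
    ordered = sorted(groups, key=lambda g: g.number)
    covering = [g for g in ordered if g.covers(file_id)]
    # Main rule: a covering group with enough remaining storage.
    for g in covering:
        if g.remaining >= size:
            return "store", g
    # Several covering groups, all too full: left to the merge rule.
    if len(covering) >= 2:
        return "merge", covering
    # Fallback 1: no covering group, or a single full one -> a non-empty group with room.
    for g in ordered:
        if g.files and g.remaining >= size:
            return "store", g
    # Fallback 2: an empty group (the capacity check is an added assumption).
    for g in ordered:
        if not g.files and g.capacity >= size:
            return "store", g
    return "full", None
```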
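Step 103 above further includes the following case: the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of those groups is less than the storage occupied by the information of the file to be uploaded. In that case, create as many new groups as there are such groups. Sort the files already stored in the original groups, together with the file to be uploaded, by file identifier. Determine the group corresponding to the position of the file to be uploaded and take that group as the target group. Store each file already stored in the original groups into the new groups, and delete the original groups.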
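Step 104 above further includes: when the file information in multiple groups is stored together with the information of the file to be uploaded into multiple target groups, store the file information destined for those target groups sequentially according to the preset file identifier sort order, and delete the file information stored in the original groups;
when the total storage of the target groups is less than the total storage occupied by the file information to be stored, use the target-group determination method of step 103 to determine new target groups for the file information that cannot be stored in the target groups.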
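For example, the file identifier of the file to be uploaded is abc123, and its file information occupies 100 MB. The file identifier range of the first group is abc000~abc300, and that of the second group is abc100~abc400. The two ranges overlap, both cover the identifier of the file to be uploaded, and both groups have less than 100 MB of remaining storage. Two of the groups that store no file information are therefore chosen as the target groups. They will hold all the file information of the first and second groups plus the file information of the file to be uploaded.
The preset file identifier sort order is ascending. The file information in the first group, the file information in the second group, and the file information of the file to be uploaded are stored into the two target groups in ascending order of file identifier. The file identifier ranges of the two target groups then no longer overlap. If the two target groups cannot hold all the file information to be stored, step 103 above is used to determine target groups for the remaining file information.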
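The redistribution in the abc123 example can be sketched as a sequential fill in ascending identifier order, which is what keeps the new ranges from overlapping. The function name `redistribute` and the entry sizes below are hypothetical; the text gives only the 100 MB figure for the new file. Entries that do not fit are returned as leftovers, to be placed again with the step 103 rules.

```python
from typing import Dict, List, Tuple


def redistribute(entries: Dict[str, int],
                 capacities: List[int]) -> Tuple[List[Dict[str, int]], Dict[str, int]]:
    """Fill new groups with entries in ascending file-id order.

    entries: file_id -> size, gathered from the overlapping groups plus the new file.
    capacities: capacity of each new target group, in priority order.
    Returns (new_groups, leftover); leftover must be placed again via step 103.
    """
    new_groups: List[Dict[str, int]] = [{} for _ in capacities]
    leftover: Dict[str, int] = {}
    gi, used = 0, 0
    for fid in sorted(entries):
        size = entries[fid]
        # Move on to the next group when the current one cannot take this entry;
        # filling strictly in order keeps the new ranges disjoint.
        while gi < len(capacities) and used + size > capacities[gi]:
            gi, used = gi + 1, 0
        if gi == len(capacities):
            leftover[fid] = size
            continue
        new_groups[gi][fid] = size
        used += size
    return new_groups, leftover


if __name__ == "__main__":
    # Hypothetical sizes in MB; only the 100 MB upload comes from the text.
    entries = {"abc000": 60, "abc300": 60,   # first group
               "abc100": 70, "abc400": 70,   # second group
               "abc123": 100}                # file to be uploaded
    new_groups, leftover = redistribute(entries, [200, 200])
    print(new_groups, leftover)
```

With two new 200 MB groups, `abc000` and `abc100` land in the first group. `abc123` and `abc300` land in the second, so the second group is the target group for the uploaded file. `abc400` is left over for re-placement.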
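When determining the target group, the groups may be queried in round-robin fashion or selected by preset group number. The order of the group numbers represents the priority with which files are stored into the groups. When multiple groups all satisfy the target-group conditions, the file information is stored in the group with the highest priority.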
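The two selection strategies mentioned here might look like the sketch below. The `eligible` callback stands for the target-group conditions of step 103. Treating a smaller group number as higher priority is an assumption, since the text only says the numbering order expresses priority.

```python
from typing import Callable, Iterable, List, Optional


def pick_by_priority(numbers: Iterable[int],
                     eligible: Callable[[int], bool]) -> Optional[int]:
    """Return the eligible group with the highest priority (assumed: smallest number)."""
    for n in sorted(numbers):
        if eligible(n):
            return n
    return None


class RoundRobinPicker:
    """Polls groups in rotating order, resuming after the last group chosen."""

    def __init__(self, numbers: Iterable[int]):
        self.numbers: List[int] = sorted(numbers)
        self.start = 0

    def pick(self, eligible: Callable[[int], bool]) -> Optional[int]:
        count = len(self.numbers)
        for step in range(count):
            idx = (self.start + step) % count
            if eligible(self.numbers[idx]):
                self.start = (idx + 1) % count
                return self.numbers[idx]
        return None
```

Priority selection always favors the same groups until they fill up. Round-robin spreads writes across groups at the cost of keeping a little state between calls.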
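The above data storage method further includes: periodically checking, according to a preset time period, whether there is a damaged file in the file storage area. The period may be a fixed system default or a manually configurable value. If there is a damaged file, its original data is recovered using the check block according to a preset algorithm.
Periodically checking whether there is a damaged file in the file storage area includes: periodically calculating, according to the preset time period, the file identifier of every file in the file storage area from its original data; if a calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
For example, the check may be set to run at 00:00 on the first day of every month; if a damaged file is found, its original data is recovered from the check block according to the preset algorithm.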
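The check-and-recover cycle can be illustrated with a deliberately simple stand-in for the erasure code: a single XOR parity block over k equal chunks. This is an assumption for illustration only. The text leaves the "preset algorithm" unspecified, and a practical system would more likely use a code that tolerates several lost chunks.

A single parity block can only rebuild a chunk whose position is known. The sketch therefore uses the stored SHA1 identifier to test each candidate reconstruction. All function names are hypothetical, and scheduling (for example, monthly at 00:00) is left to an external scheduler.

```python
import hashlib
from functools import reduce
from typing import List, Optional


def file_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def split_chunks(data: bytes, k: int) -> List[bytes]:
    """Split into k equal-size chunks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    return [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_check_block(data: bytes, k: int) -> bytes:
    """Single XOR parity over k chunks (stand-in for the unspecified algorithm)."""
    return reduce(xor_bytes, split_chunks(data, k))


def is_damaged(data: bytes, stored_id: str) -> bool:
    """Periodic check: recompute the identifier and compare it with the stored one."""
    return file_id(data) != stored_id


def recover(damaged: bytes, check_block: bytes, k: int,
            length: int, stored_id: str) -> Optional[bytes]:
    """Rebuild each chunk in turn from the parity; keep the candidate whose SHA1 matches.

    Assumes the damage is confined to one chunk and the data length is unchanged.
    """
    chunks = split_chunks(damaged, k)
    for i in range(k):
        others = chunks[:i] + chunks[i + 1:]
        rebuilt = reduce(xor_bytes, others, check_block)  # parity ^ others = chunk i
        candidate = b"".join(chunks[:i] + [rebuilt] + chunks[i + 1:])[:length]
        if not is_damaged(candidate, stored_id):
            return candidate
    return None
```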
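The above data storage method further includes: receiving a file download request sent by the client, the request including the file identifier of the file to be downloaded. The groups whose file identifier range covers that identifier are searched for an identical file identifier. If none is found, the remaining groups are searched for an identical file identifier. The original data of the file corresponding to the matching identifier is sent to the client.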
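The two-pass download lookup can be sketched as follows. Groups are assumed to be held in memory as a hypothetical `dict` of group number to `{file_id: raw_data}`. Groups holding zero or one file have the maximum range, so this sketch treats them as covering every identifier in the first pass. Groups whose range does not cover the identifier are searched only if the first pass fails.

```python
from typing import Dict, List, Optional, Tuple


def search_order(groups: Dict[int, Dict[str, bytes]], file_id: str) -> List[int]:
    """Groups whose range covers file_id come first, then all the others."""
    def in_range(files: Dict[str, bytes]) -> bool:
        if len(files) <= 1:
            return True  # MAX range: treated as covering every identifier (assumption)
        ids = sorted(files)
        return ids[0] <= file_id <= ids[-1]

    numbers = sorted(groups)
    first = [n for n in numbers if in_range(groups[n])]
    rest = [n for n in numbers if n not in first]
    return first + rest


def download(groups: Dict[int, Dict[str, bytes]],
             file_id: str) -> Optional[Tuple[int, bytes]]:
    """Return (group_number, original_data) for file_id, or None if absent."""
    for n in search_order(groups, file_id):
        data = groups[n].get(file_id)
        if data is not None:
            return n, data
    return None
```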
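The above data storage method further includes: receiving a file deletion request sent by the client, the request including the file identifier of the file to be deleted. The groups whose file identifier range covers that identifier are searched for an identical file identifier. If none is found, the remaining groups are searched for an identical file identifier. The file information corresponding to the matching identifier is deleted, and the attributes of the deleted file are recorded. The file attributes include the file identifier and, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.
Embodiment 2
Fig. 2 is a schematic structural diagram of the data storage device according to Embodiment 2 of the present invention. Referring to Fig. 2, the device includes: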
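Deletion reuses the same two-pass search order and then appends a record of the removed file. The record fields mirror the attributes listed above. The `info` keys (`name`, `uploaded_at`, `address`) and the in-memory log list are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class DeletionRecord:
    file_id: str                            # always recorded
    file_name: Optional[str] = None         # optional attributes listed in the text
    deleted_at: Optional[datetime] = None
    uploaded_at: Optional[datetime] = None
    storage_address: Optional[str] = None


def search_order(groups: Dict[int, Dict[str, dict]], file_id: str) -> List[int]:
    """Covering groups (MAX-range groups included) first, then the rest."""
    def in_range(files: Dict[str, dict]) -> bool:
        if len(files) <= 1:
            return True
        ids = sorted(files)
        return ids[0] <= file_id <= ids[-1]

    numbers = sorted(groups)
    first = [n for n in numbers if in_range(groups[n])]
    return first + [n for n in numbers if n not in first]


def delete_file(groups: Dict[int, Dict[str, dict]], file_id: str,
                log: List[DeletionRecord],
                now: Optional[datetime] = None) -> Optional[DeletionRecord]:
    """Remove file_id from the first group holding it and log its attributes."""
    for n in search_order(groups, file_id):
        info = groups[n].pop(file_id, None)
        if info is not None:
            record = DeletionRecord(
                file_id=file_id,
                file_name=info.get("name"),
                deleted_at=now or datetime.now(timezone.utc),
                uploaded_at=info.get("uploaded_at"),
                storage_address=info.get("address"),
            )
            log.append(record)
            return record
    return None
```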
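a grouping module 201, configured to divide the file storage area into groups and set the storage capacity of each group;
an upload request receiving module 202, configured to receive a file upload request sent by a client, the request including the file identifier of the file to be uploaded and a check block of that file calculated according to a preset algorithm;
a file information determining module 203, configured to determine the information of the file to be uploaded, including its file identifier, its original data, and the check block;
a target group determining module 204, configured to determine a target group, including: determining the file identifier range of each group and, according to the file identifier of the file to be uploaded and the groups' file identifier ranges, taking as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is greater than the storage occupied by the information of the file to be uploaded;
an upload file management module 205, configured to store the information of the file to be uploaded in the target group.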
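The target group determining module 204 includes a file identifier range determining unit 2041, configured to determine the range covered by all file identifiers contained in each group; the file identifier range of a group that stores no information, and of a group that stores only one file identifier, is the maximum range.
The target group determining module 204 is further configured to: among the groups whose file identifier range is not the maximum range, if no group's file identifier range covers the file identifier of the file to be uploaded, or only one group's range covers it and that group's remaining storage is less than the storage occupied by the information of the file to be uploaded, take as the target group a group that already stores file information and whose remaining storage is greater than the storage occupied by the information of the file to be uploaded;
if the remaining storage of every group that already stores file information is less than the storage occupied by the information of the file to be uploaded, take a group that stores no file information as the target group.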
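The target group determining module 204 is further configured to handle the following case: the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of those groups is less than the storage occupied by the information of the file to be uploaded. In that case, the module creates as many new groups as there are such groups. It sorts the files already stored in the original groups, together with the file to be uploaded, by file identifier. It determines the group corresponding to the position of the file to be uploaded and takes that group as the target group. It then stores each file already stored in the original groups into the new groups and deletes the original groups.
The upload file management module 205 is further configured to: when the file information in multiple groups is stored together with the information of the file to be uploaded into multiple target groups, store the file information destined for those target groups sequentially according to the preset file identifier sort order, and delete the file information stored in the original groups;
when the total storage of the target groups is less than the total storage occupied by the file information to be stored, the target group determining module 204 determines new target groups for the file information that cannot be stored in the target groups.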
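The above device further includes a checking module 206, configured to periodically check, according to a preset time period, whether there is a damaged file in the file storage area, and if there is, to recover the original data of the damaged file using the check block according to a preset algorithm.
The checking module 206 includes a calculating unit 2061, configured to periodically calculate, according to the preset time period, the file identifier of every file in the file storage area from its original data; if a calculated file identifier is inconsistent with the stored file identifier, the file is determined to be damaged.
The above device further includes a download management module 207, configured to receive a file download request sent by the client, the request including the file identifier of the file to be downloaded. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found, it searches the remaining groups. It then sends the original data of the file corresponding to the matching identifier to the client.
The above device further includes a deletion management module 208, configured to receive a file deletion request sent by the client, the request including the file identifier of the file to be deleted. The module searches the groups whose file identifier range covers that identifier for an identical file identifier. If none is found, it searches the remaining groups. It then deletes the file information corresponding to the matching identifier and records the attributes of the deleted file. The file attributes include the file identifier and, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.
The above file identifier is a unique identifier of the file, calculated from the file's original data using a preset cryptographic algorithm.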
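The data storage method and device provided by the embodiments of the present invention manage the file storage space in groups. When a file needs to be found among the groups, its location is determined from the groups' file identifier ranges. Before a file is uploaded to the storage space, the method and device can eliminate the overlapping parts of multiple groups' file identifier ranges, which narrows the search scope and improves search efficiency.
The content described above can be implemented alone or combined in various ways, and all such variations fall within the protection scope of the embodiments of the present invention.
Those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications shall be covered by the scope of the claims.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.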
Claims (22)
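- A data storage method, comprising:
  - step 1, dividing a file storage area into groups and setting the storage capacity of each group;
  - step 2, receiving a file upload request sent by a client, the file upload request comprising a file identifier of a file to be uploaded and a check block of the file to be uploaded calculated according to a preset algorithm; and determining information of the file to be uploaded, comprising the file identifier of the file to be uploaded, original data of the file to be uploaded, and the check block;
  - step 3, determining a target group, comprising: determining a file identifier range of each group; and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded; and
  - step 4, storing the information of the file to be uploaded in the target group.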
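- The method according to claim 1, wherein determining the file identifier range of each group comprises: determining the range covered by all file identifiers contained in each group, wherein the file identifier range of a group in which no information is stored and the file identifier range of a group in which only one file identifier is stored are a maximum range.
- The method according to claim 2, wherein step 3 further comprises, among the groups whose file identifier range is not the maximum range:
  - if none of the file identifier ranges of the groups covers the file identifier of the file to be uploaded, or only the file identifier range of one group covers it and the remaining storage of that group is less than the storage occupied by the information of the file to be uploaded, taking as the target group a group that has stored file information and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded; and
  - if the remaining storage of each group that has stored file information is less than the storage occupied by the information of the file to be uploaded, taking a group that has not stored file information as the target group.
- The method according to claim 2, wherein step 3 further comprises, if the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of the multiple groups is less than the storage occupied by the information of the file to be uploaded:
  - creating new groups equal in number to the multiple groups;
  - sorting the files stored in the original multiple groups and the file to be uploaded in order of file identifier;
  - determining the group corresponding to the position of the file to be uploaded, and taking that group as the target group;
  - storing the files stored in the original multiple groups into the new groups; and
  - deleting the original multiple groups.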
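- The method according to claim 4, wherein step 4 further comprises: when the file information in the multiple groups and the file information of the file to be uploaded are stored together into multiple target groups, storing the file information to be stored into the multiple target groups sequentially according to a preset file identifier sort order, and deleting the file information stored in the multiple groups; and when the total storage of the target groups is less than the total storage occupied by the file information to be stored, re-determining, according to the method of determining the target group in step 3, a target group for storing the file information that cannot be stored in the target groups.
- The method according to claim 1, further comprising: periodically checking, according to a preset time period, whether a damaged file exists in the file storage area, and if a damaged file exists, recovering the original data of the damaged file using the check block according to a preset algorithm.
- The method according to claim 6, wherein periodically checking, according to the preset time period, whether a damaged file exists in the file storage area comprises: periodically calculating, according to the preset time period, the corresponding file identifier from the original data of every file in the file storage area, and if the calculated file identifier is inconsistent with the stored file identifier, determining that the file is damaged.
- The method according to claim 1, further comprising:
  - receiving a file download request sent by the client, the file download request comprising a file identifier of a file to be downloaded;
  - searching, in the groups whose file identifier range covers the file identifier of the file to be downloaded, for a file identifier identical to the file identifier of the file to be downloaded;
  - if no identical file identifier exists, searching the remaining groups for a file identifier identical to the file identifier of the file to be downloaded; and
  - sending the original data of the file corresponding to the identical file identifier to the client.
- The method according to claim 1, further comprising:
  - receiving a file deletion request sent by the client, the file deletion request comprising a file identifier of a file to be deleted;
  - searching, in the groups whose file identifier range covers the file identifier of the file to be deleted, for a file identifier identical to the file identifier of the file to be deleted;
  - if no identical file identifier exists, searching the remaining groups for a file identifier identical to the file identifier of the file to be deleted;
  - deleting the file information corresponding to the identical file identifier; and
  - recording attributes of the deleted file, the file attributes comprising the file identifier and further comprising, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.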
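- The method according to any one of claims 1-9, wherein the file identifier is a unique identifier of the file calculated from the original data of the file using a preset cryptographic algorithm.
- A data storage device, comprising:
  - a grouping module, configured to divide a file storage area into groups and set the storage capacity of each group;
  - an upload request receiving module, configured to receive a file upload request sent by a client, the file upload request comprising a file identifier of a file to be uploaded and a check block of the file to be uploaded calculated according to a preset algorithm;
  - a file information determining module, configured to determine information of the file to be uploaded, comprising the file identifier of the file to be uploaded, original data of the file to be uploaded, and the check block;
  - a target group determining module, configured to determine a target group, comprising: determining a file identifier range of each group; and, according to the file identifier of the file to be uploaded and the file identifier range of each group, taking as the target group a group whose file identifier range contains the file identifier of the file to be uploaded and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded; and
  - an upload file management module, configured to store the information of the file to be uploaded in the target group.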
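- The device according to claim 11, wherein the target group determining module comprises a file identifier range determining unit, configured to determine the range covered by all file identifiers contained in each group, wherein the file identifier range of a group in which no information is stored and the file identifier range of a group in which only one file identifier is stored are a maximum range.
- The device according to claim 12, wherein the target group determining module is further configured to, among the groups whose file identifier range is not the maximum range:
  - if none of the file identifier ranges of the groups covers the file identifier of the file to be uploaded, or only the file identifier range of one group covers it and the remaining storage of that group is less than the storage occupied by the information of the file to be uploaded, take as the target group a group that has stored file information and whose remaining storage is not less than the storage occupied by the information of the file to be uploaded; and
  - if the remaining storage of each group that has stored file information is less than the storage occupied by the information of the file to be uploaded, take a group that has not stored file information as the target group.
- The device according to claim 12, wherein the target group determining module is further configured to, when the file identifier ranges of multiple groups all cover the file identifier of the file to be uploaded and are not the maximum range, and the remaining storage of each of the multiple groups is less than the storage occupied by the information of the file to be uploaded:
  - create new groups equal in number to the multiple groups;
  - sort the files stored in the original multiple groups and the file to be uploaded in order of file identifier;
  - determine the group corresponding to the position of the file to be uploaded, and take that group as the target group;
  - store the files stored in the original multiple groups into the new groups; and
  - delete the original multiple groups.
- The device according to claim 14, wherein the upload file management module is further configured to: when the file information in the multiple groups and the file information of the file to be uploaded are stored together into multiple target groups, store the file information to be stored into the multiple target groups sequentially according to a preset file identifier sort order, and delete the file information stored in the multiple groups; and when the total storage of the target groups is less than the total storage occupied by the file information to be stored, the target group determining module re-determines a target group for storing the file information that cannot be stored in the target groups.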
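- The device according to claim 11, further comprising a checking module, configured to periodically check, according to a preset time period, whether a damaged file exists in the file storage area, and if a damaged file exists, recover the original data of the damaged file using the check block according to a preset algorithm.
- The device according to claim 16, wherein the checking module comprises a calculating unit, configured to periodically calculate, according to the preset time period, the corresponding file identifier from the original data of every file in the file storage area, and if the calculated file identifier is inconsistent with the stored file identifier, determine that the file is damaged.
- The device according to claim 11, further comprising a download management module, configured to:
  - receive a file download request sent by the client, the file download request comprising a file identifier of a file to be downloaded;
  - search, in the groups whose file identifier range covers the file identifier of the file to be downloaded, for a file identifier identical to the file identifier of the file to be downloaded;
  - if no identical file identifier exists, search the remaining groups for a file identifier identical to the file identifier of the file to be downloaded; and
  - send the original data of the file corresponding to the identical file identifier to the client.
- The device according to claim 11, further comprising a deletion management module, configured to:
  - receive a file deletion request sent by the client, the file deletion request comprising a file identifier of a file to be deleted;
  - search, in the groups whose file identifier range covers the file identifier of the file to be deleted, for a file identifier identical to the file identifier of the file to be deleted;
  - if no identical file identifier exists, search the remaining groups for a file identifier identical to the file identifier of the file to be deleted;
  - delete the file information corresponding to the identical file identifier; and
  - record attributes of the deleted file, the file attributes comprising the file identifier and further comprising, without limitation, at least one of the following: file name, file deletion time, file upload time, and file storage address.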
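- The device according to any one of claims 11-19, wherein the file identifier is a unique identifier of the file calculated from the original data of the file using a preset cryptographic algorithm.
- A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 10.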
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710386030.4A CN107707600B (zh) | 2017-05-26 | 2017-05-26 | Data storage method and device |
CN201710386030.4 | 2017-05-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018214905A1 (zh) | 2018-11-29 |
Family ID: 61169628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/087991 WO2018214905A1 (zh) | Data storage method, device, medium, and equipment | 2017-05-26 | 2018-05-23 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107707600B (zh) |
WO (1) | WO2018214905A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112804312A (zh) * | 2020-12-31 | 2021-05-14 | 上海掌门科技有限公司 | File upload method, device, and computer-readable medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
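CN107707600B (zh) * | 2017-05-26 | 2018-09-18 | 贵州白山云科技有限公司 | Data storage method and device |
CN108830102B (zh) * | 2018-06-14 | 2021-07-02 | 平安科技(深圳)有限公司 | File security management method and apparatus, computer device, and storage medium |
CN111106840A (zh) * | 2018-10-25 | 2020-05-05 | 贵州白山云科技股份有限公司 | Method, system, medium, and computer device for accelerating erasure code decoding |
CN110262752B (zh) * | 2019-05-16 | 2020-08-11 | 罗普特科技集团股份有限公司 | Method, apparatus, and storage medium for storing streaming media data |
CN112286540B (zh) * | 2020-10-30 | 2024-08-23 | Vidaa(荷兰)国际控股有限公司 | Application software installation method, terminal, and display device |
CN112685753B (zh) * | 2020-12-25 | 2023-11-28 | 上海焜耀网络科技有限公司 | Method and device for encrypted data storage |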
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050027692A1 (en) * | 2003-07-29 | 2005-02-03 | International Business Machines Corporation | Method, system, and program for accessing data in a database table |
CN103797770A (zh) * | 2012-12-31 | 2014-05-14 | 华为技术有限公司 | Method and system for sharing storage resources |
CN106294352A (zh) * | 2015-05-13 | 2017-01-04 | 姚猛 | File processing method, apparatus, and file system |
CN106294421A (zh) * | 2015-05-25 | 2017-01-04 | 阿里巴巴集团控股有限公司 | Data writing and reading method and device |
CN107707600A (zh) * | 2017-05-26 | 2018-02-16 | 贵州白山云科技有限公司 | Data storage method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077166B (zh) * | 2011-10-25 | 2016-08-03 | 深圳市天趣网络科技有限公司 | Space reuse method and device for small-file storage |
CN102546836A (zh) * | 2012-03-09 | 2012-07-04 | 腾讯科技(深圳)有限公司 | Method, terminal, server, and system for uploading files |
CN103873504A (zh) * | 2012-12-12 | 2014-06-18 | 鸿富锦精密工业(深圳)有限公司 | System and method for storing data in blocks on distributed servers |
CN104679830A (zh) * | 2015-01-30 | 2015-06-03 | 乐视网信息技术(北京)股份有限公司 | File processing method and device |
CN106649721B (zh) * | 2016-12-22 | 2021-06-22 | 创新科技术有限公司 | File deduplication method and device |
- 2017-05-26: CN CN201710386030.4A, patent/CN107707600B/zh (Active)
- 2018-05-23: WO PCT/CN2018/087991, patent/WO2018214905A1/zh (Application Filing)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112804312A (zh) * | 2020-12-31 | 2021-05-14 | 上海掌门科技有限公司 | File upload method, device, and computer-readable medium |
CN112804312B (zh) * | 2020-12-31 | 2023-06-30 | 上海掌门科技有限公司 | File upload method, device, and computer-readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN107707600B (zh) | 2018-09-18 |
CN107707600A (zh) | 2018-02-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18806162; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18806162; Country of ref document: EP; Kind code of ref document: A1 |