WO2021237467A1

WO2021237467A1 - File uploading method, file downloading method and file management apparatus

Info

Publication number: WO2021237467A1
Application number: PCT/CN2020/092383
Authority: WO
Inventors: 许若阳
Original assignee: 深圳元戎启行科技有限公司
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2021-12-02
Also published as: CN113273163A

Abstract

The present application relates to a file uploading method, a file downloading method and a file management apparatus. The file uploading method comprises: acquiring a file to be uploaded; segmenting the file to be uploaded into at least one block file; uploading the at least one block file to a storage server; receiving at least one storage address returned by the storage server and corresponding to the at least one block file; and storing metainformation of the file, wherein the metainformation comprises a first file identifier of the file and the at least one storage address, which are stored in an associated manner, and an arrangement order of the at least one block file corresponding to the at least one storage address in the file before segmentation.

Description

File upload method, file download method and file management device

Technical field

This application relates to the field of data storage technology, and in particular to a file upload method, file download method and file management device.

Background art

At present, many storage service providers offer data storage services to enterprise users or individual users. The storage servers of these providers offer access to object storage services, such as object-based storage systems, which enterprise users can use to store data for mobile applications, large-scale websites, picture sharing, or popular audio and video, or for low-frequency access storage and archive storage, and which individual users can use to store files. This type of service can provide flat file storage together with Content Delivery Network (CDN) resources to improve the speed at which users load static resources. The user uploads a file to the storage pool of the storage server through the back-end program of the user terminal, obtains the Uniform Resource Locator (URL) returned by the storage server, and embeds that address in a web page or in the data returned by an Application Programming Interface (API); the previously uploaded file can then be downloaded using the URL. However, in the related technologies described above, files stored on the storage server are relatively easy to steal, and the security of file storage is low.

Summary of the invention

Based on this, it is necessary to provide a file upload method, file download method, and file management device.

A file upload method includes: obtaining a file to be uploaded; dividing the file to be uploaded into at least one block file; uploading the at least one block file to a storage server; receiving at least one storage address, returned by the storage server, corresponding to the at least one block file; and storing meta-information of the file, where the meta-information includes, stored in association, a first file identifier of the file, the at least one storage address, and arrangement order information of the at least one block file corresponding to the at least one storage address in the file before splitting.

A file download method includes: obtaining a first file identifier of a file to be downloaded; obtaining meta-information of the file based on the first file identifier, where the meta-information includes the first file identifier of the file, at least one storage address, and arrangement order information of the at least one block file corresponding to the at least one storage address in the file before splitting; downloading the at least one block file from the storage server corresponding to the at least one storage address by using the at least one storage address; and, based on the arrangement order information of the at least one block file, restoring the at least one block file to a complete file and returning the complete file.

A file management device is communicatively connected to a storage server. The file management device includes a processor and an information storage, and the processor is used to execute the above file upload method and file download method.

In the above file upload method, file download method, and file management device, when a file is uploaded, the file to be uploaded is divided into at least one block file and the at least one block file is stored on a storage server; when the file is downloaded, the at least one block file is downloaded from the storage server and merged according to the locally recorded division order of the block files to obtain the restored file. In this way, the files stored on the storage server are incomplete block files, and information such as the arrangement order of these block files is difficult to obtain from the storage server, so that it is difficult to restore the complete stored file from the storage server. This effectively improves the security of files stored on the storage server.

Description of the drawings

Figure 1 is an application environment diagram of a file upload method and a file download method in an embodiment;

Figure 2 is a diagram of the application environment of the file upload method and the file download method in another embodiment;

FIG. 3 is a schematic flowchart of a file upload method in an embodiment;

Figure 4 is a schematic flowchart of a file upload method in an embodiment;

FIG. 5 is a schematic flowchart of a file upload method in an embodiment;

FIG. 6 is a schematic flowchart of a file upload method in an embodiment;

FIG. 7 is a schematic diagram of the structure of an information storage in an embodiment;

FIG. 8 is a schematic flowchart of a file download method in an embodiment;

FIG. 9 is a schematic flowchart of a file download method in an embodiment;

Figure 10 is a schematic diagram of a file upload method and a file download method in an embodiment;

Figure 11 is a structural block diagram of a file uploading device in an embodiment;

Figure 12 is a structural block diagram of a file downloading device in an embodiment;

Fig. 13 is a structural block diagram of a file management device in an embodiment.

Detailed description

In order to make the purpose, technical solutions, and advantages of this application clearer, the following further describes this application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

The file upload method and file download method provided in this application can be applied to the application environment as shown in FIG. 1. The user equipment 102 communicates with the intermediate server 104 through the network, and the intermediate server 104 communicates with the storage server 106 through the network. The storage server 106 is usually a third-party server. The user equipment 102 may be an enterprise user equipment or a personal user terminal, and the enterprise user equipment may be an enterprise server and/or an enterprise user terminal. Personal user terminals and enterprise user terminals can be, but are not limited to, various personal computers, notebook computers, smart phones, tablet computers, etc. The storage service provider provides storage services through the storage server 106, and the storage server 106 is provided with a storage pool for storing files.

The file upload method and file download method provided in this application can be executed by a file management device. The file management device can store a back-end program. The back-end program can be deployed on the intermediate server 104 to execute the file upload method and file download method of this application. In other embodiments, one part of the back-end program may be deployed on the intermediate server 104 and the other part deployed by the intermediate server 104 on the user equipment 102, or the intermediate server 104 may deploy the entire back-end program on the user equipment 102. Correspondingly, some steps of the file upload method and file download method may be executed on the intermediate server 104 and the remaining steps in the back-end program on the user equipment 102, or all steps may be executed in the back-end program on the user equipment 102.

In an embodiment, the file management apparatus may include an information storage 110, and the information storage 110 may include an internal memory and a non-volatile storage medium. The back-end program may be stored on the non-volatile storage medium. A database may also be stored on the non-volatile storage medium, and the database may be used to store the index information of uploaded files. The index information may include the meta-information of the file, or the meta-information of the file may be obtained through the index information. For example, the index information may include the storage address of the meta-information stored in association with the first file identifier. The information storage 110 may be located in the user equipment 102 or in the intermediate server 104.

In one embodiment, the storage services provided by multiple different storage service providers can be used together. Accordingly, as shown in FIG. 2, the intermediate server 104 can communicate over the network with multiple storage servers 106 corresponding to the multiple storage service providers, and the file management apparatus may choose to store the at least one block file obtained by division on multiple different storage servers 106. Both the intermediate server 104 and the storage server 106 can be implemented as independent servers or as a server cluster composed of multiple servers.

In an embodiment of the present application, the index information of the file can be stored in the Redis database, and the file system directory structure can be realized in the database by using the Hash and Set data structure. According to the actual situation, you can choose whether to cache the file meta-information together in the Redis database, or record the URL of the web page address where the meta-information is located. At the same time, the file name, actual file length, file creation and modification time and other information are stored in the Redis database to speed up the query of such information.
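As a rough illustration of how a directory structure could be laid out with Hash and Set structures, the sketch below uses a small in-memory stand-in rather than a real Redis connection. The key prefixes (`dir:`, `meta:`), the field names, the example URLs, and the `FakeIndex` class are all assumptions made for this example and are not taken from the application.

```python
from collections import defaultdict


class FakeIndex:
    """In-memory stand-in for the Hash and Set structures of a key-value store."""

    def __init__(self):
        self.hashes = defaultdict(dict)  # key -> {field: value}
        self.sets = defaultdict(set)     # key -> {member, ...}

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def hgetall(self, key):
        return dict(self.hashes[key])

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets[key])


def index_file(index, directory, name, meta_url, length, mtime):
    """Register a file: a Set lists directory entries, a Hash holds per-file fields."""
    index.sadd(f"dir:{directory}", name)
    index.hset(f"meta:{directory}/{name}",
               {"meta_url": meta_url, "length": length, "mtime": mtime})


index = FakeIndex()
index_file(index, "/docs", "a.txt", "https://storage.example/meta/1", 11, 1590451200)
index_file(index, "/docs", "b.txt", "https://storage.example/meta/2", 42, 1590451300)
listing = sorted(index.smembers("dir:/docs"))   # directory listing from the Set
entry = index.hgetall("meta:/docs/a.txt")       # cached fields from the Hash
```

Here the Set answers "which entries are in this directory" and the Hash answers "what is known about this file" without fetching the meta-information itself.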

In the foregoing embodiment, the Redis database communicates over the TCP protocol, so multiple framework instances can be configured to connect to the same Redis database instance, which makes file index synchronization easy to achieve. The Redis database also supports atomic operations, which can effectively avoid race conditions. In addition, the Redis database supports a master-slave architecture, which is convenient for expansion and can provide high availability. Finally, the RDB and AOF functions built into the Redis database can be used for persistence, to minimize the risk of losing file indexes and to protect the integrity of the framework data.

In an embodiment of this application, a directory is created in another file system to serve as the root directory of this file system, and every operation whose interface granularity is coarser than reading and writing the file itself can be completed by having the system call the POSIX interface; that is, the framework acts as a transparent proxy for the directory structure. The meta-information of each framework file is then stored in a file at the corresponding location in the directory structure of the other file system. When a file data operation is performed, the request is forwarded to the underlying abstract file of the framework for processing.

In the above embodiment, the file system transparent proxy method uses an existing mature file system to host the file index of the framework, which has high stability and security. In addition, part of the data in the meta-information can be directly consistent with the ATTR attributes supported in the existing file system, such as the creation time ctime and the modification time mtime. For attributes that cannot be kept consistent, such as the real size of a file, it can be obtained by simulating the output by reading the meta-information at the transparent proxy layer.

In an embodiment of the present application, as shown in FIG. 3, a file upload method is provided. The method is described by taking as an example its application to the intermediate server 104, that is, the case in which the back-end program is deployed on the intermediate server 104, and includes the following steps S302-S310.

Step S302: Obtain the file to be uploaded.

Specifically, the intermediate server 104 receives a file upload request sent by the user equipment 102, and the file upload request carries a file to be uploaded. The intermediate server 104 parses the file upload request to obtain the file to be uploaded.

In the embodiment of the present application, the file management device can provide both an HTTP API and a POSIX interface. Take the case in which the user equipment 102 uploads a file through the HTTP API as an example. As shown in Figure 10, the user uses the web front end in a browser, a dedicated client, or other means to send a file upload request of the POST or PUT method type (POST/PUT request for short) directly to the intermediate server 104, and the request body of the POST/PUT request carries the file to be uploaded. Therefore, in this step, the intermediate server 104 receives from the user equipment 102 the POST/PUT request that carries the file to be uploaded.

Step S304: Divide the file to be uploaded into at least one block file.

Specifically, the intermediate server 104 may divide the file to be uploaded into one or more block files according to a predetermined file division rule. In one embodiment, as shown in FIG. 10, the intermediate server 104 instantiates an abstract file object. The abstract file object provides a variety of methods for developers to call, for example the flush, truncate, and write methods available in write mode, and the locate, read, seek, and tell methods available in read mode. The abstract file object is set to write mode, and the file data stream of the file parsed in step S802 is passed to the write method. Internally, the write method cuts the file data stream according to the pre-configured block size to obtain multiple segments of block file data stream, which represent the corresponding at least one block file.
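The cutting behaviour of a write method can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `AbstractFile`, its attributes, and the in-memory list of blocks are hypothetical, and only the write and flush methods named in the text are modelled.

```python
class AbstractFile:
    """Hypothetical write-mode file object that cuts incoming data into blocks."""

    def __init__(self, block_size):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self._buffer = bytearray()
        self.blocks = []  # completed block data, in original order

    def write(self, data):
        """Buffer incoming bytes and emit every full block."""
        self._buffer.extend(data)
        while len(self._buffer) >= self.block_size:
            self.blocks.append(bytes(self._buffer[:self.block_size]))
            del self._buffer[:self.block_size]
        return len(data)

    def flush(self):
        """Emit whatever is left as a final, shorter block."""
        if self._buffer:
            self.blocks.append(bytes(self._buffer))
            self._buffer.clear()


f = AbstractFile(block_size=4)
for piece in (b"abc", b"defgh", b"ij"):   # data arrives in arbitrary pieces
    f.write(piece)
f.flush()
blocks = f.blocks
```

Because the buffer is independent of how the request body is chunked on the wire, the block boundaries depend only on the configured block size.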

Step S306: Upload at least one block file to the storage server.

The storage server 106 provides an upload interface. Through this interface, the intermediate server 104 can send multiple file upload requests to the storage server 106, each carrying one segment of the multiple data streams, so that the multiple data streams are uploaded to the designated storage server 106.

When the at least one block file is uploaded to the storage server, the multiple data streams, that is, the at least one block file, can be uploaded to the storage server 106 independently of one another through different coroutines or threads, in a multi-threaded or asynchronous manner. At the macro level the block files are thus uploaded concurrently, which makes full use of bandwidth resources and speeds up transmission.

Step S308: Receive at least one storage address corresponding to the at least one block file returned by the storage server.

When the storage server 106 receives any block file uploaded by the intermediate server 104, it stores the block file in the storage pool of the storage server 106 and sends the corresponding storage address to the intermediate server 104. The storage address is used to indicate the storage location of the block file in the storage server 106. In an embodiment, the storage address may be a Uniform Resource Locator (URL).

Step S310: Store meta-information of the file, where the meta-information includes, stored in association, the first file identifier of the file, the at least one storage address, and arrangement order information of the at least one block file corresponding to the at least one storage address in the file before splitting.

The meta-information of the file may record information about the various kinds of processing performed on the file during upload, such as disguising, dividing, and uploading. In addition, the meta-information may include other file-related information, such as the file name carried in the file upload request, the MIME type of the file, the actual file length, and the modification time of the file. In this way, when a file is uploaded, the file data stream can be extracted from the request body of the file upload request and divided into multiple data streams, the multiple data streams can be uploaded, and the other file-related information extracted from the request can be stored in the meta-information.

The first file identifier is information that uniquely identifies the file. When two files have the same first file identifier, it can be considered that the two files are the same file. In an embodiment, the first file identifier may be a check value of the file, such as a file fingerprint. By calculating the file fingerprint, the content of the file can be compared with higher accuracy. In other embodiments, the first file identifier may also be other information related to the file according to the requirements for the accuracy of file recognition, for example, it may be the storage path of the file in the information storage and so on.

In the above file upload method of this application, the file to be uploaded is divided into a plurality of block files, and the plurality of block files are stored on a storage server. The files stored on the storage server are therefore incomplete block files, and information such as the arrangement order of these block files is difficult to obtain from the storage server, so that it is difficult to restore the complete stored file from the storage server. This effectively improves the security of the files stored on the storage server.
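Steps S302 to S310, together with the matching restore on download, can be sketched end to end as follows. This is only an illustration: two dictionaries stand in for the storage pool and the information storage, the address format is invented, and a SHA-256 digest of the whole file is used as the first file identifier purely for convenience.

```python
import hashlib


def split(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_file(data, block_size, storage_pool, info_store):
    """Split, upload each block, then keep the ordered addresses locally."""
    file_id = hashlib.sha256(data).hexdigest()   # assumed choice of identifier
    addresses = []
    for block in split(data, block_size):
        address = f"addr-{len(storage_pool)}"    # address "returned" by the server
        storage_pool[address] = block
        addresses.append(address)
    # The list position encodes the arrangement order of the blocks.
    info_store[file_id] = {"file_id": file_id, "addresses": addresses}
    return file_id


def download_file(file_id, storage_pool, info_store):
    """Fetch the blocks by address and join them in the recorded order."""
    meta = info_store[file_id]
    return b"".join(storage_pool[a] for a in meta["addresses"])


storage_pool, info_store = {}, {}
original = b"hello block storage"
file_id = upload_file(original, 5, storage_pool, info_store)
restored = download_file(file_id, storage_pool, info_store)
```

The storage pool on its own holds only unordered fragments; the ordering lives in `info_store`, which is the property the method relies on.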

In one embodiment, storing the meta information of the file in step S310 includes: storing the meta information of the file in an information storage.

In the above embodiment, the meta-information of the file is stored in the information storage. In this way, the meta-information of the file cannot be obtained from the storage server, making it difficult to restore the uploaded original file, which further improves the security of the file stored in this application.

In an embodiment, the meta-information of the file can also be uploaded to the storage server. In this case, storing the meta-information of the file in step S310 includes: uploading the meta-information of the file to the storage server; receiving the storage address of the meta-information returned by the storage server; and storing the storage address of the meta-information in the information storage in association with the first file identifier of the file. The storage address of the meta-information can also be a URL.

In the above embodiment, because the meta-information of the file is uploaded to the storage server, the information storage only needs to store the storage address of the meta-information and the first file identifier. This reduces the data storage burden on the information storage and saves data storage capacity in the system.

Further, in an embodiment, dividing the file to be uploaded into multiple block files in step S304 includes: dividing the file to be uploaded based on a predetermined size to obtain a first number of block files, each having the predetermined size, where the predetermined size is less than or equal to the upper limit of the file size that the storage server allows to be stored, and the first number is the quotient of the size of the file to be stored divided by the predetermined size, rounded down to an integer; and, after the first number of block files of the predetermined size have been obtained, if a remaining part of the file exists, taking the remaining part as one further block file.
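The arithmetic of this division can be made concrete with a short sketch. The function name `plan_blocks` and its return shape (a list of start offset and length pairs) are assumptions for illustration; the quotient is taken as rounded down, which is what makes a remaining part possible.

```python
def plan_blocks(file_size, block_size, server_limit):
    """Return (start offset, length) for each block of a file."""
    if not 0 < block_size <= server_limit:
        raise ValueError("block size must be positive and within the server limit")
    # full_blocks is the "first number"; remainder is the leftover part, if any.
    full_blocks, remainder = divmod(file_size, block_size)
    plan = [(i * block_size, block_size) for i in range(full_blocks)]
    if remainder:
        plan.append((full_blocks * block_size, remainder))
    return plan


plan = plan_blocks(file_size=10, block_size=4, server_limit=5)
```

For a 10-byte file and a 4-byte block size, the first number is 2 and the remaining 2 bytes form a third block.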

In an embodiment, when there are multiple storage servers, the sizes of the block files may be the same as or different from one another, provided that the predetermined size of each block file is less than or equal to the upper limit of the file size allowed by the storage server on which that block file is to be stored.

Some storage service providers restrict the size of files that may be uploaded to their storage servers. The solution of the above embodiment divides the file to be stored into block files whose size is less than or equal to the upper limit of the file size allowed by the storage server, which satisfies the provider's limit on stored file size and allows files of any size to be uploaded.

Generally, the meta-information of the file to be stored is small and meets the file size limit of the storage server. In some cases, however, the meta-information is large and may exceed the upper limit of the file size allowed by the storage server. In an embodiment, the meta-information can also be divided so that each piece of block meta-information obtained by the division is smaller than the upper limit, and the pieces of block meta-information are then uploaded to the storage server separately. The storage addresses of the block meta-information returned by the storage server are received and stored, which satisfies the storage service provider's limit on stored file size. When a file is downloaded, the storage addresses of the block meta-information can be used in the same way to obtain the meta-information of the file.

In one embodiment, as shown in FIG. 4, after step S304 and before step S306, the file upload method further includes: step S404, selecting an encoder having a target file format, and using the encoder to disguise each of the at least one block file as a block file having the target file format, where the target file format includes a file format that the storage server allows to be stored. Correspondingly, the meta-information stored in step S310 also includes information about the at least one encoder used to disguise the at least one block file.

Among them, the encoder is a module used to disguise the input file of any file format into a file of the specified target file format and then output it. The target file format of the encoder is the target file format that the file is disguised as. The encoder may perform processing of adding a file header and a file tail or some other change processing to each received data stream, so as to disguise the block file represented by the data stream as a block file in the target file format.
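A header-and-tail encoder of the kind described can be sketched as below. The `Encoder` class, its `decode` counterpart, and the placeholder byte strings are hypothetical; the header and tail here are not a valid instance of any real file format, and a practical encoder would need to emit whatever structure the target format actually requires.

```python
class Encoder:
    """Hypothetical encoder that wraps a block in a fixed header and tail."""

    def __init__(self, name, header, tail):
        self.name = name      # recorded in the meta-information to pick the decoder
        self.header = header
        self.tail = tail

    def encode(self, block):
        """Disguise a block by adding the header and tail."""
        return self.header + block + self.tail

    def decode(self, disguised):
        """Strip the header and tail to recover the original block."""
        wrapped = len(self.header) + len(self.tail)
        if (len(disguised) < wrapped
                or not disguised.startswith(self.header)
                or not disguised.endswith(self.tail)):
            raise ValueError("data was not produced by this encoder")
        return disguised[len(self.header):len(disguised) - len(self.tail)]


encoder = Encoder("demo", header=b"HDR:", tail=b":END")  # placeholder bytes
disguised = encoder.encode(b"abcd")
recovered = encoder.decode(disguised)
```

Storing the encoder name alongside each block in the meta-information is what lets the download side choose the matching decode step.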

In one embodiment, the at least one block file can all be disguised as block files of the same file format. In this case, one encoder can be selected from multiple encoders that share the same target file format to disguise the at least one block file; providing more than one such encoder avoids the situation in which a single encoder fails and disguising cannot be performed. Alternatively, the at least one block file can be disguised as block files with different file formats. In this case, multiple encoders with different target file formats can be selected to disguise the block files respectively.

In one of the embodiments, as shown in FIG. 10, after the write method has cut the stream into multiple segments of block file data stream, each segment can be directed to the corresponding encoder having the target file format. The encoder can add a file header and a file tail to each received block file data stream, or apply some other modification, so as to disguise the block file represented by the block file data stream as a block file in the target file format.

In the above embodiment, by disguising at least one block file to have the target file format before uploading the block file, it is difficult for the storage server to obtain the block file before the disguise, which can meet the requirements of the file format allowed by the storage server. At the same time, it increases the difficulty of restoring the original files from the storage server, and further improves the security of file storage.

In one embodiment, as shown in FIG. 5, after step S302, the file upload method further includes: step S502, obtaining the first file identifier of the file to be uploaded; step S504, searching for the first file identifier in the information storage and the storage server; and step S506, determining whether the first file identifier is stored in the information storage or the storage server. When the first file identifier is stored in neither the information storage nor the storage server, the method continues with step S406, in which the file to be uploaded is divided into at least one block file. When the first file identifier is already stored in the information storage or the storage server, step S508 is executed to terminate the upload operation for the file.

In the above embodiment, when a file needs to be uploaded, the first file identifier of the file to be uploaded is searched for in the information storage and the storage server. If the first file identifier is found in either the information storage or the storage server, the same file has been uploaded before; the upload operation for the file is terminated, and only the meta-information of the file is updated and stored. This prevents repeated uploads from occupying unnecessary storage space and system processing resources.
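The duplicate check can be sketched as a guard in front of the upload path. In this illustration only the information storage is consulted (modelled as a dictionary), the lookup on the storage server is omitted, and the `uploads_seen` counter is an invented stand-in for "updating the meta-information".

```python
def handle_upload(file_id, data, info_store, do_upload):
    """Skip the upload when the first file identifier is already known."""
    if file_id in info_store:
        info_store[file_id]["uploads_seen"] += 1   # only meta-information changes
        return "skipped"
    addresses = do_upload(data)
    info_store[file_id] = {"addresses": addresses, "uploads_seen": 1}
    return "uploaded"


calls = []


def fake_upload(data):
    """Hypothetical upload function that records each call."""
    calls.append(data)
    return ["addr-0"]


store = {}
first = handle_upload("fp-1", b"x", store, fake_upload)
second = handle_upload("fp-1", b"x", store, fake_upload)
```

The second request returns without calling the upload function, so no bandwidth or storage is spent on the duplicate.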

In one embodiment, the first file identifier is a file fingerprint, and the file fingerprint of the file to be uploaded can be calculated from a fixed-length run of bytes at the head of the file data stream together with the actual file length. As shown in FIG. 10, still taking the POST/PUT request as an example, when the intermediate server 104 receives the POST/PUT request, it first parses the request header to obtain the original data length, that is, the value of Content-Length in the request header. It then obtains the data format of the request body, which is specified by the value of Content-Type in the request header. When the value of Content-Type is multipart/form-data; boundary=3vkqffBXJh (where 3vkqffBXJh is a custom separator), the data stream in the request body is directed to a multipart parser. The parser uses regular expression matching to separate the file name, the Multipurpose Internet Mail Extensions (MIME) type, the file data stream, and other content from the data stream, and then subtracts from the original data length the lengths of the boundary, the file name, the MIME type, and the other parts that do not belong to the file data stream; the result is the actual file length. When the value of Content-Type is not multipart/form-data; boundary=3vkqffBXJh, the request body is treated as the raw, unprocessed file data stream, and the original data length is the actual file length. At the same time, the file name can be obtained from the parameters carried in the URL of the request or from Content-Disposition in the request header, the value of Content-Type is taken as the MIME type, and the actual file length, file upload time, and so on are obtained so that the meta-information of the file can be stored later. The file fingerprint is then calculated from the fixed-length head data of the file data stream and the actual file length, where the file data stream consists of the fixed-length head data followed by the remaining data.
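One possible way to combine the head bytes and the actual file length into a fingerprint is sketched below. The head length of 1024 bytes, the use of SHA-1, and the 8-byte encoding of the length are assumptions chosen for the example; the text does not specify them.

```python
import hashlib

HEAD_BYTES = 1024  # assumed fixed head length; not specified in the text


def file_fingerprint(head, actual_length):
    """Fingerprint from the first HEAD_BYTES bytes and the actual file length."""
    digest = hashlib.sha1()
    digest.update(head[:HEAD_BYTES])
    digest.update(actual_length.to_bytes(8, "big"))
    return digest.hexdigest()


fp_a = file_fingerprint(b"same head", 9)
fp_b = file_fingerprint(b"same head", 10)
```

Because only the head and the length are hashed, the fingerprint is available as soon as the first bytes of the request body have arrived. The trade-off is that two different files sharing the same head and the same length would receive the same fingerprint under this sketch.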

When a user uploads a file through the provided POSIX interface, the process is similar: the intermediate server 104 receives, through the Filesystem in Userspace (FUSE), a user-initiated data write request of the write (WRITE) method type (WRITE request for short), then extracts the data content of the file to be uploaded from the WRITE request and calculates the first file identifier.

In one embodiment, meta-information of the file may also be stored in step S508. Within the stored meta-information, the information related to the block files can be obtained from the meta-information corresponding to the previously stored first file identifier, and other file-related information, such as the upload time, can be obtained from the file upload request for the file.

In one embodiment, as shown in FIG. 4, after step S304 and before step S404, the file upload method further includes: step S402, calculating at least one second file identifier corresponding to the at least one block file; wherein, step S310 The meta-information stored in further includes at least one second file identifier corresponding to the at least one block file stored in association.

The second file identification is information used to uniquely identify the file. By calculating and storing the second file identifier of the block file before performing the file disguise, the characteristic information of the original block file before the disguise can be recorded.

In one of the embodiments, as shown in FIG. 10, the intermediate server 104 can use the abstract file object to collect, for each block file, a group of information such as the block URL, the fingerprint of the block file, the start offset of the block file data stream before disguising within the block file data stream after disguising, and the data length of the block file data stream before disguising, thereby obtaining N groups of information corresponding to the N block files. The N groups of information are arranged, in the order of the block files in the file before splitting, to form a block information sequence. Specifically, the start offset of the block file can be used as the key, and the remaining information, such as the block URL, the fingerprint of the block file, and the data length, can be used as the value, forming a key-value pair for the block file that is inserted into a dictionary (dict). When the at least one block file is uploaded using multiple threads, the method can wait until all threads have finished uploading and then sort the key-value pairs of all block files in the dictionary according to the order of the block files in the file before splitting; the sorted dictionary is the block information sequence. The block information sequence is then serialized, in a format such as JSON, together with the file name, file modification time, MIME type, file fingerprint, upload time, encoder used, and other information, to obtain the meta-information of the file. The file name extension has a one-to-one correspondence with the MIME type, so the MIME type may be omitted from the stored meta-information; when the downloaded block files need to be restored to the original file, the corresponding MIME type can be inferred from the file name extension.
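Building and serializing a block information sequence could look like the following sketch. It assumes that the dictionary key is the start offset of each block within the original file, and the field names, example URLs, and `build_meta` function are hypothetical.

```python
import json


def build_meta(file_name, file_fingerprint, encoder_name, block_infos):
    """block_infos maps start offset -> info about one block, in any insertion order."""
    ordered = [
        {"offset": offset, **block_infos[offset]}
        for offset in sorted(block_infos)          # restore the pre-split order
    ]
    meta = {
        "file_name": file_name,
        "fingerprint": file_fingerprint,
        "encoder": encoder_name,
        "blocks": ordered,
    }
    return json.dumps(meta, sort_keys=True)


# Upload threads may finish in any order, so entries arrive unsorted.
infos = {
    8: {"url": "https://storage.example/b/2", "fingerprint": "f2", "length": 2},
    0: {"url": "https://storage.example/b/0", "fingerprint": "f0", "length": 4},
    4: {"url": "https://storage.example/b/1", "fingerprint": "f1", "length": 4},
}
meta_json = build_meta("a.bin", "ff", "demo", infos)
offsets = [b["offset"] for b in json.loads(meta_json)["blocks"]]
```

The blocks are written as a JSON list rather than a JSON object because JSON object keys are always strings; keeping the offset as a numeric field inside each entry avoids a string-versus-number sort problem when the meta-information is read back.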

In addition to searching for the entire file before uploading, this application can also search for individual block files before uploading. In one embodiment, as shown in FIG. 6, after step S402 and before step S306, the file upload method further includes: step S602, searching the information storage and the storage server for each second file identifier of the at least one second file identifier.

Correspondingly, step S306 and step S308 include: step S604, using the second file identifier of the at least one second file identifier that is not stored in the information storage and storage server as the target second file identifier, and assigning the target second file identifier to the corresponding The block file is uploaded to the storage server; and step S606, the storage address of the block file corresponding to the target second file identifier sent by the storage server is received.

In this embodiment, the second file identifier may be the check value of the file, for example, the file fingerprint of the file. By calculating the file fingerprint, the content of files can be compared with higher accuracy. During the transmission of the data stream corresponding to the file, a hash algorithm such as SHA-1 can be used to iteratively calculate the check value of the file.
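As an illustration of calculating the check value iteratively while the data stream is being transmitted, the following sketch uses Python's standard hashlib module. The function name fingerprint_stream and the chunk size are assumptions for the example.

```python
import hashlib
import io


def fingerprint_stream(stream, chunk_size=64 * 1024):
    """Compute a SHA-1 fingerprint incrementally while reading a stream."""
    digest = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # fold each chunk into the running hash
    return digest.hexdigest()


data = b"example block file contents"
streamed = fingerprint_stream(io.BytesIO(data), chunk_size=5)
```

Because the hash state is updated chunk by chunk, the fingerprint is available as soon as the last chunk has passed through, without buffering the whole file in memory.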

In the above-mentioned embodiment, the file to be uploaded is first divided into block files, and then, block by block, the information storage of the intermediate server and the storage server are searched to determine whether an identical block file has been uploaded previously. Only block files that have not been uploaded before need to be uploaded, and block files that have already been uploaded are not uploaded again, thereby further saving file storage capacity and system processing resources. In one embodiment, as shown in FIG. 6, after step S602 and before step S310, the file upload method further includes: step S608, taking each second file identifier of the at least one second file identifier that has been stored in the information storage or the storage server as a stored second file identifier, and obtaining the storage address of the block file corresponding to the stored second file identifier. Correspondingly, the meta-information stored in step S310 includes, stored in association, the first file identifier, the target second file identifier, the stored second file identifier, the storage address of the block file corresponding to the target second file identifier, the storage address of the block file corresponding to the stored second file identifier, and information about the arrangement order, in the file before splitting, of the block file corresponding to the target second file identifier and the block file corresponding to the stored second file identifier.

In the above embodiment, for the at least one block file obtained by dividing a file to be uploaded, after the block files that have not been uploaded before are uploaded, the meta information of the block files that were uploaded previously can be combined with the meta information of the currently uploaded block files to obtain the complete meta information of the file to be uploaded, so that the corresponding original file can be downloaded based on the meta information of the file.
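The block-level lookup of steps S602 to S608 can be sketched as below. This is a simplified illustration: a plain dictionary stands in for the lookup in the information storage and the storage server, and the names upload_blocks_deduplicated, known_addresses and fake_upload, as well as the example URLs, are hypothetical.

```python
import hashlib


def upload_blocks_deduplicated(blocks, known_addresses, upload):
    """Upload only blocks whose fingerprint is not already known.

    blocks: list of bytes objects in their order before splitting.
    known_addresses: dict mapping fingerprint -> storage address.
    upload: callable taking bytes and returning a storage address.
    Returns the ordered block entries for the file's meta-information.
    """
    entries = []
    for order, block in enumerate(blocks):
        fingerprint = hashlib.sha1(block).hexdigest()
        address = known_addresses.get(fingerprint)
        if address is None:
            # Not stored yet: this is a target second file identifier.
            address = upload(block)
            known_addresses[fingerprint] = address
        # Otherwise the existing storage address is reused without uploading.
        entries.append({"order": order, "fingerprint": fingerprint, "address": address})
    return entries


uploaded = []


def fake_upload(block):
    uploaded.append(block)
    return f"https://storage.example/block/{len(uploaded)}"


known = {hashlib.sha1(b"AAAA").hexdigest(): "https://storage.example/block/old"}
entries = upload_blocks_deduplicated([b"AAAA", b"BBBB", b"AAAA"], known, fake_upload)
```

In this example only the block b"BBBB" is actually uploaded; both occurrences of b"AAAA" reuse the previously stored address, while the returned entries still describe all three blocks in order.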

In one embodiment, as shown in FIG. 7, the information storage 110 of the present application may store a database, and the database includes at least one public database 111 and multiple local databases 112. Each user equipment 102 corresponds to a local database 112, and the local database 112 is dedicated to storing the index information of the corresponding user equipment 102. For individual users, the local database 112 can be configured on the personal user terminal; for enterprise users, the local database 112 can be configured on the enterprise server or on an enterprise user terminal of the enterprise. The public database 111 may be used to store index information of sharable files uploaded by multiple user equipment 102. In this way, the index information of a file can be shared across instances: when new index information is added by any instance using this framework, the other instances can query that index information, so that the file is shared with all instances.

In one embodiment, there are multiple storage servers and multiple block files, and uploading the at least one block file to the storage server includes: uploading the multiple block files respectively to the multiple storage servers, so that each of the multiple storage servers stores only a part of the multiple block files. In this way, each storage server holds only some of the block files of the file rather than all of them, making it difficult to restore the complete original file from a single storage server, which improves file storage security.

In an embodiment, there are multiple storage servers, and uploading the at least one block file to the storage server includes: uploading the at least one block file respectively to the multiple storage servers, so that the at least one block file is stored redundantly in at least two of the multiple storage servers. In this way, each block file is backed up in two or more storage servers. When downloading a block file, if the correct block file cannot be downloaded from one storage server, it can be downloaded from a backup storage server, thereby ensuring the reliability of file downloads.

Further, in one embodiment, when there are three or more storage servers, the above two embodiments may also be combined, and storing the at least one block file in the storage server includes: storing the multiple block files in the multiple storage servers, so that each of the multiple storage servers stores only a part of the multiple block files, and each of the multiple block files is stored in at least two of the multiple storage servers. In this way, the reliability of file download can be improved while ensuring the security of file storage.
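One possible way to satisfy both conditions of the combined embodiment is a round-robin placement, sketched below. The application does not prescribe a placement strategy; round-robin, the function name assign_blocks and the server names are assumptions for illustration.

```python
def assign_blocks(num_blocks, servers, replicas=2):
    """Assign each block index to `replicas` distinct servers, round-robin."""
    if not 2 <= replicas < len(servers):
        raise ValueError("need at least 2 replicas and more servers than replicas")
    placement = {}
    for index in range(num_blocks):
        # Consecutive servers starting at a rotating position, so no server
        # appears twice for the same block.
        placement[index] = [servers[(index + k) % len(servers)] for k in range(replicas)]
    return placement


servers = ["server-a", "server-b", "server-c"]
placement = assign_blocks(6, servers)
```

With at least as many blocks as servers, every server misses at least one block under this scheme, so no single server holds the whole file. With fewer blocks than servers, a server may still hold every block; the sketch does not guard against that case.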

In an embodiment, the above-mentioned file uploading method may further include a step of updating data in any specified byte range of an uploaded file. Specifically, when the intermediate server 104 receives a file update request that specifies a byte range of a file to be updated, it first finds the meta information of the previously uploaded old file based on the first file identifier. Then, based on the file update range specified in the file update request, it determines the one or more uploaded block files that are partially or fully covered by the file update range. Based on the start and end positions of the pre-disguise block files corresponding to these uploaded block files, it splits and disguises the new file data carried in the file update request, replaces the old block files stored in the storage server with the new block files obtained by splitting and disguising, and updates the meta information of the replaced block files. In this way, the data in any byte range of the uploaded file can be updated. Further, when the start position and end position of the requested update range are not fully aligned with the start and end positions of the previously uploaded old block files, that is, when there is an offset between them, the corresponding start block file and/or end block file of the downloaded multiple block files can be split according to the start position and end position of the specified byte range, the part outside the specified byte range removed, and the one or more block files within the specified byte range retained and restored, so as to accurately download the file that meets the specified byte range. The start position and end position of each block file can be calculated from the data length of the block file recorded in the meta information of each block file.
HTTP Range can also be used to specify the file update range, and the Web server can parse the POST/PUT request initiated by the client to obtain the Range data in the request header.
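Determining which uploaded block files are covered by an update range can be sketched as follows, with the start and end positions of each block derived from the recorded data lengths. The function name blocks_covered and the convention of an inclusive byte range are assumptions for the example.

```python
from itertools import accumulate


def blocks_covered(block_lengths, range_start, range_end):
    """Return indices of blocks overlapping the inclusive byte range.

    block_lengths: pre-disguise data length of each block, in file order.
    """
    ends = list(accumulate(block_lengths))  # exclusive end of each block
    starts = [end - length for end, length in zip(ends, block_lengths)]
    return [
        i for i, (start, end) in enumerate(zip(starts, ends))
        if start <= range_end and range_start < end
    ]


# Blocks occupy bytes 0-3, 4-7, 8-11 and 12-13 of the original file.
covered = blocks_covered([4, 4, 4, 2], range_start=3, range_end=8)
```

Here the range 3 to 8 partially covers the first and third blocks and fully covers the second, so all three would need to be replaced and their meta information updated.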

In an embodiment, as shown in FIG. 8, the present application provides a file download method. Taking the file download method applied to the intermediate server 104 in FIG. 1 and the back-end program of this application deployed on the intermediate server 104 as an example for description, the method may include the following steps S802-S810.

Step S802: Obtain the first file identifier of the file to be downloaded.

Specifically, the intermediate server 104 may receive a file download request sent by the user equipment 102, where the file download request carries the first file identifier of the file to be downloaded. The intermediate server 104 parses the file download request to obtain the first file identifier of the file to be downloaded.

Step S804: Obtain the meta information of the file based on the first file identifier, where the meta information of the file includes the first file identifier of the file, at least one storage address, and information on the arrangement order, in the file before splitting, of the at least one block file corresponding to the at least one storage address.

In this step, the intermediate server 104 finds the meta information of the file based on the first file identifier of the file.

Step S806: Use at least one storage address to download at least one block file from a storage server corresponding to the at least one storage address.

In this step, the intermediate server 104 uses the at least one storage address to generate one or more corresponding file download requests, sends them to the corresponding one or more storage servers 106, and downloads the corresponding block data from the storage server corresponding to each storage address, so as to obtain at least one block file in one-to-one correspondence with the at least one storage address.

In one of the embodiments, there are multiple storage servers, and the intermediate server 104 needs to know which storage server the storage address corresponds to before sending the file download request corresponding to each storage address. When the storage address returned by the storage server itself carries the information of the storage server, the intermediate server 104 can read the information of the corresponding storage server from the storage address.
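When the storage address is a URL, reading the storage server information from it can be as simple as extracting the host name. The following sketch uses Python's standard urllib.parse module; the function name storage_server_of and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def storage_server_of(storage_address):
    """Return the host part of a storage address given as a URL."""
    host = urlsplit(storage_address).hostname
    if host is None:
        raise ValueError(f"no host in storage address: {storage_address!r}")
    return host


host = storage_server_of("https://cdn-2.storage.example/blocks/abc123.png")
```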

Step S808, based on the information of the arrangement sequence of the at least one block file, restore the at least one block file to a complete file and return the complete file.

Specifically, the intermediate server 104 may restore the at least one block file to a complete file based on the information, stored in the meta-information, about the arrangement order in the file before splitting of the at least one block file corresponding to the at least one storage address, and return the complete file to the user device 102.

The above-mentioned file download method of the present application can restore the downloaded at least one block file to a complete file based on the stored information of the arrangement sequence of the at least one block file, and return the complete file to the user device 102.
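Steps S804 to S808 can be sketched in a few lines. In this illustration, dictionaries stand in for the information storage and the storage server, and the names download_file, meta_store, fetch and the meta-information layout (order, address) are assumptions for the example.

```python
def download_file(first_file_id, meta_store, fetch):
    """Restore a complete file from its block files.

    meta_store: dict mapping first file identifier -> meta-information.
    fetch: callable taking a storage address and returning the block bytes.
    """
    meta = meta_store[first_file_id]  # step S804: look up the meta-information
    # Step S806: download each block file by its storage address.
    blocks = {b["order"]: fetch(b["address"]) for b in meta["blocks"]}
    # Step S808: concatenate the blocks in their order before splitting.
    return b"".join(blocks[i] for i in sorted(blocks))


remote = {"addr-1": b"hello ", "addr-2": b"world"}
meta_store = {
    "file-1": {"blocks": [
        {"order": 1, "address": "addr-2"},
        {"order": 0, "address": "addr-1"},
    ]}
}
restored = download_file("file-1", meta_store, remote.__getitem__)
```

The block entries are deliberately listed out of order to show that the restored file depends on the recorded arrangement order rather than on the order of the entries or of the downloads.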

In one embodiment, when the meta-information of the file is stored in the information storage, obtaining the meta-information of the file based on the first file identifier in step S804 of the above-mentioned file download method includes: searching for and obtaining the meta-information of the file in the information storage based on the first file identifier.

In one embodiment, when the meta-information of the file is stored in the storage server, in step S804 of the above-mentioned file downloading method, based on the first file identification, obtaining the meta-information of the file includes: searching and obtaining in the information storage based on the first file identification The storage address of the meta-information of the file; and using the storage address of the meta-information to download the meta-information of the file from the storage server corresponding to the storage address of the meta-information.

In the above embodiment, by uploading the meta-information of the file to the storage server, the information storage needs to store only the storage address of the meta-information of the file and the first file identifier, thereby reducing the data storage burden of the information storage and saving the data storage capacity of the system.

In one embodiment, as shown in FIG. 9, the meta information obtained in step S804 further includes at least one second file identifier corresponding to the at least one block file, stored in association; after step S806 and before step S808, the file download method further includes: step S904, calculating at least one third file identifier corresponding to the downloaded at least one block file; step S906, when the third file identifier of a block file matches the second file identifier of the block file, determining that the block file passes the verification; and step S908, when the third file identifier of a block file does not match the second file identifier of the block file, determining that the block file has not passed the verification, and re-downloading the block file to replace the block file that failed the verification.

The third file identifier and the second file identifier are both information used to uniquely identify a block file. When two block files have the same second file identifier or third file identifier, they can be considered the same file. In this embodiment, the second file identifier and the third file identifier may be the check value of the block file, for example, the fingerprint of the block file; by calculating the fingerprint of a block file, the content of block files can be compared with higher accuracy. During the transmission of the data stream corresponding to the block file, a hash algorithm such as SHA-1 may be used to iteratively calculate the check value of the block file.

In the foregoing embodiment, by calculating and storing the second file identifier of the block file before the file is disguised, the characteristic information of the original block file before disguise is recorded. After the block file is downloaded and restored to its pre-disguise form, the third file identifier of the restored block file is calculated and compared with the second file identifier calculated before disguise at upload time. When all block files pass the verification, it can be determined that the restored complete file is the same as the original file at upload time, so that the integrity and reliability of the downloaded file can be guaranteed.
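The verification of steps S904 to S908 can be sketched as below, assuming that fetch already returns the block file in its pre-disguise form. The function name download_verified and the retry limit max_attempts are assumptions; the application itself does not specify how many times a block file may be re-downloaded.

```python
import hashlib


def download_verified(address, expected_fingerprint, fetch, max_attempts=3):
    """Download a block and check its SHA-1 fingerprint, retrying on mismatch."""
    for _ in range(max_attempts):
        block = fetch(address)
        # Third file identifier: fingerprint of what was actually downloaded.
        if hashlib.sha1(block).hexdigest() == expected_fingerprint:
            return block
    raise IOError(f"block at {address} failed verification {max_attempts} times")


# The first response is corrupted, the second one is correct.
responses = iter([b"corrupted", b"good data"])
expected = hashlib.sha1(b"good data").hexdigest()
block = download_verified("addr-1", expected, lambda _addr: next(responses))
```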

Further, in one embodiment, when uploading a file, before the block files are disguised in step S404, a redundant part may be added to each block file, the redundant part including an erasure code, for example, a Reed-Solomon code (RS code for short). When downloading a file and verifying each downloaded block file, as an alternative to the above step S908, if the current block file fails the verification, the erasure code of the block file can be used to restore the block file, which improves the stability of file download. Correspondingly, before the complete file is restored in step S808, the redundant part in each block file needs to be deleted, so that the restored complete file is consistent with the original file.

When downloading a file, it is necessary to obtain the at least one block file before disguise in order to restore the original file. In one embodiment, as shown in FIG. 9, the meta-information obtained in step S804 further includes information of at least one encoder corresponding to the at least one block file; after step S806 and before step S904, the file download method further includes: step S902, obtaining at least one decoder corresponding to the at least one encoder based on the information of the at least one encoder, and using the at least one decoder to restore the corresponding at least one block file to at least one block file before disguise.

Among them, the decoder is a module used to restore the input file that has been disguised as the specified target file format to the file before disguise. The target file format possessed by the decoder means that the decoder will restore the file disguised as the target file format to its original format. The decoder can extract the block file part before disguise from each received data stream, and ignore or delete the file header and file tail part added by disguise to obtain the block file before disguise. The encoder and decoder have a one-to-one correspondence. In this embodiment, after the information of the encoder is obtained, the information of the corresponding decoder can be obtained, and the information of the corresponding decoder is used to decode the block file after disguise into the block file before disguise.
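The one-to-one relationship between encoder and decoder can be illustrated with a toy pair that adds and strips a file header and a file tail. The class name WrappingCodec, the registry CODECS and the placeholder byte strings are assumptions; a real encoder would produce a valid file in the target file format.

```python
class WrappingCodec:
    """Toy encoder/decoder pair that wraps a block in a header and a tail."""

    def __init__(self, header, tail):
        self.header = header
        self.tail = tail

    def encode(self, block):
        # Disguise: surround the block data with the header and tail.
        return self.header + block + self.tail

    def decode(self, disguised):
        too_short = len(disguised) < len(self.header) + len(self.tail)
        if too_short or not (
            disguised.startswith(self.header) and disguised.endswith(self.tail)
        ):
            raise ValueError("data is not in the expected disguised format")
        # Restore: drop the added header and tail, keep the original block.
        return disguised[len(self.header):len(disguised) - len(self.tail)]


# Registry keyed by the encoder name recorded in the meta-information.
CODECS = {"demo": WrappingCodec(header=b"HDR!", tail=b"!END")}

codec = CODECS["demo"]
disguised = codec.encode(b"payload")
original = codec.decode(disguised)
```

Looking the codec up by the recorded encoder name mirrors how the decoder is obtained from the encoder information in the meta-information.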

As an alternative to the above step S902, in another embodiment, when each of the at least one block file stored in the storage server has been disguised as a block file with a predetermined format, step S806 specifically includes: using the position of each pre-disguise block file within the corresponding disguised block file to download the pre-disguise block file from the disguised block file stored in the storage server corresponding to the storage address.

The position of the block file before the disguise in the block file after the disguise includes the position interval composed of the start position and the end position of the block file before the disguise in the block file after the disguise. The position interval may be determined based on the data length of the block file before masquerading and the starting offset of the block file before masquerading in the block file after masquerading. This information can be recorded in the meta-information of the file.

In the above-mentioned embodiment, for a block file that was disguised before uploading, it is not necessary to download the complete disguised block file and then decode it. During download, the position of the pre-disguise block file within the disguised block file is used to request, directly from the disguised block file stored in the storage server, only the data in the position interval occupied by the pre-disguise block file, without downloading data at other positions. Specifically, the HTTP Range request header can be used to download only the data in a specified byte range of a block file. In this way, the pre-disguise block file is downloaded directly, which further saves data transmission traffic and reduces the performance cost of the system.
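The Range request header for the pre-disguise data can be derived from the start offset and data length recorded in the meta-information, as in the sketch below. The function name range_header and its parameter names are assumptions for the example.

```python
def range_header(start_offset, data_length):
    """Build an HTTP Range header for the pre-disguise part of a block.

    HTTP byte ranges are inclusive on both ends, so the last byte position
    is start_offset + data_length - 1.
    """
    if data_length <= 0:
        raise ValueError("data_length must be positive")
    last = start_offset + data_length - 1
    return {"Range": f"bytes={start_offset}-{last}"}


# A block whose original data starts after an 8-byte header and is 1024 bytes long.
headers = range_header(start_offset=8, data_length=1024)
```

The resulting dictionary can be passed as request headers to an HTTP client; a server that supports range requests would then respond with only the requested bytes.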

Similarly, the feature of downloading a specified byte range of a block file can be extended to the entire file. By writing a Web server that also accepts and parses the HTTP Range request header, users can use existing download tools to achieve multi-threaded download and resumable transfer. In this case, the Web server can obtain, from the URL path or form part of the GET/POST request initiated by the client, a value from which the above-mentioned meta-information of the file can be retrieved, such as the first file identifier, and then parse the Range data in the request header. For the file download range specified in the Range data, it determines and downloads the one or more block files corresponding to the file download range and returns the specified data. When the start position and end position of the requested byte range are not fully aligned with the start and end positions of the stored block files, that is, when there is an offset between them, the corresponding start block file and/or end block file of the downloaded block files can be split according to the start position and end position of the specified byte range, the part outside the specified byte range removed, and the one or more block files within the specified byte range retained and restored, so as to accurately download the data that meets the specified byte range.

In one embodiment, there are multiple storage servers and multiple block files, and each of the multiple block files is stored redundantly in the multiple storage servers, so that each block file corresponds to multiple storage addresses. Downloading the at least one block file from the storage server corresponding to the at least one storage address by using the at least one storage address includes: downloading the corresponding block file from the storage server corresponding to a first storage address of the multiple storage addresses corresponding to each block file; and when downloading the corresponding block file from the storage server corresponding to the first storage address fails, using a second storage address of the multiple storage addresses corresponding to the block file to download the corresponding block file from the storage server corresponding to the second storage address.
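Parsing the Range data and trimming the start and end block files can be sketched as follows. This is a simplified illustration that handles only a single bytes range and does not validate unsatisfiable ranges; the function names parse_range and read_range are assumptions. For brevity the sketch iterates over all blocks, whereas in practice only the block files covered by the range would be downloaded.

```python
def parse_range(header_value, file_size):
    """Parse a single 'bytes=start-end' range into inclusive (start, end)."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":  # suffix form: the last N bytes of the file
        length = int(last)
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    end = int(last) if last else file_size - 1
    return start, min(end, file_size - 1)


def read_range(blocks, start, end):
    """Return bytes start..end (inclusive) from blocks given in file order."""
    out, position = [], 0
    for block in blocks:
        block_start, block_end = position, position + len(block)  # end exclusive
        position = block_end
        if block_end <= start or block_start > end:
            continue  # block lies entirely outside the requested range
        # Trim the part of the start or end block that falls outside the range.
        lo = max(start - block_start, 0)
        hi = min(end + 1 - block_start, len(block))
        out.append(block[lo:hi])
    return b"".join(out)


blocks = [b"abcd", b"efgh", b"ijkl"]
start, end = parse_range("bytes=2-9", file_size=12)
partial = read_range(blocks, start, end)
```

In this example the first block is trimmed at its start and the last block at its end, while the middle block is returned whole.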

In one embodiment, when the download using the second storage address fails, if there are other storage addresses, you can continue to switch and use other storage addresses to download the block file until the required block file is downloaded. In this way, the reliability of file download can be effectively improved.
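Switching between the storage addresses of a block file can be sketched as a simple loop. The function names download_with_failover and fake_fetch and the example URLs are hypothetical, and treating OSError as the signal of a failed download is an assumption of the sketch.

```python
def download_with_failover(addresses, fetch):
    """Try each storage address of a block in turn until one succeeds."""
    errors = []
    for address in addresses:
        try:
            return fetch(address)
        except OSError as exc:  # treat I/O errors as a failed download
            errors.append((address, exc))
    raise OSError(f"all storage addresses failed: {errors}")


def fake_fetch(address):
    # Simulate the first storage server being unavailable.
    if address == "https://server-a.example/b1":
        raise ConnectionError("server-a unavailable")
    return b"block data"


data = download_with_failover(
    ["https://server-a.example/b1", "https://server-b.example/b1"], fake_fetch
)
```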

It should be understood that although the various steps in the flowcharts of FIGS. 3-9 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless specifically stated herein, there is no strict order for the execution of these steps, and these steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 3-9 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and they are not necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 11, a file uploading device 1100 is provided, including: an upload file acquisition module 1101, a file segmentation module 1102, a file upload module 1103, a storage address receiving module 1104, and a meta-information storage module 1105, wherein:

The upload file obtaining module 1101 is used to obtain files to be uploaded.

The file segmentation module 1102 is used to divide the file to be uploaded into at least one block file.

The file upload module 1103 is used to upload at least one block file to the storage server.

The storage address receiving module 1104 is configured to receive at least one storage address corresponding to at least one block file returned by the storage server.

The meta-information storage module 1105 is used to store the meta-information of the file.

In one embodiment, as shown in FIG. 12, a file downloading device 1200 is provided, including: a file identification obtaining module 1201, a meta information obtaining module 1202, a file downloading module 1203, and a file restoring module 1204, wherein:

The file identifier obtaining module 1201 is used to obtain the first file identifier of the file to be downloaded.

The meta-information acquiring module 1202 is configured to acquire meta-information of the file based on the first file identifier. The meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the arrangement sequence of the at least one block file corresponding to the at least one storage address in the file before division.

The file download module 1203 is configured to use at least one storage address to download at least one block file from a storage server corresponding to the at least one storage address.

The file restoration module 1204 is configured to restore at least one block file to a complete file and return the complete file based on the information of the arrangement sequence of the at least one block file.

In one embodiment, as shown in FIG. 13, the present application provides a file management device 1300, the file management device 1300 is in communication connection with the storage server 106; the file management device 1300 includes a processor 1301 and an information storage 110; the processor 1301 is configured to:

When a file upload request is received, execute: obtaining the file to be uploaded; dividing the file to be uploaded into at least one block file; uploading the at least one block file to the storage server; receiving at least one storage address, returned by the storage server, corresponding to the at least one block file; and storing the meta-information of the file;

When a file download request is received, execute: obtaining the first file identifier of the file to be downloaded; obtaining the meta-information of the file based on the first file identifier; using at least one storage address to download at least one block file from the storage server corresponding to the at least one storage address; and, based on the information about the arrangement order of the at least one block file, restoring the at least one block file to a complete file and returning the complete file;

The information storage 110 is configured to store the meta-information of the file or a storage address of the meta-information, where the meta-information of the file includes the first file identifier of the file, the at least one storage address, and the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address; the storage server is used to store the uploaded file.

The above examples only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims

A file upload method, the method includes:

Obtain the file to be uploaded;

Dividing the file to be uploaded into at least one block file;

Uploading the at least one block file to the storage server;

Receiving at least one storage address corresponding to the at least one block file returned by the storage server; and

Storing the meta-information of the file, where the meta-information of the file includes, stored in association, the first file identifier of the file, the at least one storage address, and the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address.
The method according to claim 1, wherein said storing the meta-information of the file comprises:

The meta information of the file is stored in the information storage.
The method according to claim 1, wherein said storing the meta-information of the file comprises:

Uploading the meta information of the file to the storage server;

Receiving the storage address of the meta-information returned by the storage server;

The storage address of the meta-information and the first file identification of the file are stored in the information storage in association with each other.
The method according to claim 1, wherein the dividing the file to be uploaded into at least one block file comprises:

Based on a predetermined size, the file to be uploaded is divided to obtain a first number of block files having the predetermined size; the value of the predetermined size is less than or equal to the upper limit of the file size allowed to be stored by the storage server, and the first number is the value obtained by rounding the quotient of the size of the file to be stored and the predetermined size;

After the first number of block files with the predetermined size are obtained by dividing, if there is a remaining part of the file, the remaining part of the file is allocated as a block file.
The method according to claim 1, wherein after the dividing the file to be uploaded into at least one block file and before uploading the at least one block file to a storage server, the method further comprises:

Selecting an encoder with a target file format, and using the encoder to disguise the at least one block file as having the target file format;

Wherein, the target file format includes a file format allowed to be stored by the storage server;

The stored meta information of the file further includes information of the at least one encoder used to disguise the corresponding at least one block file.
The method according to claim 1, wherein, after obtaining the file to be uploaded, the method further comprises:

Acquiring the first file identifier of the file to be uploaded;

Searching the first file identifier in the information storage and the storage server; and

When the first file identifier is not stored in the information storage and the storage server, continue to perform the step of dividing the file to be uploaded into at least one block file;

When the first file identifier has been stored in the information storage or the storage server, the upload operation of the file is terminated.
The method according to claim 1, wherein, after dividing the file to be uploaded into at least one block file, the method further comprises:

Calculating at least one second file identifier corresponding to the at least one block file;

Wherein, the meta information further includes at least one second file identifier corresponding to the at least one block file stored in association.
The method according to claim 7, wherein:

After calculating the at least one second file identifier corresponding to the at least one block file, the method further includes searching for each second file identifier in the at least one second file identifier in the information storage and the storage server, respectively;

The uploading the at least one block file to the storage server, and receiving the at least one storage address corresponding to the at least one block file returned by the storage server includes:

Determining each second file identifier of the at least one second file identifier that is not stored in the information storage and the storage server as a target second file identifier, and uploading the block file corresponding to the target second file identifier to the storage server;

Receiving the storage address of the block file corresponding to the target second file identifier returned by the storage server.
The method according to claim 8, further comprising:

Determining each second file identifier of the at least one second file identifier that has been stored in the information storage or the storage server as a stored second file identifier, and obtaining the storage address of the block file corresponding to the stored second file identifier.
The method according to claim 1, wherein the number of the storage server is multiple; the number of the block file is multiple; and the uploading the at least one block file to the storage server comprises:

Upload the plurality of block files to the plurality of storage servers respectively, so that each storage server of the plurality of storage servers stores a part of the block files of the plurality of block files.
The method according to claim 1, wherein: the number of the storage server is multiple; and the uploading the at least one block file to the storage server comprises:

Uploading the at least one block file to the multiple storage servers respectively, so that each of the at least one block file is repeatedly stored in at least two of the multiple storage servers.
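As a minimal sketch of the two placement variants above (each server holding part of the block files, or each block file stored on at least two servers), a round-robin assignment is assumed here. The claims do not prescribe any placement policy; `place_blocks` and its `replicas` parameter are hypothetical.

```python
def place_blocks(num_blocks, servers, replicas=1):
    """Assign each block index to `replicas` distinct servers, round-robin.

    replicas=1 spreads the blocks so that each server holds only a part;
    replicas>=2 stores every block on at least two servers.
    """
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    return [
        [servers[(i + r) % len(servers)] for r in range(replicas)]
        for i in range(num_blocks)
    ]
```

With three servers and `replicas=2`, block 0 would go to the first and second servers, block 1 to the second and third, and so on, so the loss of any single server leaves every block retrievable.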
A file downloading method, the method includes:

Acquiring the first file identifier of the file to be downloaded;

Based on the first file identifier, acquiring the meta-information of the file; the meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address;

Using the at least one storage address to download the at least one block file from a storage server corresponding to the at least one storage address; and

Based on the information of the arrangement sequence of the at least one block file, restore the at least one block file to a complete file and return the complete file.
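The download steps above can be sketched as follows, purely as an illustration. The layout of the meta-information (a dictionary with a `blocks` list whose entries carry `address` and `order`) and the `fetch` callable are assumptions introduced for this sketch, not structures defined by the claims.

```python
def download_file(file_id, meta_store, fetch):
    """Rebuild a file from its block files.

    meta_store: dict mapping the first file identifier to meta-information
        (hypothetical layout: {"blocks": [{"address": ..., "order": ...}]}).
    fetch: hypothetical callable that downloads the bytes at a storage address.
    """
    meta = meta_store[file_id]
    # Restore the pre-division arrangement order before concatenating.
    ordered = sorted(meta["blocks"], key=lambda b: b["order"])
    return b"".join(fetch(b["address"]) for b in ordered)
```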
The method according to claim 12, wherein the meta-information of the file is stored in an information storage; said obtaining the meta-information of the file based on the first file identifier comprises:

Based on the first file identifier, search and obtain meta information of the file in the information storage.
The method according to claim 12, wherein, when the meta-information of the file is stored in a storage server, said obtaining the meta-information of the file based on the first file identifier comprises:

Based on the first file identifier, search for and obtain the storage address of the meta-information of the file in the information storage; and

Using the storage address of the meta information, download the meta information of the file from the storage server corresponding to the storage address of the meta information.
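The two ways of obtaining the meta-information described above (directly from the information storage, or indirectly through a stored address) might be combined as in the sketch below. The entry layout (`meta` or `meta_address` keys), the JSON serialization and the `fetch` callable are all assumptions made for illustration.

```python
import json


def load_meta(file_id, info_store, fetch):
    """Return the meta-information for `file_id`.

    info_store: dict standing in for the information storage; each entry holds
        either the meta-information itself ("meta") or only the storage
        address of the meta-information ("meta_address"). Hypothetical layout.
    fetch: hypothetical callable that downloads bytes from a storage address.
    """
    entry = info_store[file_id]
    if "meta" in entry:
        return entry["meta"]
    # Indirect case: the meta-information itself lives on a storage server.
    return json.loads(fetch(entry["meta_address"]))
```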
The method according to claim 12, wherein the meta information further comprises at least one second file identifier corresponding to the at least one block file stored in association;

Before restoring the at least one block file to a complete file based on the information of the arrangement sequence of the at least one block file and returning the complete file, the method further includes:

Calculating at least one third file identifier corresponding to the downloaded at least one block file;

When the third file identifier of a block file matches the second file identifier of the block file, determining that the block file passes the verification;

When the third file identifier of a block file does not match the second file identifier of the block file, determining that the block file fails the verification, downloading the block file again, and replacing the block file that failed the verification with the re-downloaded block file.
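A minimal sketch of this verification step is given below. It assumes the identifiers are SHA-256 digests and adds a retry limit (`max_attempts`) that the claims do not mention; `fetch_verified` and `fetch` are hypothetical names.

```python
import hashlib


def fetch_verified(address, expected_id, fetch, max_attempts=3):
    """Download a block and check it against its second file identifier.

    expected_id: the second file identifier recorded in the meta-information
        (assumed here to be a SHA-256 hex digest).
    fetch: hypothetical callable that downloads bytes from a storage address.
    max_attempts: assumption of this sketch; the claims set no retry limit.
    """
    for _ in range(max_attempts):
        block = fetch(address)
        # The "third file identifier" is recomputed from the downloaded bytes.
        if hashlib.sha256(block).hexdigest() == expected_id:
            return block
    raise OSError(f"block at {address!r} failed verification")
```

A block that fails the comparison is discarded and downloaded again, so only a block whose recomputed identifier matches the stored one is passed on to reassembly.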
The method according to claim 12, wherein the meta information further comprises information of at least one encoder corresponding to the at least one block file;

Before restoring the at least one block file to a complete file based on the information of the arrangement sequence of the at least one block file and returning the complete file, the method further includes:

Obtaining at least one decoder corresponding to the at least one encoder based on the information of the at least one encoder;

Using the at least one decoder, restoring the corresponding at least one block file to the at least one block file before disguise.
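One way to picture the encoder-to-decoder lookup is a registry keyed by the encoder information stored in the meta-information. The sketch below uses Base64 and zlib from the Python standard library purely as stand-ins; the claims do not name any particular encoder, and `DECODERS` and `restore_blocks` are hypothetical.

```python
import base64
import zlib

# Hypothetical registry: encoder name recorded in the meta-information
# mapped to the matching decoder function.
DECODERS = {
    "base64": base64.b64decode,
    "zlib": zlib.decompress,
}


def restore_blocks(blocks, encoder_names):
    """Undo the per-block encoding using the decoder matching each encoder."""
    return [DECODERS[name](block) for block, name in zip(blocks, encoder_names)]
```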
The method according to claim 12, wherein, when each of the at least one block file stored in the storage server is disguised as a block file with a target file format, the using of the at least one storage address to download the at least one block file from a storage server corresponding to the at least one storage address includes:

Using the position of each block file before disguise within the corresponding block file after disguise, downloading the block file before disguise from the block file after disguise stored in the storage server corresponding to the storage address.
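To illustrate the idea of locating the original block inside its disguised carrier, the sketch below assumes the position is recorded as a byte offset and a length, and that disguising simply wraps the block in carrier bytes. Both are assumptions; `disguise` and `extract` are hypothetical helpers.

```python
def disguise(block: bytes, header: bytes, trailer: bytes = b""):
    """Wrap a block in carrier bytes.

    Returns the disguised bytes and the (offset, length) position of the
    original block inside them. The position format is an assumption.
    """
    return header + block + trailer, (len(header), len(block))


def extract(disguised: bytes, position):
    """Recover the original block from a disguised block using its position."""
    offset, length = position
    return disguised[offset:offset + length]
```

Against a remote storage server, the same offset and length could in principle be sent as a byte-range request so that only the original block is transferred; whether a given storage server supports this is outside what the claims state.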
The method of claim 12, wherein:

The number of the storage server is multiple;

Each of the plurality of block files is repeatedly stored in at least two storage servers of the plurality of storage servers;

The using the at least one storage address to download the at least one block file from a storage server corresponding to the at least one storage address includes:

Downloading the corresponding block file from the storage server corresponding to the first storage address by using a first storage address among the plurality of storage addresses corresponding to each block file in the plurality of block files;

When downloading the corresponding block file from the storage server corresponding to the first storage address by using the first storage address fails, downloading the corresponding block file from the storage server corresponding to a second storage address by using the second storage address among the plurality of storage addresses corresponding to the block file.
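The fallback from the first storage address to the second can be sketched as a loop over the replica addresses of one block. The sketch assumes a failed download surfaces as an `OSError`; `download_with_fallback` and `fetch` are hypothetical names.

```python
def download_with_fallback(addresses, fetch):
    """Try each replica address of a block in turn until one download succeeds.

    addresses: the storage addresses recorded for one block file.
    fetch: hypothetical callable that downloads bytes from a storage address
        and raises OSError on failure (an assumption of this sketch).
    """
    last_error = None
    for address in addresses:
        try:
            return fetch(address)
        except OSError as exc:
            last_error = exc  # Remember the failure and try the next replica.
    raise OSError("all replicas failed") from last_error
```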
A file management device, the file management device is in communication connection with a storage server, and the file management device includes a processor and an information memory;

The processor is configured to, when receiving a file upload request, execute:

Obtain the file to be uploaded;

Dividing the file to be uploaded into at least one block file;

Uploading the at least one block file to the storage server;

Receiving at least one storage address corresponding to the at least one block file returned by the storage server;

Store meta-information of the file;

When a file download request is received, execute:

Acquiring the first file identifier of the file to be downloaded;

Obtaining meta-information of the file based on the first file identifier;

Using the at least one storage address, download the at least one block file from a storage server corresponding to the at least one storage address, and, based on the information of the arrangement order of the at least one block file, restore the at least one block file to a complete file and return the complete file;

The information memory is used to store the meta-information of the file or the storage address of the meta-information; wherein the meta-information of the file includes the first file identifier of the file, the at least one storage address, and the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address.