WO2021237467A1 - File uploading method, file downloading method and file management apparatus - Google Patents


Info

Publication number
WO2021237467A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
block
storage
information
meta
Prior art date
Application number
PCT/CN2020/092383
Other languages
French (fr)
Chinese (zh)
Inventor
许若阳
Original Assignee
深圳元戎启行科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳元戎启行科技有限公司 filed Critical 深圳元戎启行科技有限公司
Priority to CN202080007587.2A priority Critical patent/CN113273163A/en
Priority to PCT/CN2020/092383 priority patent/WO2021237467A1/en
Publication of WO2021237467A1 publication Critical patent/WO2021237467A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/104: Peer-to-peer [P2P] networks
    • H04L67/1074: Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1078: Resource delivery mechanisms
    • H04L67/108: Resource delivery mechanisms characterised by resources being split in blocks or fragments
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • This application relates to the field of data storage technology, and in particular to a file upload method, file download method and file management device.
  • The storage servers of these storage service providers offer object storage services, such as Object-Based Storage Systems. Enterprise users use them to store data for mobile applications, large websites, picture sharing, or popular audio and video, or for infrequent-access and archive storage; individual users use them to store files.
  • This type of service can provide flat file storage and Content Delivery Network (CDN) resources to speed up the loading of static resources for users.
  • The user uploads a file to the storage pool of the storage server through the back-end program of the user terminal and obtains the Uniform Resource Locator (URL) returned by the storage server. The address can be embedded in a web page or in the data returned by an Application Programming Interface (API), and the user can then download the previously uploaded file by means of the URL.
  • A file upload method includes: obtaining a file to be uploaded; dividing the file to be uploaded into at least one block file; uploading the at least one block file to a storage server; receiving at least one storage address, returned by the storage server, corresponding to the at least one block file; and storing meta-information of the file. The meta-information includes, stored in association, the first file identifier of the file, the at least one storage address, and information on the arrangement order, before the file was split, of the at least one block file corresponding to the at least one storage address.
  • A file download method includes: obtaining a first file identifier of a file to be downloaded; obtaining meta-information of the file based on the first file identifier, the meta-information including the first file identifier of the file, at least one storage address, and information on the arrangement order, before the file was split, of the at least one block file corresponding to the at least one storage address; downloading the at least one block file from the storage server corresponding to the at least one storage address by using the at least one storage address; and, based on the arrangement order information, restoring the at least one block file to a complete file and returning the complete file.
  • A file management device is communicatively connected to a storage server and includes a processor and an information storage.
  • The processor is used to execute the above file upload method and file download method.
  • The file to be uploaded is divided into at least one block file, and the at least one block file is stored in a storage server. When the file is downloaded, the at least one block file is downloaded from the storage server and merged according to the locally recorded division order of the block files to obtain the restored file.
  • The files stored on the storage server are incomplete block files produced by the division, and information such as the arrangement order of these block files is difficult to obtain from the storage server. It is therefore difficult to restore the complete stored file from the storage server alone, which effectively improves the security of files stored on the storage server.
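  • The restore step of the download method can be illustrated with a minimal sketch. The dictionary layout of the meta-information, the `restore_file` helper, and the `storage.example` addresses are assumptions made for illustration; an in-memory dictionary stands in for the storage server.

```python
from typing import Callable, Dict

def restore_file(meta: Dict, fetch: Callable[[str], bytes]) -> bytes:
    """Download each block by its storage address and join the blocks
    in the arrangement order recorded in the meta-information."""
    # Sort by recorded position, so the order of entries in meta is irrelevant.
    blocks = sorted(meta["blocks"], key=lambda entry: entry["order"])
    return b"".join(fetch(entry["address"]) for entry in blocks)

# Hypothetical in-memory stand-in for a third-party storage server.
fake_server = {
    "https://storage.example/b": b"world",
    "https://storage.example/a": b"hello ",
}

# Hypothetical meta-information kept locally, outside the storage server.
meta = {
    "file_id": "f1",
    "blocks": [
        {"order": 1, "address": "https://storage.example/b"},
        {"order": 0, "address": "https://storage.example/a"},
    ],
}

restored = restore_file(meta, fake_server.__getitem__)
print(restored)
```

  • The storage server in this sketch holds only unordered fragments; the order needed to rebuild the file lives in the locally kept meta-information.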
  • FIG. 1 is an application environment diagram of a file upload method and a file download method in an embodiment.
  • FIG. 2 is an application environment diagram of the file upload method and the file download method in another embodiment.
  • FIG. 3 is a schematic flowchart of a file upload method in an embodiment.
  • FIG. 4 is a schematic flowchart of a file upload method in an embodiment.
  • FIG. 5 is a schematic flowchart of a file upload method in an embodiment.
  • FIG. 6 is a schematic flowchart of a file upload method in an embodiment.
  • FIG. 7 is a schematic diagram of the structure of an information storage in an embodiment.
  • FIG. 8 is a schematic flowchart of a file download method in an embodiment.
  • FIG. 9 is a schematic flowchart of a file download method in an embodiment.
  • FIG. 10 is a schematic diagram of a file upload method and a file download method in an embodiment.
  • FIG. 11 is a structural block diagram of a file uploading device in an embodiment.
  • FIG. 12 is a structural block diagram of a file downloading device in an embodiment.
  • FIG. 13 is a structural block diagram of a file management device in an embodiment.
  • the file upload method and file download method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the user equipment 102 communicates with the intermediate server 104 through the network, and the intermediate server 104 communicates with the storage server 106 through the network.
  • the storage server 106 is usually a third-party server.
  • the user equipment 102 may be an enterprise user equipment or a personal user terminal, and the enterprise user equipment may be an enterprise server and/or an enterprise user terminal.
  • Personal user terminals and enterprise user terminals can be, but are not limited to, various personal computers, notebook computers, smart phones, tablet computers, etc.
  • the storage service provider provides storage services through the storage server 106, and the storage server 106 is provided with a storage pool for storing files.
  • The file upload method and file download method provided in this application can be executed by a file management device.
  • The file management device can store a back-end program.
  • The back-end program can be deployed on the intermediate server 104 to execute the file upload method and file download method of this application.
  • In other embodiments, part of the back-end program may be deployed on the intermediate server 104 while the other part is deployed by the intermediate server 104 on the user equipment 102, or the intermediate server 104 may deploy the entire back-end program on the user equipment 102.
  • Accordingly, some steps of the file upload method and file download method of this application may be executed on the intermediate server 104 and the rest in the back-end program on the user equipment 102, or all steps may be executed in the back-end program on the user equipment 102.
  • the file management apparatus may include an information storage 110, and the information storage 110 may include an internal memory and a non-volatile storage medium.
  • the back-end program may be stored on the non-volatile storage medium.
  • a database may also be stored on the non-volatile storage medium, and the database may be used to store the index information of the uploaded file.
  • the index information may include the meta-information of the file, or the meta-information of the file may be obtained through the index information.
  • the index information may also include the storage address of the associated meta-information and the first file identifier.
  • the information storage 110 may be located in the user equipment 102 or in the intermediate server 104.
  • The storage services provided by multiple different storage service providers can be used in combination.
  • The intermediate server 104 can communicate over the network with multiple storage servers 106 corresponding to multiple storage service providers.
  • The file management device may choose to store the at least one block file obtained by division on a plurality of different storage servers 106.
  • Both the intermediate server 104 and the storage server 106 can be implemented by independent servers or a server cluster composed of multiple servers.
  • The index information of the file can be stored in a Redis database, and the file system directory structure can be implemented in the database using the Hash and Set data structures. Depending on the actual situation, the file meta-information can either be cached in the Redis database as well, or only the URL at which the meta-information is stored can be recorded. The file name, actual file length, file creation and modification times, and similar information are also stored in the Redis database to speed up queries for such information.
  • The Redis database communicates over TCP, so multiple framework instances can be configured to connect to the same Redis database instance, which makes file index synchronization easy to achieve.
  • The Redis database supports atomic operations, which effectively avoids race conditions.
  • The Redis database supports a master-slave architecture, which is convenient to scale and can provide high availability.
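  • One possible shape for such an index is sketched below. Plain Python dictionaries and sets stand in for the Redis Hash and Set structures so that the example runs without a server; the `IndexStore` class and the `dir:` and `file:` key prefixes are hypothetical and are not taken from the source.

```python
from collections import defaultdict

class IndexStore:
    """In-memory stand-in for a file index built on Hash and Set structures."""

    def __init__(self):
        self.hashes = defaultdict(dict)  # plays the role of Redis Hashes
        self.sets = defaultdict(set)     # plays the role of Redis Sets

    def add_file(self, path, attrs):
        parent, _, name = path.rpartition("/")
        # One Set per directory lists the names it contains.
        self.sets["dir:" + (parent or "/")].add(name)
        # One Hash per file holds attributes such as length and mtime.
        self.hashes["file:" + path].update(attrs)

    def listdir(self, directory):
        return sorted(self.sets["dir:" + directory])

    def stat(self, path):
        return dict(self.hashes["file:" + path])

store = IndexStore()
store.add_file("/docs/a.txt", {"length": 11, "meta_url": "https://storage.example/m1"})
store.add_file("/docs/b.bin", {"length": 2048, "meta_url": "https://storage.example/m2"})
print(store.listdir("/docs"))
print(store.stat("/docs/a.txt")["length"])
```

  • Keeping the file length and timestamps in the per-file Hash is what lets attribute queries be answered without fetching the meta-information itself.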
  • In another approach, a directory is created in another file system to serve as the root directory of this file system. All operations whose program-interface granularity is coarser than reading and writing a file's own contents can be completed by having the system call the POSIX interface; in effect, this is a transparent proxy for the directory structure. The meta-information of each file of this framework is then stored in a file at the corresponding place in the directory structure of the other file system. When a file data operation is performed, the request is forwarded to the framework's underlying abstract file for processing.
  • The file system transparent proxy method uses an existing, mature file system to host the file index of the framework, which provides high stability and security.
  • Part of the data in the meta-information can map directly onto the attributes (ATTR) supported by the existing file system, such as the modification time mtime and the ctime timestamp (which POSIX defines as the time of the last status change rather than the creation time).
  • Attributes that cannot be kept consistent, such as the real size of a file, can be obtained by reading the meta-information at the transparent proxy layer and simulating the output.
  • A file upload method is provided.
  • The method is described taking as an example its application to the intermediate server 104 of the application environment described above, with the back-end program deployed on the intermediate server 104. It includes the following steps S302-S310.
  • Step S302: Obtain the file to be uploaded.
  • The intermediate server 104 receives a file upload request sent by the user equipment 102; the request carries the file to be uploaded.
  • The intermediate server 104 parses the file upload request to obtain the file to be uploaded.
  • The file management device can provide both an HTTP API and a POSIX interface. Take as an example the user equipment 102 performing a file upload through the HTTP API. As shown in FIG. 10, the user uses a web front end in a browser, a dedicated client, or other means to send a file upload request of the POST or PUT method type (a POST/PUT request for short) directly to the intermediate server 104, and the body of the POST/PUT request carries the file to be uploaded. In this step, therefore, the intermediate server 104 receives from the user equipment 102 the POST/PUT request carrying the file to be uploaded.
  • Step S304: Divide the file to be uploaded into at least one block file.
  • The intermediate server 104 may divide the file to be uploaded into one or more block files according to a predetermined file division rule.
  • The intermediate server 104 instantiates an abstract file object.
  • The abstract file object provides a variety of methods for developers to call, for example the flush, truncate, and write methods available in write mode, and the locate, read, seek, and tell methods available in read mode.
  • The write method internally cuts the file data stream according to the pre-configured block size to obtain multiple segments of block file data stream.
  • The multiple segments of block file data stream represent the corresponding at least one block file.
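  • A minimal sketch of a write method that cuts an incoming data stream by a pre-configured block size is given below. The `BlockWriter` class is a hypothetical stand-in for the abstract file object and implements only `write` and `flush`.

```python
class BlockWriter:
    """Buffers written data and emits fixed-size blocks as they fill up."""

    def __init__(self, block_size: int):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self._buffer = bytearray()
        self.blocks = []  # completed block file data, in original order

    def write(self, data: bytes) -> int:
        self._buffer.extend(data)
        # Cut off every full block that has accumulated so far.
        while len(self._buffer) >= self.block_size:
            self.blocks.append(bytes(self._buffer[:self.block_size]))
            del self._buffer[:self.block_size]
        return len(data)

    def flush(self) -> None:
        # Whatever remains becomes a final, shorter block.
        if self._buffer:
            self.blocks.append(bytes(self._buffer))
            self._buffer.clear()

writer = BlockWriter(block_size=4)
writer.write(b"abcdef")
writer.write(b"ghij")
writer.flush()
print(writer.blocks)
```

  • Because the cut happens inside `write`, callers can stream data in chunks of any size without knowing the block size.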
  • Step S306: Upload the at least one block file to the storage server.
  • The storage server 106 provides an upload interface. Through it, the intermediate server 104 can send a plurality of file upload requests to the storage server 106, each carrying one segment of the multiple data streams, so that the multiple data streams are uploaded to the designated storage server 106.
  • The multiple data streams (that is, the at least one block file) can be uploaded concurrently at the macro level, which makes full use of bandwidth resources and speeds up transmission.
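  • Concurrent upload of the block files might look like the following sketch, which uses Python's `asyncio`. The source does not say which concurrency mechanism is used, so this choice is an assumption; `upload_block` is a placeholder that performs no network I/O and returns a made-up `storage.example` address.

```python
import asyncio

async def upload_block(index: int, block: bytes) -> str:
    # Placeholder for a real upload request to the storage server.
    await asyncio.sleep(0)
    return f"https://storage.example/blocks/{index}"

async def upload_all(blocks):
    # Start all uploads together; gather returns results in input order,
    # so each address lines up with the block it belongs to.
    tasks = [upload_block(i, block) for i, block in enumerate(blocks)]
    return await asyncio.gather(*tasks)

addresses = asyncio.run(upload_all([b"abcd", b"efgh", b"ij"]))
print(addresses)
```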
  • Step S308: Receive at least one storage address, returned by the storage server, corresponding to the at least one block file.
  • When the storage server 106 receives any block file uploaded by the intermediate server 104, it stores the block file in its storage pool and sends the corresponding storage address to the intermediate server 104.
  • The storage address indicates the storage location of the block file in the storage server 106.
  • The storage address may be a Uniform Resource Locator (URL).
  • Step S310: Store meta-information of the file. The meta-information includes, stored in association, the first file identifier of the file, the at least one storage address, and information on the arrangement order, before the file was split, of the at least one block file corresponding to the at least one storage address.
  • The meta-information of the file may record information about the various kinds of processing performed on the file during upload, such as disguising, dividing, and uploading.
  • The meta-information may also include other file-related information, such as the file name carried in the file upload request, the MIME type of the file, the actual file length, and the modification time of the file.
  • The file data stream can be extracted from the request body of the file upload request and divided into multiple data streams, which are then uploaded.
  • The other file-related information extracted from the request is stored in the meta-information.
  • the first file identifier is information that uniquely identifies the file. When two files have the same first file identifier, it can be considered that the two files are the same file.
  • the first file identifier may be a check value of the file, such as a file fingerprint. By calculating the file fingerprint, the content of the file can be compared with higher accuracy.
  • the first file identifier may also be other information related to the file according to the requirements for the accuracy of file recognition, for example, it may be the storage path of the file in the information storage and so on.
  • The file to be uploaded is divided into a plurality of block files, and the plurality of block files are stored in a storage server.
  • The files stored in the storage server are incomplete block files, and information such as the arrangement order of these block files is difficult to obtain from the storage server. It is therefore difficult to restore the complete stored file from the storage server, which effectively improves the security of the files stored on the storage server.
  • Storing the meta-information of the file in step S310 includes: storing the meta-information of the file in an information storage.
  • Because the meta-information is kept in the information storage, it cannot be obtained from the storage server, which makes it difficult to restore the uploaded original file and further improves the security of the files stored by this application.
  • The meta-information of the file can also be uploaded to the storage server.
  • In that case, storing the meta-information of the file in step S310 includes: uploading the meta-information of the file to the storage server; receiving the storage address of the meta-information returned by the storage server; and storing the storage address of the meta-information in the information storage in association with the first file identifier of the file.
  • The storage address of the meta-information can also be a URL.
  • The information storage then needs to store only the storage address of the meta-information and the first file identifier, which reduces the data storage burden on the information storage and saves the storage capacity of the system.
  • Dividing the file to be uploaded into multiple block files in step S304 includes: dividing the file to be uploaded based on a predetermined size to obtain a first number of block files of the predetermined size, where the predetermined size is less than or equal to the upper limit of the file size that the storage server allows to be stored, and the first number is the quotient of the size of the file to be stored divided by the predetermined size, rounded down; and, once the first number of block files of the predetermined size has been obtained, if part of the file remains, taking the remaining part as one further block file.
  • The sizes of the block files may be the same as or different from one another, as long as the size of each block file is less than or equal to the upper limit of the file size allowed by the storage server on which that block file is to be stored.
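  • The block-count rule above can be checked with a short sketch. The function name `block_sizes` and its parameters are illustrative; the logic follows the text: the number of full blocks is the floor of the file size divided by the predetermined size, and any remainder forms one more block.

```python
def block_sizes(file_size: int, block_size: int, upper_limit: int) -> list:
    """Return the sizes of the block files for a file of file_size bytes."""
    if not 0 < block_size <= upper_limit:
        raise ValueError("block_size must be positive and within the server limit")
    # divmod yields the first number (full blocks) and the leftover bytes.
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

print(block_sizes(10, 4, upper_limit=5))
print(block_sizes(8, 4, upper_limit=5))
```

  • For a 10-byte file and a 4-byte block size this yields two full blocks and one 2-byte remainder block; an 8-byte file divides evenly into two blocks.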
  • Some storage service providers restrict the size of files that may be uploaded to their storage servers.
  • The solution of the above embodiment divides the file to be stored into block files no larger than the upper limit of the file size allowed by the storage server. This satisfies the storage service provider's limit on stored file size, so that files of any size can be uploaded.
  • In some cases, the meta-information of the file to be stored is small and satisfies the file size limit of the storage server.
  • In other cases, the meta-information is large and may exceed the upper limit of the file size allowed by the storage server.
  • The meta-information can then also be divided so that each piece of block meta-information is smaller than the upper limit, and the resulting pieces of block meta-information are uploaded to the storage server separately.
  • The storage addresses of the block meta-information can be used in the same way to obtain the meta-information of the file.
  • The file upload method further includes: step S404, selecting an encoder having a target file format and using the encoder to disguise each of the at least one block file as a file in the target file format, where the target file format is a file format allowed by the storage server. Correspondingly, the meta-information stored in step S310 also includes information on the at least one encoder used to disguise the corresponding at least one block file.
  • The encoder is a module that disguises an input file of any file format as a file of a specified target file format and outputs it.
  • The target file format of the encoder is the file format that the file is disguised as.
  • The encoder may add a file header and a file tail to each received data stream, or apply some other modification, so as to disguise the block file represented by the data stream as a block file in the target file format.
  • The at least one block file can be disguised as a plurality of block files of the same file format.
  • In this case, an encoder can be selected from at least one encoder having the same target file format to disguise the at least one block file. Providing more than one such encoder avoids the situation in which a single encoder fails and the disguise processing cannot be performed.
  • The at least one block file can also be disguised as a plurality of block files with different file formats. In this case, encoders with different target file formats can be selected to disguise the respective block files.
  • Each segment of block file data stream obtained by cutting can be directed to the corresponding encoder having the target file format.
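  • The header-and-tail style of disguise can be sketched as follows. The magic bytes, the `wrap-v0` encoder name, and the `ENCODERS` registry are hypothetical; a real encoder would have to emit a header and tail that make the block a valid file of a format the storage server accepts.

```python
HEADER = b"FAKEHDR\x00"   # hypothetical magic bytes, not a real file format
TRAILER = b"\x00FAKEEND"

def disguise(block: bytes) -> bytes:
    """Wrap the raw block in a file header and a file tail."""
    return HEADER + block + TRAILER

def recover(disguised: bytes) -> bytes:
    """Strip the header and tail to get the original block back."""
    too_short = len(disguised) < len(HEADER) + len(TRAILER)
    if too_short or not disguised.startswith(HEADER) or not disguised.endswith(TRAILER):
        raise ValueError("not a block produced by this encoder")
    return disguised[len(HEADER):len(disguised) - len(TRAILER)]

# Registry keyed by the encoder name recorded in the meta-information.
ENCODERS = {"wrap-v0": (disguise, recover)}

block = b"raw block bytes"
stored = disguise(block)
meta_entry = {"encoder": "wrap-v0", "length": len(block)}

# At download time the recorded encoder name selects the matching decoder.
_, decode = ENCODERS[meta_entry["encoder"]]
print(decode(stored) == block)
```

  • Recording the encoder in the meta-information is what allows the download side to undo the disguise, including when different blocks used different encoders.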
  • The file upload method further includes: step S502, obtaining the first file identifier of the file to be uploaded; step S504, searching for the first file identifier in the information storage and the storage server; and step S506, determining whether the first file identifier is stored in the information storage or the storage server. When the first file identifier is stored in neither the information storage nor the storage server, the method continues with step S406 of dividing the file to be uploaded into at least one block file. When the first file identifier is already stored in the information storage or the storage server, step S508 is executed to terminate the upload operation for the file.
  • Before the file is uploaded, the first file identifier of the file to be uploaded is searched for in the information storage and the storage server. If it is found in either of them, the same file has been uploaded before; the upload operation is terminated and only the meta-information of the file is updated and stored. This prevents repeated uploads from occupying unnecessary storage space and system processing resources.
  • The first file identifier may be a file fingerprint.
  • The file fingerprint of the file to be uploaded can be calculated from a fixed-length run of bytes at the head of the file data stream together with the actual file length.
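  • A fingerprint of this kind could be computed as in the sketch below. The source does not specify how the head bytes and the length are combined; hashing the first `head_bytes` bytes followed by the length as a fixed-width 8-byte integer, with SHA-1, is an assumption made for illustration.

```python
import hashlib

def file_fingerprint(data: bytes, head_bytes: int = 4096) -> str:
    """Hash a fixed-length head of the data together with its total length."""
    digest = hashlib.sha1()
    digest.update(data[:head_bytes])
    # Fixed-width length encoding keeps the head/length boundary unambiguous.
    digest.update(len(data).to_bytes(8, "big"))
    return digest.hexdigest()

first = file_fingerprint(b"a" * 5000)
second = file_fingerprint(b"a" * 5001)
print(first == file_fingerprint(b"a" * 5000))
print(first == second)
```

  • Such a fingerprint is cheap because it reads only the head of the stream, but two files that share the same head and the same length would receive the same fingerprint, so it trades accuracy for speed.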
  • When the intermediate server 104 receives the POST/PUT request, it first parses the request header to obtain the original data length, that is, the value of Content-Length in the request header. It then obtains the data format of the request body, which is specified by the value of Content-Type in the request header.
  • Alternatively, the intermediate server 104 receives, through the Filesystem in Userspace (FUSE), a user-initiated data write request of the WRITE method type (a WRITE request for short), then extracts the data content of the file to be uploaded from the WRITE request and calculates the first file identifier.
  • In this case, meta-information of the file may also be stored.
  • The information related to the block files can be obtained from the previously stored meta-information corresponding to the first file identifier, and other file-related information, such as the upload time, can be obtained from the file upload request for the file.
  • The file upload method further includes: step S402, calculating at least one second file identifier corresponding to the at least one block file. The meta-information stored in step S310 then further includes, stored in association, the at least one second file identifier corresponding to the at least one block file.
  • The second file identifier is information that uniquely identifies a block file. Calculating and storing the second file identifier of a block file before the file is disguised records the characteristic information of the original block file before disguise.
  • Through the abstract file object, the intermediate server 104 can collect, for each block file, a group of information such as the block URL, the fingerprint of the block file, the start offset of the pre-disguise block file data stream within the post-disguise block file data stream, and the data length of the pre-disguise block file data stream, obtaining N groups of information corresponding to the N block files.
  • The N groups of information are arranged, in the order the block files had before the file was split, to form a block information sequence. Specifically, the starting offset of each block file can be used as a key and the remaining information, such as the block URL, the fingerprint of the block file, and the data length, as the value, forming key-value pairs that are sorted into a dictionary (dict).
  • The sorted dictionary is the block information sequence. The block information sequence is then serialized, in a format such as JSON, together with the file name, file modification time, MIME type, file fingerprint, upload time, encoder used, and other information, to obtain the meta-information of the file.
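  • A minimal sketch of building and serializing such meta-information follows. The field names and the `build_meta` helper are hypothetical, and the key used here is assumed to be each block's starting offset within the original file, which is what makes sorting by key reproduce the original order.

```python
import json

def build_meta(file_name: str, fingerprint: str, encoder: str, blocks: list) -> str:
    """Serialize file attributes plus an offset-keyed block information sequence."""
    sequence = {}
    offset = 0
    for block in blocks:
        sequence[offset] = {
            "url": block["url"],
            "fingerprint": block["fingerprint"],
            "length": block["length"],
        }
        offset += block["length"]
    ordered = dict(sorted(sequence.items()))  # sort by starting offset
    # Note: JSON object keys are strings, so integer offsets become "0", "4", ...
    return json.dumps({
        "file_name": file_name,
        "fingerprint": fingerprint,
        "encoder": encoder,
        "blocks": ordered,
    })

meta_json = build_meta(
    "report.pdf", "f00d", "wrap-v0",
    [
        {"url": "https://storage.example/0", "fingerprint": "aa", "length": 4},
        {"url": "https://storage.example/1", "fingerprint": "bb", "length": 4},
        {"url": "https://storage.example/2", "fingerprint": "cc", "length": 2},
    ],
)
loaded = json.loads(meta_json)
offsets = sorted(int(key) for key in loaded["blocks"])
print(offsets)
```

  • When the meta-information is read back, the string keys must be converted to integers before sorting, as the last lines do; sorting the strings directly would misorder offsets such as "10" and "4".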
  • Where the file name extension corresponds one-to-one to the MIME type, the MIME type can be derived from the file name and need not be included in the stored meta-information.
  • This application can also search at the level of block files before uploading.
  • The file upload method further includes: step S602, searching the information storage and the storage server for each second file identifier of the at least one second file identifier.
  • Steps S306 and S308 then include: step S604, taking each second file identifier of the at least one second file identifier that is stored in neither the information storage nor the storage server as a target second file identifier, and uploading the block file corresponding to the target second file identifier to the storage server; and step S606, receiving the storage address, sent by the storage server, of the block file corresponding to the target second file identifier.
  • The second file identifier may be a check value of the block file, for example its file fingerprint. Calculating the file fingerprint allows file contents to be compared with higher accuracy.
  • A hash algorithm such as SHA-1 can be used to iteratively calculate the check value of the file.
  • The file to be uploaded is first divided into block files; then, block file by block file, the information storage of the intermediate server and the storage server are searched to find whether an identical block file has been uploaded before.
  • The file upload method further includes: step S608, taking each second file identifier of the at least one second file identifier that is already stored in the information storage or the storage server as a stored second file identifier, and obtaining the storage address of the block file corresponding to the stored second file identifier.
  • The meta-information stored in step S310 includes, stored in association, the first file identifier of the file, the target second file identifier, the stored second file identifier, the storage address of the block file corresponding to the target second file identifier, and the storage address of the block file corresponding to the stored second file identifier.
  • The meta-information of block files uploaded previously can be combined with the meta-information of the block files uploaded this time to obtain the complete meta-information of the file to be uploaded, so that the corresponding original file can be downloaded on the basis of that meta-information.
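  • Block-level deduplication along these lines can be sketched as follows. The `known` dictionary stands in for the lookup in the information storage and storage server, and `upload` for the real upload call; both, like the function name, are assumptions for illustration. SHA-1 of the block content serves as the second file identifier.

```python
import hashlib

def upload_new_blocks(blocks, known, upload):
    """Upload only blocks whose identifier is not yet known; reuse the rest."""
    entries = []
    for order, block in enumerate(blocks):
        block_id = hashlib.sha1(block).hexdigest()
        if block_id not in known:
            # Target second file identifier: upload and record the new address.
            known[block_id] = upload(block)
        # Stored second file identifier: reuse the address already on record.
        entries.append({"order": order, "block_id": block_id, "address": known[block_id]})
    return entries

uploaded = []

def fake_upload(block: bytes) -> str:
    uploaded.append(block)
    return f"https://storage.example/blocks/{len(uploaded)}"

known_blocks = {}
entries = upload_new_blocks([b"aa", b"bb", b"aa"], known_blocks, fake_upload)
print(len(uploaded))
```

  • The returned entries combine addresses of newly uploaded blocks with addresses of blocks that already existed, which together form the complete block list for the file's meta-information.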
  • the information storage 110 of the present application may store a database, and the database includes at least one public database 111 and multiple local databases 112.
  • each user equipment 102 corresponds to a local database 112, and the local database 112 is dedicated to storing the index information of the corresponding user equipment 102.
  • the local database 112 can be configured on the personal user terminal.
  • the local database can be configured on the enterprise server or on the enterprise user terminal of the enterprise.
  • the public database 111 may be used to store index information of sharable files uploaded by multiple user equipment 102. In this way, the index information of the file can be shared across instances, so that when new index information is added to any instance using this framework, the other instances can query the index information, thereby sharing the file to all instances.
  • There may be multiple storage servers and multiple block files. Uploading the at least one block file to the storage server then includes: uploading the multiple block files to the multiple storage servers so that each storage server stores only some of the multiple block files. No single storage server holds all the block files of the file, which makes it difficult to restore the complete original file from a single storage server and improves the security of file storage.
  • Where there are multiple storage servers, uploading the at least one block file to the storage server may also include: uploading the at least one block file to the multiple storage servers so that each block file is stored redundantly on at least two of the storage servers. Each block file is thus backed up on two or more storage servers.
  • When block files are downloaded, a block file that cannot be downloaded correctly from one storage server can be downloaded from a backup storage server instead, which ensures the reliability of file download.
  • The two approaches can be combined: storing the at least one block file in the storage server includes storing the multiple block files on multiple storage servers so that each storage server stores only some of the block files and each block file is stored on at least two of the storage servers. This improves the reliability of file download while preserving the security of file storage.
  • the above-mentioned file uploading method may further include the step of updating the data in any specified byte range of an uploaded file. Specifically, when the intermediate server 104 receives a file update request that specifies a byte range of the file to be updated, it first finds the meta-information of the previously uploaded old file based on the first file identifier, and then, based on the file update range specified in the file update request, determines the one or more uploaded block files partially or fully covered by the file update range.
  • based on the start and end positions of each of these uploaded block files before disguise, the new file data carried in the file update request is split and disguised; the old block files stored in the storage server are then replaced with the new split and disguised block files, and the meta-information of the replaced block files is updated. In this way, the data in any byte range of the uploaded file can be updated.
  • the start position and end position of the update range of the file requested to be updated are not completely aligned with the start position and end position of the old block file uploaded previously, that is, there is an offset between them.
  • the corresponding start block file and/or end block file of the downloaded multiple block files can be split, and the part outside the specified byte range can be removed.
  • the one or more block files within the specified byte range are retained and restored, so as to accurately download the file data that meets the specified byte range.
  • the start position and the end position of each block file can be calculated according to the data length of the block file recorded in the meta information of each block file.
  • HTTP Range can also be used to specify the file update range, and the Web server can parse the POST/PUT request initiated by the client to obtain the Range data in the request header.
  • the present application provides a file download method. Taking the file download method applied to the intermediate server 104 in FIG. 1 and the back-end program of this application deployed on the intermediate server 104 as an example for description, the method may include the following steps S802-S810.
  • Step S802 Obtain the first file identifier of the file to be downloaded.
  • the intermediate server 104 may receive a file download request sent by the user equipment 102, where the file download request carries the first file identifier of the file to be downloaded.
  • the intermediate server 104 parses the file download request to obtain the first file identifier of the file to be downloaded.
  • Step S804 Obtain the meta-information of the file based on the first file identifier, where the meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the arrangement order, in the file before splitting, of the at least one block file corresponding to the at least one storage address.
  • the intermediate server 104 finds the meta information of the file based on the first file identifier of the file.
  • Step S806 Use at least one storage address to download at least one block file from a storage server corresponding to the at least one storage address.
  • the intermediate server 104 uses the at least one storage address to generate the corresponding file download requests and sends them to the corresponding one or more storage servers 106, downloading the corresponding block data from the storage server corresponding to each storage address, so as to obtain at least one block file in one-to-one correspondence with the at least one storage address.
  • the intermediate server 104 needs to know which storage server the storage address corresponds to before sending the file download request corresponding to each storage address.
  • the intermediate server 104 can read the information of the corresponding storage server from the storage address.
  • Step S808 Based on the information about the arrangement order of the at least one block file, restore the at least one block file to a complete file and return the complete file.
  • the intermediate server 104 may restore the at least one block file to a complete file based on the information, stored in the meta-information, about the arrangement order in the file before splitting of the at least one block file corresponding to the at least one storage address, and return the complete file to the user device 102.
  • the above-mentioned file download method of the present application can restore the downloaded at least one block file to a complete file based on the stored information of the arrangement sequence of the at least one block file, and return the complete file to the user device 102.
  • in step S804 of the above-mentioned file download method, obtaining the meta-information of the file based on the first file identifier includes: based on the first file identifier, searching and combining in the information storage to obtain the meta-information of the file.
  • the meta-information of the file is stored in the information storage.
  • the meta-information of the file cannot be obtained from the storage server, making it difficult to restore the uploaded original file, which further improves the security of the file stored in this application.
  • in step S804 of the above-mentioned file downloading method, obtaining the meta-information of the file based on the first file identifier includes: searching for and obtaining, in the information storage, the storage address of the meta-information of the file based on the first file identifier; and using the storage address of the meta-information to download the meta-information of the file from the storage server corresponding to that storage address.
  • the information storage may store only the storage address of the meta-information of the file and the first file identifier, thereby reducing the data storage burden of the information storage and saving the data storage capacity of the system.
  • the meta-information obtained in step S804 further includes at least one second file identifier, stored in association, corresponding to the at least one block file; after step S806 and before step S808, the file download method further includes: step S904, calculating at least one third file identifier corresponding to the downloaded at least one block file.
  • Step S906 When the third file identifier of a block file matches the second file identifier of the block file, it is determined that the block file passes the verification.
  • Step S908 When the third file identifier of a block file does not match the second file identifier of the block file, it is determined that the block file has not passed the verification, and the block file is re-downloaded to replace the block file that failed the verification.
  • the third file identifier and the second file identifier are both information used to uniquely identify the block file.
  • the second file identifier and the third file identifier may be a check value of the block file, for example, a file fingerprint of the block file; by calculating the file fingerprint of a block file, the contents of block files can be compared with high accuracy.
  • a hash algorithm such as SHA-1 may be used to iteratively calculate the check value of the block file.
  • the characteristic information of the original block file before the disguise can be recorded.
  • the third file ID of the block file before disguise is calculated and compared with the second file ID of the block file uploaded before disguise.
  • when uploading files, before disguising the block files in step S404, a redundant part may be added to each block file, the redundant part including an erasure code, for example Reed-Solomon encoding (RS encoding for short); when downloading files and verifying each downloaded block file, as an alternative to the above step S908, if the current block file fails the verification, the erasure code of the block file can be used to restore the block file, which improves the stability of file download. Correspondingly, before the complete file is restored in step S808, the redundant part in each block file needs to be deleted, so that the restored complete file is consistent with the original file.
  • an erasure code for example, Reed-Solomon encoding (Reed-Solomon encoding, RS encoding for short)
  • the meta-information obtained in step S804 further includes information of at least one encoder corresponding to the at least one block file; after step S806 and before step S904, the file download method further includes: S902, obtaining at least one decoder corresponding to the at least one encoder based on the information of the at least one encoder, and using the at least one decoder to restore the corresponding at least one block file to at least one block file before the disguise.
  • the decoder is a module used to restore the input file that has been disguised as the specified target file format to the file before disguise.
  • the target file format possessed by the decoder means that the decoder will restore the file disguised as the target file format to its original format.
  • the decoder can extract the block file part before disguise from each received data stream, and ignore or delete the file header and file tail part added by disguise to obtain the block file before disguise.
  • the encoder and decoder have a one-to-one correspondence. In this embodiment, after the information of the encoder is obtained, the information of the corresponding decoder can be obtained, and the information of the corresponding decoder is used to decode the block file after disguise into the block file before disguise.
  • step S806 specifically includes: using the position of each block file before disguise in the corresponding block file after disguise, downloading the block file before disguise from the block file after disguise stored in the storage server corresponding to the storage address.
  • the position of the block file before the disguise in the block file after the disguise includes the position interval composed of the start position and the end position of the block file before the disguise in the block file after the disguise.
  • the position interval may be determined based on the data length of the block file before masquerading and the starting offset of the block file before masquerading in the block file after masquerading. This information can be recorded in the meta-information of the file.
  • the HTTP Range request header can be used to specify to download data in a specified byte range of a certain block file when downloading a block file. In this way, the block file before the disguise can be downloaded directly, which further saves data transmission traffic and also saves the performance cost of the system.
  • the Web server can obtain, from the URL path or the form part of the GET/POST request initiated by the client, a value from which the above-mentioned meta-information of the file can be retrieved, such as the first file identifier, and then parse the Range data in the request header; for the file download range specified in the Range data, it determines and downloads the one or more block files corresponding to the file download range, so as to return the specified download data.
  • if the start position and end position of the requested byte range are not completely aligned with the start and end positions of the stored block files, that is, there is an offset between them, then, based on the start position and end position of the specified byte range, the corresponding start block file and/or end block file of the downloaded block files can be split, the part outside the specified byte range removed, and the one or more block files within the specified byte range retained and restored, so as to accurately download the file data that meets the specified byte range.
  • the number of storage servers is multiple; the number of block files is multiple; each of the multiple block files is stored redundantly in the multiple storage servers.
  • said downloading the at least one block file from the storage server corresponding to the at least one storage address by using the at least one storage address includes: downloading each block file from the storage server corresponding to a first storage address among the multiple storage addresses corresponding to that block file; and, when downloading the block file from the storage server corresponding to the first storage address fails, using a second storage address among the multiple storage addresses corresponding to the block file to download the block file from the storage server corresponding to the second storage address.
  • if the download using the second storage address also fails and other storage addresses exist, the download can continue to switch to the other storage addresses until the required block file is downloaded. In this way, the reliability of file download can be effectively improved.
  • a file uploading device 1100 including: an upload file acquisition module 1101, a file segmentation module 1102, a file upload module 1103, a storage address receiving module 1104, and a meta-information storage module 1105, wherein:
  • the upload file obtaining module 1101 is used to obtain files to be uploaded.
  • the file segmentation module 1102 is used to segment the file to be uploaded into at least one segmented file.
  • the file upload module 1103 is used to upload at least one block file to the storage server.
  • the storage address receiving module 1104 is configured to receive at least one storage address corresponding to at least one block file returned by the storage server.
  • the meta-information storage module 1105 is used to store the meta-information of the file.
  • a file downloading device 1200 including: a file identification obtaining module 1201, a meta information obtaining module 1202, a file downloading module 1203, and a file restoring module 1204, wherein:
  • the file identifier obtaining module 1201 is used to obtain the first file identifier of the file to be downloaded.
  • the meta-information acquiring module 1202 is configured to acquire meta-information of the file based on the first file identifier.
  • the meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the arrangement sequence of the at least one block file corresponding to the at least one storage address in the file before division.
  • the file download module 1203 is configured to use at least one storage address to download at least one block file from a storage server corresponding to the at least one storage address.
  • the file restoration module 1204 is configured to restore at least one block file to a complete file and return the complete file based on the information of the arrangement sequence of the at least one block file.
  • the present application provides a file management device 1300, which is in communication connection with the storage server 106; the file management device 1300 includes a processor 1301 and an information storage 110; the processor 1301 is configured to:
  • the information storage 110 is configured to store the meta-information of the file or a storage address of the meta-information, where the meta-information of the file includes the first file identifier of the file, the at least one storage address, and the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address; the storage server is used to store the uploaded file.
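
The download flow in steps S804 to S808, together with the check-value verification and the switch to a backup storage address described in the items above, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `BlockMeta`, `restore_file`, the `fetch` callback, and the address strings are hypothetical names, SHA-1 stands in for the second and third file identifiers, and disguise, erasure coding, and real network access are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BlockMeta:
    order: int            # position of the block in the file before splitting
    addresses: List[str]  # storage addresses of the block (replicas)
    sha1: str             # second file identifier recorded at upload time


def restore_file(blocks: List[BlockMeta], fetch: Callable[[str], bytes]) -> bytes:
    """Download every block, verify it, and join the blocks in order."""
    parts = []
    for meta in sorted(blocks, key=lambda b: b.order):
        for address in meta.addresses:
            try:
                data = fetch(address)
            except OSError:
                continue  # this address failed, switch to the next one
            # Third file identifier, computed from the downloaded data.
            if hashlib.sha1(data).hexdigest() == meta.sha1:
                parts.append(data)
                break
        else:
            raise OSError(f"block {meta.order} could not be downloaded and verified")
    return b"".join(parts)


# Hypothetical in-memory "storage servers" used only for this demonstration.
store = {"srv1/b0": b"hello ", "srv1/b1": b"XXXXX", "srv2/b1": b"world"}


def fetch(address: str) -> bytes:
    if address not in store:
        raise OSError(f"no such address: {address}")
    return store[address]


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


meta = [
    BlockMeta(1, ["srv1/b1", "srv2/b1"], digest(b"world")),   # first copy is corrupt
    BlockMeta(0, ["srv3/b0", "srv1/b0"], digest(b"hello ")),  # first address is missing
]
restored = restore_file(meta, fetch)
```

In this sketch a block that fails verification is treated the same way as a failed download: the next storage address is tried, and an error is raised only when every address for a block has been exhausted.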

Abstract

The present application relates to a file uploading method, a file downloading method and a file management apparatus. The file uploading method comprises: acquiring a file to be uploaded; segmenting the file to be uploaded into at least one block file; uploading the at least one block file to a storage server; receiving at least one storage address returned by the storage server and corresponding to the at least one block file; and storing metainformation of the file, wherein the metainformation comprises a first file identifier of the file and the at least one storage address, which are stored in an associated manner, and an arrangement order of the at least one block file corresponding to the at least one storage address in the file before segmentation.

Description

File upload method, file download method and file management device
Technical field
This application relates to the field of data storage technology, and in particular to a file upload method, a file download method and a file management device.
Background
At present, many storage service providers offer data storage services to enterprise and individual users. Their storage servers can provide access to object storage services, such as an Object-Based Storage System, which enterprise users use to store data for mobile applications, large websites, picture sharing, or popular audio and video, or for infrequent-access and archive storage, and which individual users use to store files. Such services can provide flat file storage and Content Delivery Network (CDN) resources to speed up the loading of static resources. A user uploads a file to the storage pool of the storage server through the back-end program of the user terminal and obtains the Uniform Resource Locator (URL) returned by the storage server; once that address is embedded in a web page or in the data returned by an Application Programming Interface (API), the user can download the previously uploaded file using the URL. However, in the related technology described above, files stored on the storage server are relatively easy to steal, and file storage security is low.
Summary of the invention
Based on this, it is necessary to provide a file upload method, a file download method, and a file management device.
A file upload method includes: obtaining a file to be uploaded; dividing the file to be uploaded into at least one block file; uploading the at least one block file to a storage server; receiving at least one storage address, returned by the storage server, corresponding to the at least one block file; and storing meta-information of the file, where the meta-information includes, stored in association, a first file identifier of the file, the at least one storage address, and information about the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address.
A file download method includes: obtaining a first file identifier of a file to be downloaded; obtaining meta-information of the file based on the first file identifier, where the meta-information includes the first file identifier of the file, at least one storage address, and information about the arrangement order, in the file before division, of the at least one block file corresponding to the at least one storage address; using the at least one storage address to download the at least one block file from the storage server corresponding to the at least one storage address; and, based on the information about the arrangement order of the at least one block file, restoring the at least one block file to a complete file and returning the complete file.
A file management device is in communication connection with a storage server and includes a processor and an information storage. The processor is configured to execute the above file upload method and file download method.
With the above file upload method, file download method, and file management device, when a file is uploaded, the file is divided into at least one block file and the at least one block file is stored in the storage server; when a file is downloaded, the at least one block file is downloaded from the storage server and merged, according to the locally recorded division order of the block files, to obtain the restored file. In this way, the files stored on the storage server are incomplete block files, and information such as the arrangement order of these block files is difficult to obtain from the storage server, so it is difficult to restore the complete stored file from the storage server. This effectively improves the security of the files stored on the storage server.
Description of the drawings
FIG. 1 is a diagram of the application environment of the file upload method and the file download method in one embodiment;
FIG. 2 is a diagram of the application environment of the file upload method and the file download method in another embodiment;
FIG. 3 is a schematic flowchart of a file upload method in one embodiment;
FIG. 4 is a schematic flowchart of a file upload method in one embodiment;
FIG. 5 is a schematic flowchart of a file upload method in one embodiment;
FIG. 6 is a schematic flowchart of a file upload method in one embodiment;
FIG. 7 is a schematic structural diagram of the information storage in one embodiment;
FIG. 8 is a schematic flowchart of a file download method in one embodiment;
FIG. 9 is a schematic flowchart of a file download method in one embodiment;
FIG. 10 is a schematic diagram of a file upload method and a file download method in one embodiment;
FIG. 11 is a structural block diagram of a file uploading device in one embodiment;
FIG. 12 is a structural block diagram of a file downloading device in one embodiment;
FIG. 13 is a structural block diagram of a file management device in one embodiment.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
The file upload method and file download method provided in this application can be applied to the application environment shown in FIG. 1. The user equipment 102 communicates with the intermediate server 104 through a network, and the intermediate server 104 communicates with the storage server 106 through a network. The storage server 106 is usually a third-party server. The user equipment 102 may be enterprise user equipment or a personal user terminal, and the enterprise user equipment may be an enterprise server and/or an enterprise user terminal. Personal user terminals and enterprise user terminals may be, but are not limited to, personal computers, notebook computers, smartphones, and tablet computers. The storage service provider provides storage services through the storage server 106, in which a storage pool for storing files is provided.
The file upload method and file download method provided in this application can be executed by a file management device. The file management device may store a back-end program, which can be deployed on the intermediate server 104 and which, when run, executes the file upload method and file download method of this application. In other embodiments, part of the back-end program may be deployed on the intermediate server 104 and the other part deployed by the intermediate server 104 to the user equipment 102, or the intermediate server 104 may deploy the entire back-end program to the user equipment 102. Correspondingly, some steps of the file upload method and file download method of this application may be executed on the intermediate server 104 and the others in the back-end program on the user equipment 102, or all steps may be executed in the back-end program on the user equipment 102.
In one embodiment, the file management device may include an information storage 110, which may include an internal memory and a non-volatile storage medium. The back-end program may be stored on the non-volatile storage medium. A database may also be stored on the non-volatile storage medium and used to store the index information of uploaded files. The index information may include the meta-information of a file, or the meta-information of the file may be obtained through the index information; for example, the index information may include the storage address of the meta-information and the first file identifier, stored in association. The information storage 110 may be located in the user equipment 102 or in the intermediate server 104.
In one embodiment, the storage services provided by several different storage service providers can be used together. Accordingly, as shown in FIG. 2, the intermediate server 104 can communicate through the network with multiple storage servers 106 corresponding to multiple storage service providers, and the file management device may choose to store the at least one block file obtained by division in multiple different storage servers 106. The intermediate server 104 and the storage server 106 can each be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment of this application, the index information of files can be stored in a Redis database, where the Hash and Set data structures are sufficient to implement a file system directory structure in the database. Depending on the actual situation, the file meta-information may either be cached in the Redis database as well, or only the URL at which the meta-information is located may be recorded. Information such as the file name, actual file length, and file creation and modification times is also stored in the Redis database to speed up queries for such information.
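One possible way to lay out such a directory structure with Hash and Set values is sketched below. The sketch is an assumption, not the layout used in the application: the key schemes `dir:<path>` and `file:<path>`, the field names, and the placeholder URL are all hypothetical. To stay runnable without a Redis server, it uses a small in-memory stand-in whose four methods are named after the Redis commands HSET, HGETALL, SADD, and SMEMBERS; with a real deployment the same commands would be issued against the Redis instance.

```python
from collections import defaultdict
from typing import Dict, Set


class InMemoryIndex:
    """In-memory stand-in with methods named after four Redis commands."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._sets: Dict[str, Set[str]] = defaultdict(set)

    def hset(self, key: str, mapping: Dict[str, str]) -> None:
        self._hashes[key].update(mapping)

    def hgetall(self, key: str) -> Dict[str, str]:
        return dict(self._hashes.get(key, {}))

    def sadd(self, key: str, *members: str) -> None:
        self._sets[key].update(members)

    def smembers(self, key: str) -> Set[str]:
        return set(self._sets.get(key, set()))


def add_entry(index: InMemoryIndex, directory: str, name: str,
              attrs: Dict[str, str]) -> None:
    # Set: names contained in a directory (hypothetical key "dir:<path>").
    index.sadd(f"dir:{directory}", name)
    # Hash: attributes of one entry (hypothetical key "file:<path>").
    index.hset(f"file:{directory.rstrip('/')}/{name}", mapping=attrs)


index = InMemoryIndex()
add_entry(index, "/", "docs", {"type": "dir"})
add_entry(index, "/docs", "report.bin", {
    "type": "file",
    "length": "10485760",      # actual file length in bytes
    "mtime": "1590451200",     # modification time
    "meta_url": "https://storage.example/meta/abc",  # placeholder URL
})

listing = index.smembers("dir:/docs")            # directory listing
attrs = index.hgetall("file:/docs/report.bin")   # fast attribute lookup
```

Listing a directory is then a single Set read and fetching a file's attributes is a single Hash read, which matches the stated goal of speeding up such queries.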
In the above embodiment, the Redis database communicates over the TCP protocol, so multiple framework instances can be configured to connect to the same Redis database instance, which makes it easy to keep the file index synchronized. The Redis database also supports atomic operations, which effectively avoids various race conditions. In addition, the Redis database supports a master-slave architecture, which is convenient to scale and helps guarantee high availability. Furthermore, the RDB and AOF features built into the Redis database can be used for persistence, which minimizes the risk of losing the file index and protects the integrity of the framework data.
In one embodiment of this application, a directory is created in another file system to serve as the root directory of this file system. All operations whose program-interface granularity is coarser than reading and writing the file itself can be completed by calling the POSIX interface through system calls, which implements a transparent proxy for the directory structure. The meta-information of each file of this framework is then stored in a file at the corresponding location in the directory structure of the other file system, and when file data is operated on, the request is forwarded to the framework's underlying abstract-file layer for processing.
In the above embodiment, the file system transparent proxy approach uses an existing, mature file system to host the file index of this framework, which gives it high stability and security. Part of the data in the meta-information can be kept directly consistent with the ATTR attributes supported by the existing file system, such as the creation time ctime and the modification time mtime. Attributes that cannot be kept consistent, such as the real size of the file, can be produced by the transparent proxy layer, which reads the meta-information and simulates the output.
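The attribute handling of such a transparent proxy layer can be illustrated with a short sketch. It assumes, purely for illustration, that each index file in the host file system holds the meta-information as JSON with a hypothetical `real_size` field; `proxy_stat` is likewise a hypothetical helper name. The modification time is taken directly from the host file system, while the real file size is read from the meta-information rather than from the size of the index file itself.

```python
import json
import os
import tempfile


def proxy_stat(index_path: str) -> dict:
    """Return file attributes as the proxy layer would present them."""
    st = os.stat(index_path)  # attributes the host file system already tracks
    with open(index_path, "r", encoding="utf-8") as fh:
        meta = json.load(fh)  # meta-information stored in the index file
    return {
        "mtime": st.st_mtime,       # consistent with the host file system
        "size": meta["real_size"],  # simulated from the meta-information
    }


with tempfile.TemporaryDirectory() as root:
    # The index file sits at the path the user sees; its content is metadata only.
    path = os.path.join(root, "report.bin")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"real_size": 10_485_760, "blocks": []}, fh)
    info = proxy_stat(path)
    on_disk_size = os.path.getsize(path)
```

Here `info["size"]` reports the size of the original uploaded file, whereas `on_disk_size` is only the size of the small index file held by the host file system.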
In an embodiment of the present application, as shown in FIG. 3, a file uploading method is provided. The method is described by taking as an example its application to the intermediate server 104 in FIG. 1, with the back-end program of the file management apparatus of the present application deployed on the intermediate server 104, and includes the following steps S302 to S310.
Step S302: obtain a file to be uploaded.
Specifically, the intermediate server 104 receives a file upload request sent by the user equipment 102, the file upload request carrying the file to be uploaded. The intermediate server 104 parses the file upload request to obtain the file to be uploaded.
In the embodiments of the present application, the file management apparatus can provide both an HTTP API interface and a POSIX interface. Taking the case where the user equipment 102 uploads a file through the HTTP API interface as an example, as shown in FIG. 10, the user uses the Web front end in a browser, a dedicated client, or other means to send directly to the intermediate server 104 a file upload request of the POST/PUT method type (a POST/PUT request for short), whose request body carries the file to be uploaded. In this step, the intermediate server 104 thus receives from the user equipment 102 the POST/PUT request carrying the file to be uploaded.
Step S304: split the file to be uploaded into at least one block file.
Specifically, the intermediate server 104 may split the file to be uploaded into one or more block files according to a predetermined file splitting rule. In one embodiment, as shown in FIG. 10, the intermediate server 104 instantiates an abstract file object. The abstract file object can provide a variety of methods for different developers to call, including for example the flush, truncate, and write methods available in write mode, and the locate, read, seek, and tell methods available in read mode. The abstract file object is set to write mode, and the file data stream of the file obtained by parsing in step S302 is passed to the write method. Internally, the write method cuts the file data stream according to a preconfigured block size, yielding a multi-segment block file data stream. The segments of this stream represent the corresponding at least one block file.
Step S306: upload the at least one block file to the storage server.
The storage server 106 provides an upload interface. Through this interface, the intermediate server 104 can send multiple file upload requests to the storage server 106 for the at least one block file obtained by splitting, each file upload request carrying one segment of the multi-segment data stream, so that the multi-segment data stream is uploaded to the designated storage server 106.
When the at least one block file is uploaded to the storage server, the segments of the data stream, that is, the at least one block file, can be uploaded to the storage server 106 independently of one another through different coroutines or threads, in a multi-threaded or asynchronous manner. At the macro level the block files are thus uploaded concurrently, which makes full use of bandwidth resources and speeds up transmission.
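The concurrent upload of steps S306 and S308 can be sketched as follows. This is a minimal illustration using a thread pool, not the implementation of the present application: `fake_upload`, the `storage.example` address, and `upload_blocks` are hypothetical names, and the stand-in upload function merely derives an address from the block content instead of contacting a storage server.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fake_upload(block: bytes) -> str:
    """Hypothetical stand-in for the storage server's upload interface.

    A real implementation would send the block over the network and
    return the storage address reported by the storage server.
    """
    return "https://storage.example/" + hashlib.sha1(block).hexdigest()


def upload_blocks(blocks, upload=fake_upload, max_workers=4):
    """Upload blocks concurrently and return their addresses in block order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even when the
        # individual uploads complete out of order.
        return list(pool.map(upload, blocks))
```

Keeping the returned addresses in block order means the arrangement order needed for the meta-information is not lost when the uploads themselves finish in a different order.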
Step S308: receive at least one storage address, returned by the storage server, corresponding to the at least one block file.
Upon receiving any block file uploaded by the intermediate server 104, the storage server 106 stores the block file in its storage pool and sends the corresponding storage address to the intermediate server 104. The storage address indicates the storage location of the block file in the storage server 106. In an embodiment, the storage address may be a Uniform Resource Locator (URL).
Step S310: store meta-information of the file, wherein the meta-information includes, stored in association with one another, a first file identifier of the file, the at least one storage address, and the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting.
The meta-information of the file may record information about the various kinds of processing performed on the file during upload, such as disguising, splitting, and uploading. The meta-information may also include other file-related information, for example the file name, the MIME type, the actual file length, and the modification time of the file carried in the file upload request. Thus, when a file is uploaded, the file data stream can be extracted from the request body of the file upload request and split into multiple data stream segments, which are then uploaded, while the other file-related information extracted from the request is saved in the meta-information.
The first file identifier is information that uniquely identifies the file. When two files have the same first file identifier, they can be regarded as the same file. In an embodiment, the first file identifier may be a check value of the file, such as a file fingerprint; computing the file fingerprint allows the content of files to be compared with relatively high accuracy. In other embodiments, depending on the required accuracy of file identification, the first file identifier may be other information related to the file, for example the storage path of the file in the information storage.
In the file uploading method described above, the file to be uploaded is divided into multiple block files, which are stored separately in the storage server. The files held by the storage server are therefore incomplete block files, and information such as the order of these block files is difficult to obtain from the storage server. This makes it difficult to reconstruct the complete stored file from the storage server and effectively improves the security of the files stored there.
In one embodiment, storing the meta-information of the file in step S310 includes: storing the meta-information of the file in the information storage.
In the foregoing embodiment, the meta-information of the file is stored in the information storage. The meta-information therefore cannot be obtained from the storage server, which makes it difficult to reconstruct the uploaded original file and further improves the security of the stored files.
In an embodiment, the meta-information of the file may instead be uploaded to the storage server. In this case, storing the meta-information of the file in step S310 includes: uploading the meta-information of the file to the storage server; receiving the storage address of the meta-information returned by the storage server; and storing the storage address of the meta-information in the information storage in association with the first file identifier of the file. The storage address of the meta-information may also be a URL.
In the foregoing embodiment, because the meta-information of the file is also uploaded to the storage server, the information storage need only hold the storage address of the meta-information and the first file identifier. This reduces the data storage burden on the information storage and saves the data storage capacity of the system.
Further, in one embodiment, splitting the file to be uploaded into multiple block files in step S304 includes: splitting the file to be uploaded based on a predetermined size to obtain a first number of block files each having the predetermined size, wherein the predetermined size is less than or equal to the upper limit of the file size that the storage server allows to be stored, and the first number is the quotient of the size of the file to be stored divided by the predetermined size, rounded down; and, after the first number of block files of the predetermined size has been obtained, if a remaining part of the file exists, taking the remaining part of the file as one further block file.
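The splitting rule above can be illustrated with a short sketch. It assumes the whole file is available in memory as a byte string, which is a simplification of the streaming write method described for FIG. 10; the function name `split_into_blocks` is hypothetical.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split data into full-size blocks plus one remainder block, if any."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # The "first number": file size divided by the predetermined size, rounded down.
    full_count = len(data) // block_size
    blocks = [data[i * block_size:(i + 1) * block_size] for i in range(full_count)]
    # Whatever is left over becomes one additional, shorter block.
    remainder = data[full_count * block_size:]
    if remainder:
        blocks.append(remainder)
    return blocks
```

For a 10-byte file and a predetermined size of 4 bytes, the first number is 2 and the remaining 2 bytes form a third block; concatenating the blocks in order reproduces the original data.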
In one embodiment, when there are multiple storage servers, the block sizes of the block files may be the same as or different from one another, provided that the predetermined size of each block file is less than or equal to the upper limit of the file size allowed by the storage server on which that block file is to be stored.
Some storage service providers limit the size of files that may be uploaded to their storage servers. In the solution of the foregoing embodiment, the file to be stored is split into block files whose size is less than or equal to the upper limit of the file size allowed by the storage server, which satisfies the provider's limit and makes it possible to upload files of any size.
The meta-information of a file to be stored is usually small enough to satisfy the file size limit of the storage server, but in some cases the meta-information is large and may exceed the upper limit of the file size that the storage server allows to be stored. In one embodiment, the meta-information can also be split so that each resulting block of meta-information is smaller than the upper limit. The blocks of meta-information are then uploaded separately to the storage server, and the storage addresses of the blocks of meta-information returned by the storage server are received and stored, so as to satisfy the provider's limit on stored file size. When the file is downloaded, the storage addresses of the blocks of meta-information can likewise be used to obtain the meta-information of the file.
In one embodiment, as shown in FIG. 4, after step S304 and before step S306, the file uploading method further includes: step S404, selecting an encoder having a target file format, and using the encoder to disguise each of the at least one block file as having the target file format, wherein the target file format includes a file format that the storage server allows to be stored. Correspondingly, the meta-information stored in step S310 further includes information on the at least one encoder used to disguise the at least one block file.
Here, an encoder is a module that takes an input file of any file format, disguises it as a file of a specified target file format, and outputs the result. The target file format of an encoder is the file format into which the encoder disguises files. The encoder may add a file header and a file trailer to each received data stream segment, or apply other transformations, so as to disguise the block file represented by that segment as a block file of the target file format.
In one embodiment, the at least one block file may be disguised as block files of one and the same file format. In this case, one encoder may be selected from among one or more encoders having the same target file format to disguise the at least one block file; providing more than one such encoder avoids the situation in which the failure of a single encoder makes disguising impossible. Alternatively, the at least one block file may be disguised as block files of different file formats. In this case, encoders having different target file formats may be selected to disguise the respective block files.
In one of the embodiments, as shown in FIG. 10, after the write method has cut the file data stream into multiple block file data stream segments, each segment can be directed to a corresponding encoder having the target file format. The encoder may add a file header and a file trailer to each received segment, or apply other transformations, so as to disguise the block file represented by that segment as a block file of the target file format.
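The header-and-trailer form of disguising can be sketched as below. This is an illustration only: the class `WrappingEncoder`, the function `recover`, and the header and trailer bytes are hypothetical, and the wrapped output is not claimed to be a valid file of any real format. The sketch also shows why the starting offset and data length of the pre-disguise block are worth recording, since they are enough to recover the block later.

```python
class WrappingEncoder:
    """Hypothetical encoder that wraps a block in a fixed header and trailer."""

    def __init__(self, name: str, header: bytes, trailer: bytes):
        self.name = name
        self.header = header
        self.trailer = trailer

    def encode(self, block: bytes):
        """Return the disguised block and the information needed to undo it."""
        disguised = self.header + block + self.trailer
        info = {
            "encoder": self.name,
            # Where the original block starts inside the disguised block.
            "offset": len(self.header),
            # Length of the original block before disguising.
            "length": len(block),
        }
        return disguised, info


def recover(disguised: bytes, info: dict) -> bytes:
    """Extract the original block using the recorded offset and length."""
    start = info["offset"]
    return disguised[start:start + info["length"]]
```

A real encoder for an image or document format would have to emit a structurally valid file of that format; the placeholder bytes here only demonstrate the bookkeeping.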
In the foregoing embodiment, because the at least one block file is disguised as having the target file format before being uploaded, the storage server can hardly obtain the block file as it was before disguising. This satisfies the storage server's requirements on permitted file formats while making it harder to reconstruct the original file from the storage server, further improving the security of file storage.
In one embodiment, as shown in FIG. 5, after step S302 the file uploading method further includes: step S502, obtaining the first file identifier of the file to be uploaded; step S504, searching for the first file identifier in the information storage and the storage server; and step S506, determining whether the first file identifier is stored in the information storage or the storage server. When the first file identifier is stored in neither the information storage nor the storage server, the method proceeds to step S304 of splitting the file to be uploaded into at least one block file. When the first file identifier is already stored in the information storage or the storage server, step S508 is performed to terminate the upload of the file.
In the foregoing embodiment, when a file needs to be uploaded, the first file identifier of the file is first searched for in the information storage and the storage server. If a stored first file identifier is found in either of them, the same file has been uploaded before; the upload of the file is then terminated and only the meta-information of the file is updated and stored. This avoids repeated uploads that would occupy storage space and system processing resources unnecessarily.
In one embodiment, the first file identifier is a file fingerprint, and the file fingerprint of the file to be uploaded can be computed from a fixed-length run of bytes at the head of the file data stream together with the actual file length. As shown in FIG. 10, again taking a POST/PUT request as an example, when the intermediate server 104 receives the POST/PUT request it first parses the request headers to obtain the raw data length, that is, the value of Content-Length. It then obtains the data format of the request body, which is specified by the value of Content-Type in the request headers. When the value of Content-Type is multipart/form-data;boundary=3vkqffBXJh (where 3vkqffBXJh is a custom delimiter), the data stream in the request body is directed to a Multipart parser. The parser uses regular-expression matching to separate from the data stream the file name, the Multipurpose Internet Mail Extensions (MIME) type, the file data stream, and other content. The lengths of the parts that are not file data, such as the boundary delimiters, the file name, and the MIME type, are then subtracted from the raw data length to give the actual file length.
When the value of Content-Type is not of the multipart/form-data form, the request body is taken to be the raw, unprocessed file data stream, and the raw data length is the actual file length. In this case the file name can be obtained from a parameter carried in the URL of the request or from the Content-Disposition request header, the value of Content-Type is taken as the MIME type, and the actual file length, the upload time, and similar information are also obtained for later storage in the meta-information of the file. The file fingerprint is then computed from the fixed-length head of the file data stream and the actual file length, where the file data stream consists of the fixed-length head followed by the remaining data.
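A fingerprint built from the head of the data stream and the actual file length might look like the sketch below. The text does not state the head length, the hash algorithm, or how the two inputs are combined, so the 1024-byte head, the use of SHA-1, and the 8-byte length prefix are all assumptions made for illustration; `file_fingerprint` is a hypothetical name.

```python
import hashlib

HEAD_BYTES = 1024  # assumed fixed head length; the source does not give a value


def file_fingerprint(stream_head: bytes, actual_length: int,
                     head_bytes: int = HEAD_BYTES) -> str:
    """Fingerprint a file from its first head_bytes bytes and its total length."""
    digest = hashlib.sha1()  # assumed algorithm
    # Encode the length as a fixed-width prefix so that the boundary between
    # the length and the head data is unambiguous.
    digest.update(actual_length.to_bytes(8, "big"))
    digest.update(stream_head[:head_bytes])
    return digest.hexdigest()
```

Because only the head and the length enter the hash, the fingerprint is available before the rest of the stream has arrived. The trade-off is that two files with the same length and the same head but different later content receive the same fingerprint.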
When the user uploads a file through the provided POSIX interface, the process is analogous: the intermediate server 104 receives, through Filesystem in Userspace (FUSE), a data write request of the WRITE method type initiated by the user (a WRITE request for short), then extracts the data content of the file to be uploaded from the WRITE request and computes the first file identifier.
In one embodiment, the meta-information of the file may also be stored in step S508. Within the stored meta-information, all information relating to the block files can be obtained from the meta-information corresponding to the matching, previously stored first file identifier, while other file-related information, such as the upload time, can be obtained from the file upload request for the file.
In one embodiment, as shown in FIG. 4, after step S304 and before step S404, the file uploading method further includes: step S402, computing at least one second file identifier corresponding to the at least one block file. The meta-information stored in step S310 then further includes, stored in association, the at least one second file identifier corresponding to the at least one block file.
A second file identifier is information that uniquely identifies a block file. By computing and storing the second file identifier of each block file before the disguising is performed, the characteristic information of the original, pre-disguise block file is recorded.
In one of the embodiments, as shown in FIG. 10, the intermediate server 104 can use the abstract file object to collect, for each block file, a group of information including the block URL, the block file fingerprint, the starting offset of the pre-disguise block file data stream within the post-disguise block file data stream, and the data length of the pre-disguise block file data stream, obtaining N groups of information for N block files. The N groups of information are arranged in the order in which the block files appeared in the file before splitting, forming a block information sequence. Specifically, the starting offset of a block file can be used as the key and the remaining information, such as the block URL, the block file fingerprint, and the data length, as the value, forming a key-value pair for the block file that is inserted into a dictionary (dict). When multiple threads are used to upload the block files, the key-value pairs of all block files can be sorted in the dictionary, after all threads have finished uploading, according to the order of the block files in the file before splitting; the sorted dictionary is the block information sequence.
The block information sequence is then serialized, in a format such as JSON, together with the file name, the file modification time, the MIME type, the file fingerprint, the upload time, the encoder used, and other information, to obtain the meta-information of the file. The extension of a file name and the MIME type have a one-to-one correspondence, so the stored meta-information may also omit the MIME type; when the downloaded block files need to be restored to the original file, the MIME type can be inferred from the extension of the file name.
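The assembly of the block information sequence and its serialization can be sketched as follows. The field names and the function `build_meta` are hypothetical, and the sketch assumes that the dictionary key is each block's starting offset within the original file, which is what makes the keys unique and sortable into file order.

```python
import json


def build_meta(file_name, file_fingerprint, block_infos, encoder_name):
    """Build JSON meta-information from per-block records in any order."""
    blocks = {}
    for info in block_infos:
        # Key: starting offset of the block (assumed to be in the original file).
        blocks[info["offset"]] = {
            "url": info["url"],
            "fingerprint": info["fingerprint"],
            "length": info["length"],
        }
    # Upload threads may finish in any order, so sort by offset afterwards.
    ordered = dict(sorted(blocks.items()))
    meta = {
        "file_name": file_name,
        "fingerprint": file_fingerprint,
        "encoder": encoder_name,
        "blocks": ordered,
    }
    # JSON object keys are strings, so the integer offsets are written as strings.
    return json.dumps(meta)
```

When the meta-information is read back, the offsets arrive as strings and need converting to integers before they are used for arithmetic.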
Besides searching for the whole file before uploading it, the present application can also search for the block files before uploading. In one embodiment, as shown in FIG. 6, after step S402 and before step S306, the file uploading method further includes: step S602, searching the information storage and the storage server for each of the at least one second file identifier.
Correspondingly, steps S306 and S308 include: step S604, taking as target second file identifiers those of the at least one second file identifier that are stored in neither the information storage nor the storage server, and uploading the block files corresponding to the target second file identifiers to the storage server; and step S606, receiving the storage addresses, sent by the storage server, of the block files corresponding to the target second file identifiers.
In this embodiment, the second file identifier may be a check value of the block file, for example its file fingerprint; computing the file fingerprint allows the content of block files to be compared with relatively high accuracy. While the data stream of the block file is being transmitted, its check value can be computed iteratively at the same time using a hash algorithm such as SHA-1.
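Iterative computation of a check value during transmission can be illustrated with Python's `hashlib`, whose hash objects accept data incrementally. The function name `hash_stream` is hypothetical; in a real upload path each chunk would also be forwarded to the storage server inside the loop.

```python
import hashlib


def hash_stream(chunks):
    """Compute a SHA-1 check value over chunks as they pass through."""
    digest = hashlib.sha1()
    total = 0
    for chunk in chunks:
        # Feeding chunks one at a time gives the same result as hashing
        # the concatenated data in a single call.
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total
```

Because the hash state is updated chunk by chunk, the check value is ready as soon as the last chunk has been transmitted, without a second pass over the data.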
In the foregoing embodiment, the file to be uploaded is first split into block files, and then, block file by block file, the information storage of the intermediate server and the storage server are searched to determine whether an identical block file has been uploaded before. Only block files that have not been uploaded before need to be uploaded, and block files already uploaded need not be uploaded again, which further saves file storage capacity and system processing resources.
In one embodiment, as shown in FIG. 6, after step S602 and before step S310, the file uploading method further includes: step S608, taking as stored second file identifiers those of the at least one second file identifier that are already stored in the information storage or the storage server, and obtaining the storage addresses of the block files corresponding to the stored second file identifiers. Correspondingly, the meta-information stored in step S310 includes, stored in association, the first file identifier of the file, the target second file identifiers, the stored second file identifiers, the storage addresses of the block files corresponding to the target second file identifiers, the storage addresses of the block files corresponding to the stored second file identifiers, and information on the order in which the block files corresponding to the target and stored second file identifiers were arranged in the file before splitting.
In the foregoing embodiment, for the at least one block file obtained by splitting a file to be uploaded, once the block files not previously uploaded have been uploaded, the meta-information of the block files that had already been uploaded can be combined with the meta-information of the block files uploaded this time to obtain the complete meta-information of the file to be uploaded. At download time, the corresponding original file can then be downloaded based on the meta-information of the file.
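The block-level flow of steps S602 to S608 can be sketched as below. The dictionary `index` is a hypothetical stand-in for the lookups in the information storage and the storage server, the `upload` callable stands in for the storage server's upload interface, and SHA-1 over the whole block is assumed as the second file identifier.

```python
import hashlib


def upload_with_dedup(blocks, index, upload):
    """Upload only blocks whose identifier is not yet in the index.

    Returns (identifier, storage address) pairs in file order, combining
    addresses of newly uploaded blocks with those of already stored blocks.
    """
    ordered = []
    for block in blocks:
        block_id = hashlib.sha1(block).hexdigest()
        if block_id not in index:
            # Target second file identifier: upload and record the address.
            index[block_id] = upload(block)
        # Stored second file identifier: reuse the known address.
        ordered.append((block_id, index[block_id]))
    return ordered
```

The returned list already has the shape needed for the complete meta-information, since it lists every block of the file in its original order regardless of whether the block was uploaded now or earlier.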
In one embodiment, as shown in FIG. 7, the information storage 110 of the present application may store a database, which includes at least one public database 111 and multiple local databases 112. Each user equipment 102 uses one corresponding local database 112, which is dedicated to storing the index information of that user equipment 102. For an individual user, the local database 112 can be deployed on the individual user's terminal; for an enterprise user, the local database can be deployed on the enterprise's server or on an enterprise user terminal of the enterprise. The public database 111 can be used to store the index information of shareable files uploaded by multiple user equipments 102. This allows the index information of files to be shared across instances: when any instance using this framework adds new index information, all other instances can query it, so that the file is shared with all instances.
In one embodiment, there are multiple storage servers and multiple block files, and uploading the at least one block file to the storage server includes: uploading the multiple block files separately to the multiple storage servers, so that each of the multiple storage servers stores a part of the multiple block files. Each storage server thus holds only some of the block files of the file rather than all of them, which makes it difficult to reconstruct the complete original file from a single storage server and improves the security of file storage.
In one embodiment, there are multiple storage servers, and uploading the at least one block file to the storage server includes: uploading the at least one block file to multiple storage servers, so that the at least one block file is stored redundantly in at least two of the multiple storage servers. Each block file is thus backed up on two or more storage servers. When a block file is downloaded, if the correct block file cannot be downloaded from one storage server, it can be downloaded from a backup storage server, which ensures the reliability of file download.
Further, in one embodiment, when there are three or more storage servers, the two embodiments above may be combined. Storing the at least one block file on the storage servers then includes: storing the multiple block files on the multiple storage servers respectively, so that each storage server stores only a part of the multiple block files and each block file is stored on at least two of the storage servers. In this way, the reliability of file download is improved while the security of file storage is maintained.
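By way of non-limiting illustration, the combined placement described above can be sketched as follows. The round-robin policy and the names `assign_blocks` and `holds_whole_file` are assumptions made for this example and are not taken from the present application; any policy that satisfies both conditions would serve.

```python
from typing import Dict, List


def assign_blocks(num_blocks: int, servers: List[str], copies: int = 2) -> Dict[int, List[str]]:
    """Map each block index to `copies` distinct servers, chosen round-robin."""
    if not 2 <= copies < len(servers):
        raise ValueError("need 2 <= copies < number of servers")
    count = len(servers)
    return {
        index: [servers[(index + k) % count] for k in range(copies)]
        for index in range(num_blocks)
    }


def holds_whole_file(placement: Dict[int, List[str]], server: str) -> bool:
    """Return True if `server` stores a copy of every block."""
    return all(server in targets for targets in placement.values())
```

With this simple policy, a server is guaranteed to miss at least one block only when the number of blocks is at least the number of servers. For smaller files (for example two blocks on three servers) one server may end up with every block, so a caller could use `holds_whole_file` to check a placement before uploading.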
In one embodiment, the file upload method above may further include a step of updating the data in an arbitrary specified byte range of an already uploaded file. Specifically, when the intermediate server 104 receives a file update request for a specified byte range of a file, it first finds the meta-information of the previously uploaded old file based on the first file identifier. Based on the file update range specified in the request, it then determines the one or more uploaded block files that the update range partially or fully covers. Based on the start and end positions of the corresponding pre-disguise block files, it splits and disguises the new file to be uploaded in the file update request, replaces the old block files stored on the storage server with the new split and disguised block files, and updates the meta-information of the replaced block files. The data in any byte range of an uploaded file can thus be updated.
Further, when the start and end positions of the requested file update range are not fully aligned with the start and end positions of the previously uploaded old block files, that is, when there is an offset between them, the first block file and/or the last block file among the downloaded block files can be cut according to the start and end positions of the specified byte range: the part outside the specified byte range is removed, and the one or more block files within the specified byte range are kept and restored, so that the file matching the specified byte range is downloaded accurately. The start and end positions of each block file can be calculated from the block file data lengths recorded in the meta-information of the block files. HTTP Range can likewise be used to specify the file update range, and the web server can parse the POST/PUT request initiated by the client to obtain the Range data in the request header.
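As a minimal sketch of the range update described above, the following example derives block boundaries from the recorded block lengths and rewrites only the blocks that the update range overlaps. It assumes the pre-disguise blocks are held as an in-memory list of byte strings and leaves out disguising and uploading; `block_bounds` and `apply_update` are hypothetical names introduced for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def block_bounds(lengths: List[int]) -> List[Tuple[int, int]]:
    """Return the (start, end) offsets of each block in the file, end exclusive."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def apply_update(blocks: List[bytes], start: int, new_data: bytes) -> List[int]:
    """Overwrite file bytes [start, start + len(new_data)) and return the
    indices of the blocks that were replaced. Block lengths are preserved."""
    end = start + len(new_data)
    if start < 0 or end > sum(len(b) for b in blocks):
        raise ValueError("update range lies outside the uploaded file")
    touched = []
    bounds = block_bounds([len(b) for b in blocks])
    for index, (b_start, b_end) in enumerate(bounds):
        lo, hi = max(start, b_start), min(end, b_end)
        if lo >= hi:
            continue  # this block is not covered by the update range
        old = blocks[index]
        # Keep the bytes outside the range; splice in the matching new bytes.
        blocks[index] = (
            old[: lo - b_start] + new_data[lo - start : hi - start] + old[hi - b_start :]
        )
        touched.append(index)
    return touched
```

Only the blocks whose indices are returned would need to be disguised again, re-uploaded, and have their meta-information refreshed.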
In one embodiment, as shown in FIG. 8, the present application provides a file download method. The method is described below using an example in which it is applied to the intermediate server 104 in FIG. 1 and the back-end program of the present application is deployed on the intermediate server 104. The method may include the following steps S802-S810.
Step S802: obtain the first file identifier of the file to be downloaded.
Specifically, the intermediate server 104 may receive a file download request sent by the user equipment 102, where the file download request carries the first file identifier of the file to be downloaded. The intermediate server 104 parses the file download request to obtain the first file identifier of the file to be downloaded.
Step S804: based on the first file identifier, obtain the meta-information of the file, where the meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting.
In this step, the intermediate server 104 finds the meta-information of the file based on the first file identifier of the file.
Step S806: use the at least one storage address to download the at least one block file from the storage server corresponding to the at least one storage address.
In this step, the intermediate server 104 uses each of the at least one storage address to generate a corresponding file download request, sends the file download requests to the corresponding one or more storage servers 106, and downloads one block file from the storage server corresponding to each storage address, thereby obtaining at least one block file in one-to-one correspondence with the at least one storage address.
In one of the embodiments, there are multiple storage servers, and before sending the file download request corresponding to each storage address, the intermediate server 104 needs to know which storage server that storage address corresponds to. When the storage address returned by a storage server itself carries information about that storage server, the intermediate server 104 can read the information about the corresponding storage server from the storage address.
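For illustration only, if the storage addresses happen to be URLs (an assumption of this example; the present application does not fix the address format), the storage server can be read from the address itself. The names `server_of` and `group_by_server` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def server_of(storage_address: str) -> str:
    """Return the host[:port] part of a URL-style storage address."""
    netloc = urlsplit(storage_address).netloc
    if not netloc:
        raise ValueError(f"address carries no server information: {storage_address!r}")
    return netloc


def group_by_server(addresses: List[str]) -> Dict[str, List[str]]:
    """Group storage addresses by the server that should receive each request."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for address in addresses:
        groups[server_of(address)].append(address)
    return dict(groups)
```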
Step S808: based on the information about the arrangement order of the at least one block file, restore the at least one block file to a complete file and return the complete file.
Specifically, the intermediate server 104 may restore the at least one block file to a complete file based on the information, stored in the meta-information, about the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting, and return the complete file to the user equipment 102.
With the file download method above, the downloaded at least one block file can be restored to a complete file based on the stored information about the arrangement order of the at least one block file, and the complete file can be returned to the user equipment 102.
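A minimal sketch of steps S806 and S808 is given below, assuming the meta-information is available as a list of entries that each record a storage address and an order index. The `BlockEntry` type, its field names, and the `fetch` callback are illustrative assumptions rather than structures defined by the present application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BlockEntry:
    order: int      # position of the block in the file before splitting
    address: str    # storage address returned by the storage server


def restore_file(entries: Iterable[BlockEntry], fetch: Callable[[str], bytes]) -> bytes:
    """Download every block and concatenate the blocks in their recorded order."""
    ordered = sorted(entries, key=lambda entry: entry.order)
    return b"".join(fetch(entry.address) for entry in ordered)
```

Because the order is recorded explicitly, the entries may be stored or downloaded in any sequence without affecting the restored file.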
In one embodiment, when the meta-information of the file is stored in the information storage, obtaining the meta-information of the file based on the first file identifier in step S804 of the file download method above includes: searching the information storage for the meta-information of the file based on the first file identifier and obtaining it.
In the embodiment above, the meta-information of the file is stored in the information storage. The meta-information therefore cannot be obtained from the storage server, which makes it difficult to restore the uploaded original file from the storage server alone and further improves the security of the files stored by the present application.
In one embodiment, when the meta-information of the file is stored on the storage server, obtaining the meta-information of the file based on the first file identifier in step S804 of the file download method above includes: searching the information storage for the storage address of the meta-information of the file based on the first file identifier and obtaining it; and using the storage address of the meta-information to download the meta-information of the file from the storage server corresponding to that storage address.
In the embodiment above, because the meta-information of the file is also uploaded to the storage server, the information storage only needs to store the storage address of the meta-information in association with the first file identifier. This reduces the data storage burden on the information storage and saves storage capacity in the system.
In one embodiment, as shown in FIG. 9, the meta-information obtained in step S804 further includes at least one second file identifier that corresponds to the at least one block file and is stored in association with it. After step S806 and before step S808, the file download method further includes the following steps. Step S904: calculate at least one third file identifier corresponding to the downloaded at least one block file. Step S906: when the third file identifier of a block file matches the second file identifier of that block file, determine that the block file passes verification. Step S908: when the third file identifier of a block file does not match the second file identifier of that block file, determine that the block file fails verification, download the block file again, and replace the block file that failed verification with the newly downloaded block file.
Both the third file identifier and the second file identifier are information used to uniquely identify a block file. When two block files have the same second file identifier or the same third file identifier, the two block files can be regarded as the same file. In this embodiment, the second file identifier and the third file identifier may be a check value of the block file, for example a fingerprint of the block file. Calculating the fingerprint of a block file allows the content of block files to be compared with relatively high accuracy. While the data stream corresponding to a block file is being transmitted, a hash algorithm such as SHA-1 may be used at the same time to iteratively calculate the check value of the block file.
In the embodiment above, calculating and storing the second file identifier of each block file before the file disguise is performed records the characteristic information of the original block file before disguise. After a block file is downloaded and restored to its pre-disguise form, the third file identifier calculated for it is compared with the second file identifier of the pre-disguise block file recorded at upload. When all block files pass verification, the restored complete file can be determined to be identical to the original file at upload, which ensures the integrity and reliability of the downloaded file.
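The verification of steps S904 to S908 can be sketched as follows, assuming SHA-1 is the chosen hash algorithm and that a block arrives as a sequence of byte pieces. The function names, the `fetch` callback, and the retry limit are assumptions of this example; the present application does not specify how many times a failed block is downloaded again.

```python
import hashlib
from typing import Callable, Iterable


def fingerprint(stream: Iterable[bytes]) -> str:
    """Iteratively compute the SHA-1 check value while the pieces stream in."""
    digest = hashlib.sha1()
    for piece in stream:
        digest.update(piece)
    return digest.hexdigest()


def download_verified(
    fetch: Callable[[], Iterable[bytes]], second_id: str, max_attempts: int = 3
) -> bytes:
    """Download a block, compare its third identifier with the stored second
    identifier, and download it again if the two do not match."""
    for _ in range(max_attempts):
        pieces = list(fetch())
        if fingerprint(pieces) == second_id:
            return b"".join(pieces)
    raise IOError("block file failed verification on every attempt")
```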
Further, in one embodiment, when a file is uploaded, a redundant part may be added to each block file before the block file is disguised in step S404. The redundant part includes an erasure code, for example a Reed-Solomon code (RS code). When the file is downloaded and each downloaded block file is verified, as an alternative to step S908 above, a block file that fails verification may be recovered using the erasure code it carries, which improves the stability of file download. Correspondingly, before the complete file is restored in step S808, the redundant part of each block file needs to be removed so that the restored complete file is consistent with the original file.
When a file is downloaded, the at least one block file before disguise needs to be obtained in order to restore the original file. In one embodiment, as shown in FIG. 9, the meta-information obtained in step S804 further includes information about at least one encoder corresponding to the at least one block file. After step S806 and before step S904, the file download method further includes step S902: based on the at least one encoder, obtain at least one decoder corresponding to the at least one encoder; and use the at least one decoder to restore the corresponding at least one block file to the at least one block file before disguise.
Here, a decoder is a module used to restore an input file that has been disguised as a specified target file format to the file before disguise. The target file format of a decoder indicates that the decoder restores files disguised as that target file format to their original format. The decoder can extract the pre-disguise block file from each segment of the received data stream and ignore or delete the parts added by the disguise, such as the file header and the file trailer, to obtain the block file before disguise. Encoders and decoders are in one-to-one correspondence. In this embodiment, once the information about the encoder is obtained, the information about the corresponding decoder can be obtained and used to decode the disguised block file into the block file before disguise.
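To make the encoder and decoder pairing concrete, the sketch below uses a deliberately simplified disguise that only prepends a header and appends a trailer. The `Disguise` class, the `CODECS` registry, and the format name `toy-format` are all hypothetical; producing a file that a real storage server accepts as a given format would need more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disguise:
    """A toy encoder/decoder pair: header + block + trailer."""
    header: bytes
    trailer: bytes

    def encode(self, block: bytes) -> bytes:
        return self.header + block + self.trailer

    def decode(self, disguised: bytes) -> bytes:
        wrapper = len(self.header) + len(self.trailer)
        if (
            len(disguised) < wrapper
            or not disguised.startswith(self.header)
            or not disguised.endswith(self.trailer)
        ):
            raise ValueError("data was not produced by the matching encoder")
        # Drop the parts added by the disguise and keep the original block.
        return disguised[len(self.header) : len(disguised) - len(self.trailer)]


# Encoder name recorded in the meta-information -> matching codec.
CODECS = {"toy-format": Disguise(header=b"FAKEHDR", trailer=b"FAKEEND")}
```

Given the encoder name stored in the meta-information, the download side would look up the same entry and call `decode` on each disguised block.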
As an alternative to step S902 above, in another embodiment, when each of the at least one block file stored on the storage server has been disguised as a block file with a predetermined format, step S806 specifically includes: using the position of each pre-disguise block file within the corresponding disguised block file, downloading the pre-disguise block file from the disguised block file stored on the storage server corresponding to the storage address.
Here, the position of the pre-disguise block file within the disguised block file is the position interval formed by the start position and end position of the pre-disguise block file within the disguised block file. The position interval may be determined from the data length of the pre-disguise block file and the start offset of the pre-disguise block file within the disguised block file. This information may be recorded in the meta-information of the file.
In the embodiment above, a block file that was disguised before upload does not have to be downloaded in full and then decoded. During download, the position of the pre-disguise block file within the disguised block file is used to request, directly from the disguised block file stored on the storage server, only the data in the position interval occupied by the pre-disguise block file; data at other positions need not be downloaded. Specifically, the HTTP Range request header may be used to specify the byte range of a block file to download. In this way, the pre-disguise block file is downloaded directly, which further reduces data transmission traffic and lowers the performance cost of the system.
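As an illustrative sketch, the Range request header for the pre-disguise block can be built from the start offset and data length recorded in the meta-information. HTTP byte ranges are inclusive at both ends, so the last byte position is `offset + length - 1`. The function name `payload_range_header` is an assumption of this example.

```python
from typing import Dict


def payload_range_header(offset: int, length: int) -> Dict[str, str]:
    """Build the Range header that selects only the pre-disguise block.

    offset: start offset of the pre-disguise block inside the disguised file
    length: data length of the pre-disguise block
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

A server that honours the header would typically answer with status 206 (Partial Content) and only the requested bytes, so the header and trailer added by the disguise never cross the network.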
Similarly, the ability to download a specified byte range of a block file can be extended to the whole file. By implementing a web server that likewise accepts and parses the HTTP Range request header, users can use existing download tools to perform multi-threaded downloads and resumable downloads. In this case, the web server can obtain, from the URL path or the form part of the GET/POST request initiated by the client, the value mentioned above from which the meta-information of the file can be retrieved, for example the first file identifier. It then parses the Range data in the request header, determines and downloads the one or more block files corresponding to the file download range specified in the Range data, and returns the requested data. When the start and end positions of the requested byte range are not fully aligned with the start and end positions of the stored block files, that is, when there is an offset between them, the first block file and/or the last block file among the downloaded block files can be cut according to the start and end positions of the specified byte range: the part outside the specified byte range is removed, and the one or more block files within the specified byte range are kept and restored, so that the file matching the specified byte range is downloaded accurately.
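The file-level range handling described above might look like the following sketch. It handles only the single `bytes=first-last` form of the Range header, assumes block lengths are taken from the meta-information, and uses a hypothetical `fetch_block` callback that returns a pre-disguise block by index.

```python
import re
from typing import Callable, List, Tuple

_RANGE = re.compile(r"^bytes=(\d+)-(\d+)$")


def parse_range(value: str, file_size: int) -> Tuple[int, int]:
    """Parse 'bytes=first-last' (inclusive) and clamp it to the file size."""
    match = _RANGE.match(value)
    if not match:
        raise ValueError("only the 'bytes=first-last' form is handled here")
    first, last = int(match.group(1)), int(match.group(2))
    if first > last or first >= file_size:
        raise ValueError("range cannot be satisfied")
    return first, min(last, file_size - 1)


def read_range(
    lengths: List[int], fetch_block: Callable[[int], bytes], first: int, last: int
) -> bytes:
    """Fetch only the blocks overlapping [first, last] and trim the edge blocks."""
    parts = []
    position = 0
    for index, length in enumerate(lengths):
        b_first, b_last = position, position + length - 1
        position += length
        if b_last < first:
            continue  # block ends before the requested range starts
        if b_first > last:
            break     # block starts after the requested range ends
        block = fetch_block(index)
        parts.append(block[max(first, b_first) - b_first : min(last, b_last) - b_first + 1])
    return b"".join(parts)
```

Blocks that lie entirely outside the requested range are never fetched, which is what allows several ranges of the same file to be downloaded in parallel or an interrupted download to resume from where it stopped.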
In one embodiment, there are multiple storage servers and multiple block files, and each of the multiple block files is stored redundantly on at least two of the multiple storage servers. Using the at least one storage address to download the at least one block file from the storage server corresponding to the at least one storage address includes: for each of the multiple block files, downloading the block file from the storage server corresponding to a first storage address among the multiple storage addresses corresponding to that block file; and, when downloading the block file from the storage server corresponding to the first storage address fails, using a second storage address among the multiple storage addresses corresponding to that block file to download the block file from the storage server corresponding to the second storage address.
In one embodiment, when the download using the second storage address also fails and other storage addresses remain, the download may be retried with the other storage addresses in turn until the required block file is obtained. This effectively improves the reliability of file download.
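The fallback across replica addresses can be sketched as below. It assumes a download failure surfaces as an `OSError` raised by a caller-supplied `fetch` function; both that convention and the name `download_with_fallback` are assumptions of this example.

```python
from typing import Callable, Optional, Sequence


def download_with_fallback(addresses: Sequence[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each storage address of a block in turn until one download succeeds."""
    last_error: Optional[OSError] = None
    for address in addresses:
        try:
            return fetch(address)
        except OSError as error:
            last_error = error  # remember the failure and try the next replica
    raise OSError("no storage address yielded the block file") from last_error
```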
It should be understood that although the steps in the flowcharts of FIGS. 3-9 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 3-9 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be executed at different times, and they are not necessarily executed one after another; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 11, a file upload apparatus 1100 is provided, including an upload file acquisition module 1101, a file splitting module 1102, a file upload module 1103, a storage address receiving module 1104, and a meta-information storage module 1105, wherein:
The upload file acquisition module 1101 is configured to obtain the file to be uploaded.
The file splitting module 1102 is configured to split the file to be uploaded into at least one block file.
The file upload module 1103 is configured to upload the at least one block file to the storage server.
The storage address receiving module 1104 is configured to receive at least one storage address, returned by the storage server, corresponding to the at least one block file.
The meta-information storage module 1105 is configured to store the meta-information of the file.
In one embodiment, as shown in FIG. 12, a file download apparatus 1200 is provided, including a file identifier acquisition module 1201, a meta-information acquisition module 1202, a file download module 1203, and a file restoration module 1204, wherein:
The file identifier acquisition module 1201 is configured to obtain the first file identifier of the file to be downloaded.
The meta-information acquisition module 1202 is configured to obtain the meta-information of the file based on the first file identifier. The meta-information of the file includes the first file identifier of the file, at least one storage address, and information about the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting.
The file download module 1203 is configured to use the at least one storage address to download the at least one block file from the storage server corresponding to the at least one storage address.
The file restoration module 1204 is configured to restore the at least one block file to a complete file based on the information about the arrangement order of the at least one block file, and to return the complete file.
In one embodiment, as shown in FIG. 13, the present application provides a file management apparatus 1300 that is communicatively connected to the storage server 106. The file management apparatus 1300 includes a processor 1301 and an information storage 110. The processor 1301 is configured to:
when an upload file request is received: obtain the file to be uploaded; split the file to be uploaded into at least one block file; upload the at least one block file to the storage server; receive at least one storage address, returned by the storage server, corresponding to the at least one block file; and store the meta-information of the file;
when a download file request is received: obtain the first file identifier of the file to be downloaded; based on the first file identifier, obtain the meta-information of the file; use the at least one storage address to download the at least one block file from the storage server corresponding to the at least one storage address; and, based on the information about the arrangement order of the at least one block file, restore the at least one block file to a complete file and return the complete file.
The information storage 110 is configured to store the meta-information of the file or the storage address of the meta-information, where the meta-information of the file includes the first file identifier of the file, the at least one storage address, and the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting. The storage server is used to store the uploaded block files.
The embodiments above describe only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. The protection scope of this patent is therefore defined by the appended claims.

Claims (19)

  1. A file upload method, the method comprising:
    obtaining a file to be uploaded;
    splitting the file to be uploaded into at least one block file;
    uploading the at least one block file to a storage server;
    receiving at least one storage address, returned by the storage server, corresponding to the at least one block file; and
    storing meta-information of the file, wherein the meta-information of the file includes a first file identifier of the file, the at least one storage address, and the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting.
  2. The method according to claim 1, wherein storing the meta-information of the file comprises:
    storing the meta-information of the file in an information storage.
  3. The method according to claim 1, wherein storing the meta-information of the file comprises:
    uploading the meta-information of the file to the storage server;
    receiving a storage address of the meta-information returned by the storage server;
    storing the storage address of the meta-information in an information storage in association with the first file identifier of the file.
  4. The method according to claim 1, wherein splitting the file to be uploaded into at least one block file comprises:
    splitting the file to be uploaded based on a predetermined size to obtain a first number of block files having the predetermined size, wherein the predetermined size is less than or equal to the upper limit of the file size that the storage server allows to be stored, and the first number is the quotient of the size of the file to be stored and the predetermined size, rounded to an integer;
    after the first number of block files having the predetermined size are obtained, if a remaining part of the file exists, assigning the remaining part of the file as one block file.
  5. The method according to claim 1, wherein after splitting the file to be uploaded into at least one block file and before uploading the at least one block file to the storage server, the method further comprises:
    selecting an encoder having a target file format, and using the encoder to disguise each of the at least one block file as having the target file format;
    wherein the target file format includes a file format that the storage server allows to be stored;
    the stored meta-information of the file further includes information about the at least one encoder used to disguise the at least one block file.
  6. The method according to claim 1, wherein after obtaining the file to be uploaded, the method further comprises:
    obtaining the first file identifier of the file to be uploaded;
    searching for the first file identifier in an information storage and the storage server; and
    when the first file identifier is not stored in the information storage or the storage server, continuing to perform the step of splitting the file to be uploaded into at least one block file;
    when the first file identifier is already stored in the information storage or the storage server, terminating the upload operation for the file.
  7. The method according to claim 1, wherein after splitting the file to be uploaded into at least one block file, the method further comprises:
    calculating at least one second file identifier corresponding to the at least one block file;
    wherein the meta-information further includes the at least one second file identifier corresponding to the at least one block file, stored in association.
  8. The method according to claim 7, wherein:
    after calculating the at least one second file identifier corresponding to the at least one block file, the method further comprises searching the information storage and the storage server for each of the at least one second file identifier; and
    uploading the at least one block file to the storage server and receiving the at least one storage address, returned by the storage server, corresponding to the at least one block file comprises:
    determining, as a target second file identifier, each second file identifier among the at least one second file identifier that is stored in neither the information storage nor the storage server, and uploading the block file corresponding to the target second file identifier to the storage server; and
    receiving the storage address, returned by the storage server, of the block file corresponding to the target second file identifier.
  9. The method according to claim 8, further comprising:
    determining, as a stored second file identifier, each second file identifier among the at least one second file identifier that is already stored in the information storage or the storage server, and acquiring the storage address of the block file corresponding to the stored second file identifier.
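Claims 7 to 9 together describe block-level deduplication: only blocks with unknown identifiers are uploaded, and known blocks reuse their existing addresses. A minimal sketch follows, assuming SHA-256 as the second file identifier and a single dictionary standing in for the combined lookup in the information storage and the storage server; `store_blocks` and the `upload` callback are hypothetical names.

```python
import hashlib
from typing import Callable


def store_blocks(
    blocks: list[bytes],
    known_addresses: dict[str, str],
    upload: Callable[[bytes], str],
) -> list[tuple[str, str]]:
    """Return (block_id, address) pairs in block order, uploading only new blocks."""
    placed = []
    for block in blocks:
        block_id = hashlib.sha256(block).hexdigest()  # assumed identifier scheme
        address = known_addresses.get(block_id)
        if address is None:
            # Target identifier: not stored anywhere yet, so upload the block.
            address = upload(block)
            known_addresses[block_id] = address
        # Otherwise the stored identifier's existing address is reused.
        placed.append((block_id, address))
    return placed
```

The returned list keeps the original block order, so it can be written directly into the file's meta-information.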
  10. The method according to claim 1, wherein there are a plurality of storage servers and a plurality of block files, and uploading the at least one block file to the storage server comprises:
    uploading the plurality of block files to the plurality of storage servers respectively, such that each of the plurality of storage servers stores a portion of the plurality of block files.
  11. The method according to claim 1, wherein there are a plurality of storage servers, and uploading the at least one block file to the storage server comprises:
    uploading the at least one block file to the plurality of storage servers respectively, such that the at least one block file is stored redundantly in at least two of the plurality of storage servers.
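One way to picture the placement in claims 10 and 11 is a round-robin plan, sketched below. The application does not prescribe any placement policy; round-robin, the function name `plan_placement`, and the `replicas` parameter are assumptions of this example.

```python
def plan_placement(block_count: int, server_count: int, replicas: int = 1) -> list[list[int]]:
    """For each block index, list the server indices that should hold a copy."""
    if not 1 <= replicas <= server_count:
        raise ValueError("replicas must be between 1 and server_count")
    # Block i starts at server i mod n and continues on the following servers,
    # so copies of one block always land on distinct servers.
    return [
        [(i + r) % server_count for r in range(replicas)]
        for i in range(block_count)
    ]
```

With `replicas=1` the blocks are spread so that each server holds only part of the file (claim 10); with `replicas>=2` every block is held by at least two servers (claim 11).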
  12. A file downloading method, comprising:
    acquiring a first file identifier of a file to be downloaded;
    acquiring meta-information of the file based on the first file identifier, wherein the meta-information of the file comprises the first file identifier of the file, at least one storage address, and information on the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting;
    downloading the at least one block file, using the at least one storage address, from the storage server corresponding to the at least one storage address; and
    restoring the at least one block file to a complete file based on the information on the arrangement order of the at least one block file, and returning the complete file.
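The download flow of this claim can be sketched with in-memory stand-ins. The record layout (`FileMeta`, `BlockEntry`), the dictionary used as the information storage, and the `fetch` callback are all hypothetical; the claim only requires that the meta-information carry the file identifier, the storage addresses, and the block order.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BlockEntry:
    order: int     # position of the block in the original file
    address: str   # storage address returned at upload time


@dataclass
class FileMeta:
    file_id: str
    blocks: list[BlockEntry]


def download_file(
    file_id: str,
    info_store: dict[str, FileMeta],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Look up the meta-information, fetch every block, and join them in order."""
    meta = info_store[file_id]
    ordered = sorted(meta.blocks, key=lambda entry: entry.order)
    return b"".join(fetch(entry.address) for entry in ordered)
```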
  13. The method according to claim 12, wherein the meta-information of the file is stored in an information storage, and acquiring the meta-information of the file based on the first file identifier comprises:
    searching for and acquiring the meta-information of the file in the information storage based on the first file identifier.
  14. The method according to claim 12, wherein the meta-information of the file is stored in a storage server, and acquiring the meta-information of the file based on the first file identifier comprises:
    searching for and acquiring, in the information storage and based on the first file identifier, a storage address of the meta-information of the file; and
    downloading the meta-information of the file, using the storage address of the meta-information, from the storage server corresponding to the storage address of the meta-information.
  15. The method according to claim 12, wherein the meta-information further comprises at least one second file identifier corresponding to the at least one block file, stored in association therewith; and
    before restoring the at least one block file to a complete file based on the information on the arrangement order of the at least one block file and returning the complete file, the method further comprises:
    calculating at least one third file identifier corresponding to the at least one downloaded block file;
    when the third file identifier of a block file matches the second file identifier of that block file, determining that the block file passes verification; and
    when the third file identifier of a block file does not match the second file identifier of that block file, determining that the block file fails verification, downloading the block file again, and replacing the block file that failed verification with the re-downloaded block file.
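The verify-and-retry behaviour of this claim can be sketched as below. SHA-256 as the identifier and the `max_attempts` bound are assumptions of this example; the claim itself only says that a failed block is downloaded again, without limiting the number of retries.

```python
import hashlib
from typing import Callable


def fetch_verified(
    address: str,
    expected_id: str,
    fetch: Callable[[str], bytes],
    max_attempts: int = 3,
) -> bytes:
    """Download a block, recompute its identifier, and retry on mismatch."""
    for _ in range(max_attempts):
        block = fetch(address)
        # The recomputed digest plays the role of the "third file identifier".
        if hashlib.sha256(block).hexdigest() == expected_id:
            return block
        # Mismatch: discard this copy and download the block again.
    raise IOError(f"block at {address} failed verification {max_attempts} times")
```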
  16. The method according to claim 12, wherein the meta-information further comprises information on at least one encoder corresponding to the at least one block file; and
    before restoring the at least one block file to a complete file based on the information on the arrangement order of the at least one block file and returning the complete file, the method further comprises:
    acquiring, based on the information on the at least one encoder, at least one decoder corresponding to the at least one encoder; and
    using the at least one decoder to restore each corresponding block file to the block file as it was before being disguised.
  17. The method according to claim 12, wherein, when each of the at least one block file stored in the storage server has been disguised as a block file having a target file format, downloading the at least one block file, using the at least one storage address, from the storage server corresponding to the at least one storage address comprises:
    using the position of each pre-disguise block file within the corresponding disguised block file to download the pre-disguise block file from the disguised block file stored in the storage server corresponding to the storage address.
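If the storage server happens to support HTTP range requests, the position-based download in this claim could be expressed as a byte range. That transport is an assumption of this sketch and is not stated in the application; `range_header` and `extract_original` are hypothetical helper names, and the offset and length are assumed to have been recorded when the block was disguised.

```python
def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header covering the original block inside the disguised one."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def extract_original(disguised: bytes, offset: int, length: int) -> bytes:
    """Local equivalent: slice the original block out of an already downloaded file."""
    return disguised[offset:offset + length]
```

Requesting only the range avoids transferring the disguise bytes at all, so no separate decoding step is needed for a prefix or suffix style disguise.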
  18. The method according to claim 12, wherein:
    there are a plurality of storage servers;
    when each of a plurality of block files is stored redundantly in at least two of the plurality of storage servers,
    downloading the at least one block file, using the at least one storage address, from the storage server corresponding to the at least one storage address comprises:
    downloading each block file of the plurality of block files from the storage server corresponding to a first storage address among the plurality of storage addresses corresponding to that block file; and
    when downloading the block file from the storage server corresponding to the first storage address fails, downloading the block file from the storage server corresponding to a second storage address among the plurality of storage addresses corresponding to that block file.
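The failover described in this claim amounts to trying a block's replica addresses in turn. A minimal sketch follows; the function name `fetch_with_fallback`, the `fetch` callback, and the choice of `OSError` to signal a failed download are assumptions of this example.

```python
from typing import Callable


def fetch_with_fallback(addresses: list[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each replica address in order and return the first successful download."""
    last_error = None
    for address in addresses:
        try:
            return fetch(address)
        except OSError as exc:
            # This replica is unavailable; move on to the next storage address.
            last_error = exc
    raise OSError("all replica addresses failed") from last_error
```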
  19. A file management apparatus, communicatively connected to a storage server and comprising a processor and an information storage, wherein:
    the processor is configured to, upon receiving a file upload request:
    acquire a file to be uploaded;
    split the file to be uploaded into at least one block file;
    upload the at least one block file to the storage server;
    receive at least one storage address, returned by the storage server, corresponding to the at least one block file; and
    store meta-information of the file;
    and, upon receiving a file download request:
    acquire a first file identifier of a file to be downloaded;
    acquire the meta-information of the file based on the first file identifier; and
    download the at least one block file, using the at least one storage address, from the storage server corresponding to the at least one storage address, and, based on the information on the arrangement order of the at least one block file, restore the at least one block file to a complete file and return the complete file; and
    the information storage is configured to store the meta-information of the file or a storage address of the meta-information, wherein the meta-information of the file comprises the first file identifier of the file, the at least one storage address, and the order in which the at least one block file corresponding to the at least one storage address was arranged in the file before splitting.
PCT/CN2020/092383 2020-05-26 2020-05-26 File uploading method, file downloading method and file management apparatus WO2021237467A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080007587.2A CN113273163A (en) 2020-05-26 2020-05-26 File uploading method, file downloading method and file management device
PCT/CN2020/092383 WO2021237467A1 (en) 2020-05-26 2020-05-26 File uploading method, file downloading method and file management apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092383 WO2021237467A1 (en) 2020-05-26 2020-05-26 File uploading method, file downloading method and file management apparatus

Publications (1)

Publication Number Publication Date
WO2021237467A1 true WO2021237467A1 (en) 2021-12-02

Family

ID=77227980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092383 WO2021237467A1 (en) 2020-05-26 2020-05-26 File uploading method, file downloading method and file management apparatus

Country Status (2)

Country Link
CN (1) CN113273163A (en)
WO (1) WO2021237467A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114338653A (en) * 2021-12-29 2022-04-12 中国电信股份有限公司 File breakpoint resuming method and device
CN115481158A (en) * 2022-09-22 2022-12-16 北京泰策科技有限公司 Automatic loading and converting method for data distributed cache
CN116527539A (en) * 2023-05-15 2023-08-01 合芯科技(苏州)有限公司 Data consistency verification method and device and computer equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168537A (en) * 2021-11-27 2022-03-11 深圳市连用科技有限公司 Method for uploading file and terminal equipment
CN114978555B (en) * 2022-08-01 2022-10-21 北京惠朗时代科技有限公司 Remote online electronic signature system based on WEB script data stream operation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101227460A (en) * 2007-01-19 2008-07-23 秦晨 Method for uploading and downloading distributed document and apparatus and system thereof
CN103442090A (en) * 2013-09-16 2013-12-11 苏州市职业大学 Cloud computing system for data scatter storage
CN103685162A (en) * 2012-09-05 2014-03-26 中国移动通信集团公司 File storing and sharing method
CN103729470A (en) * 2014-01-20 2014-04-16 刘强 Secure storage method based on different cloud storage ends
CN103873504A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 System enabling data blocks to be stored in distributed server and method thereof
CN105718808A (en) * 2016-01-18 2016-06-29 天津科技大学 File encryption storage system and method based on multiple network disks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103731451B (en) * 2012-10-12 2018-10-19 腾讯科技(深圳)有限公司 A kind of method and system that file uploads
CN103324552B (en) * 2013-06-06 2016-01-13 西安交通大学 Two benches list example duplicate removal data back up method
CN111049884A (en) * 2019-11-18 2020-04-21 武汉方始科技有限公司 Distributed large file storage system and file uploading and downloading method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114338653A (en) * 2021-12-29 2022-04-12 中国电信股份有限公司 File breakpoint resuming method and device
CN115481158A (en) * 2022-09-22 2022-12-16 北京泰策科技有限公司 Automatic loading and converting method for data distributed cache
CN116527539A (en) * 2023-05-15 2023-08-01 合芯科技(苏州)有限公司 Data consistency verification method and device and computer equipment
CN116527539B (en) * 2023-05-15 2023-11-28 合芯科技(苏州)有限公司 Data consistency verification method and device and computer equipment

Also Published As

Publication number Publication date
CN113273163A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
WO2021237467A1 (en) File uploading method, file downloading method and file management apparatus
US20200412525A1 (en) Blockchain filesystem
US8843454B2 (en) Elimination of duplicate objects in storage clusters
US8990257B2 (en) Method for handling large object files in an object storage system
US9183213B2 (en) Indirection objects in a cloud storage system
US9195666B2 (en) Location independent files
US20180060348A1 (en) Method for Replication of Objects in a Cloud Object Store
JP2002501255A (en) Encapsulation, representation and transfer of content addressable information
EP3716581A1 (en) Global file system for data-intensive applications
Yang et al. A security carving approach for AVI video based on frame size and index
CN116010348B (en) Distributed mass object management method and device
CN112866406A (en) Data storage method, system, device, equipment and storage medium
US20060020572A1 (en) Computer, storage system, file management method done by the computer, and program
EP4002143A1 (en) Storage of file system items related to a versioned snapshot of a directory-based file system onto a key-object storage system
CN114416676A (en) Data processing method, device, equipment and storage medium
US20170048303A1 (en) On the fly statistical delta differencing engine
US11442892B2 (en) File and data migration to storage system
US20170337204A1 (en) Differencing engine for moving pictures
CN108763425B (en) Method and apparatus for storing and reading audio files
CN115905120B (en) Archive file management method, archive file management device, archive file management computer device and archive file management storage medium
EP4195068A1 (en) Storing and retrieving media recordings in an object store
CN116974998A (en) Data file updating method, device, computer equipment and storage medium
CN117909138A (en) File recovery method, device, equipment and storage medium
WO2013136584A1 (en) Data transfer system
CN115017163A (en) Pathological digital section data dynamic management system and method based on object storage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937952

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20937952

Country of ref document: EP

Kind code of ref document: A1