CN117215477A - Data object storage method, device, computer equipment and storage medium - Google Patents

Publication number
CN117215477A
Authority
CN
China
Prior art keywords: file, data, directory, target, temporary
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210621695.XA
Other languages
Chinese (zh)
Inventor
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210621695.XA
Publication of CN117215477A

Abstract

The present application relates to a data object storage method, apparatus, computer device, storage medium and computer program product. The method can be applied to cloud storage, big data, and map data uploading scenarios, and comprises the following steps: creating a metadata-level temporary file under a temporary directory; uploading data blocks of target data to a storage container in parallel to obtain index information of the target data; associating the index information of the target data with the temporary file to obtain a temporary data file; creating a first file directory under an object directory according to a file path corresponding to the target data; and moving the temporary data file into the first file directory to obtain a target data file under the first file directory. With this method, stored data can be read immediately, and strong consistency of the data is ensured.

Description

Data object storage method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of object storage technology, and in particular, to a data object storage method, apparatus, computer device, storage medium, and computer program product.
Background
In big data scenarios, if data is stored as files, the large data volume is likely to make uploads and downloads slow, which reduces storage efficiency. Object storage schemes have therefore emerged, in which data and metadata are separated and stored as data objects, for example in buckets. However, after a data object is stored in a bucket, a read issued immediately afterwards does not necessarily return the latest data, so the data read may be incomplete or inaccurate, and read-after-write application scenarios cannot be supported.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data object storage method, apparatus, computer device, computer readable storage medium, and computer program product that can ensure that stored data can be read immediately, ensuring strong consistency of the data.
In a first aspect, the present application provides a data object storage method. The method comprises the following steps:
creating a metadata-level temporary file under a temporary directory;
uploading data blocks of target data to a storage container in parallel to obtain index information of the target data;
associating the index information of the target data with the temporary file to obtain a temporary data file;
creating a first file directory under an object directory according to a file path corresponding to the target data;
and moving the temporary data file into the first file directory to obtain a target data file under the first file directory.
In a second aspect, the present application further provides a data object storage device. The device comprises:
the first creating module is used for creating a temporary file of a metadata level under the temporary directory;
the uploading module is used for uploading the data blocks of the target data to the storage container in parallel to obtain index information of the target data;
the association module is used for associating the index information of the target data with the temporary file to obtain a temporary data file;
the second creating module is used for creating a first file directory under the object directory according to the file path corresponding to the target data;
and the moving module is used for moving the temporary data file to the first file directory to obtain a target data file in the first file directory.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
creating a metadata-level temporary file under a temporary directory;
uploading data blocks of target data to a storage container in parallel to obtain index information of the target data;
associating the index information of the target data with the temporary file to obtain a temporary data file;
creating a first file directory under an object directory according to a file path corresponding to the target data;
and moving the temporary data file into the first file directory to obtain a target data file under the first file directory.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
creating a metadata-level temporary file under a temporary directory;
uploading data blocks of target data to a storage container in parallel to obtain index information of the target data;
associating the index information of the target data with the temporary file to obtain a temporary data file;
creating a first file directory under an object directory according to a file path corresponding to the target data;
and moving the temporary data file into the first file directory to obtain a target data file under the first file directory.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
creating a metadata-level temporary file under a temporary directory;
uploading data blocks of target data to a storage container in parallel to obtain index information of the target data;
associating the index information of the target data with the temporary file to obtain a temporary data file;
creating a first file directory under an object directory according to a file path corresponding to the target data;
and moving the temporary data file into the first file directory to obtain a target data file under the first file directory.
With the above data object storage method, apparatus, computer device, storage medium and computer program product, a metadata-level temporary file is created under a temporary directory; data blocks of the target data are uploaded to a storage container in parallel to obtain index information of the target data; and the index information is associated with the temporary file to obtain a temporary data file, so that if an abnormality occurs during uploading, the data stays under the temporary directory. In addition, a first file directory is created under the object directory according to the file path corresponding to the target data, and the temporary data file is moved into the first file directory to obtain the target data file under the first file directory. Because data affected by an upload abnormality never leaves the temporary directory, the file data under the object directory is guaranteed to be complete, the requirement of strong data consistency is met, and the integrity and accuracy of the data are ensured when it is read.
Drawings
FIG. 1 is an application environment diagram of a data object storage method in one embodiment;
FIG. 2 is a flow diagram of a method of data object storage in one embodiment;
FIG. 3 is a schematic diagram of a directory tree structure in one embodiment;
FIG. 4 is a diagram of a directory tree structure in another embodiment;
FIG. 5 is a schematic diagram of moving a temporary data file from a temporary directory to an object directory in one embodiment;
FIG. 6 is a schematic diagram of moving a temporary data file from a temporary directory to an object directory in another embodiment;
FIG. 7 is a flow chart illustrating a method for reading a target data file according to index information according to an embodiment;
FIG. 8 is a flow chart of a method of storing data objects in another embodiment;
FIG. 9 is a flow diagram of block upload in one embodiment;
FIG. 10 is a flow chart of block upload in another embodiment;
FIG. 11 is a flow diagram of a state migration of a target file in one embodiment;
FIG. 12 is a diagram of logical relationships between MPU files, part files, and blocks and data uploads in one embodiment;
FIG. 13 is a block diagram of a data object store in one embodiment;
FIG. 14 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Before explaining the scheme of the application, the technology and technical terms related to the application are explained, and the technical terms are specifically as follows:
Cloud storage is a concept extended and developed from cloud computing. A distributed cloud storage system (hereinafter, the storage system) uses functions such as cluster applications, grid technology, and distributed storage file systems to bring together, through application software or application interfaces, a large number of storage devices of various types in a network (storage devices are also called storage nodes) so that they work cooperatively and jointly provide data storage and service access functions to the outside.
At present, the storage system stores data as follows: when logical volumes are created, each logical volume is allocated physical storage space, which may consist of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into multiple parts, each of which is an object; an object contains not only data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location of each object, so that when the client requests access to the data, the file system can serve the request according to the storage location information of each object.
The storage system allocates physical storage space to a logical volume as follows: the physical storage space is divided into stripes in advance according to an estimate of the capacity of the objects to be stored on the logical volume (the estimate usually leaves a large margin over the capacity of the objects actually stored) and according to the redundant array of independent disks (RAID) group. A logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
Cloud HDFS (Hadoop Distributed File System) may be a cloud distributed file system evolved from HDFS (hereinafter, the distributed file system). It can provide a high-performance, strongly consistent, directory-structured file system metadata service while using low-cost, scalable object storage to hold the data.
Target data may be any type of data obtained by logically abstracting objective things, such as text, images, video, and audio.
Metadata, which may refer to data describing data, is used to indicate storage locations, resource attributes, file records, and the like.
Data consistency means that, in a distributed file system, the value of a piece of data is the same on multiple nodes.
Strong consistency means that once data has been modified or written, the modified or written data can be obtained immediately.
Weak consistency means that once data has been modified or written, there is no guarantee that the modified or written data can be obtained immediately.
Eventual consistency is a specific form of weak consistency: after data has been modified or written, the modified or written data can eventually be obtained.
In one embodiment, the data object storage method provided in the embodiments of the present application may be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or another network server. The terminal 102 sends an upload request to the server 104, and the server 104 creates a metadata-level temporary file under the temporary directory; uploads the data blocks of the target data to a storage container in parallel to obtain index information of the target data; associates the index information of the target data with the temporary file to obtain a temporary data file; creates a first file directory under the object directory according to the file path corresponding to the target data; and moves the temporary data file into the first file directory to obtain the target data file under the first file directory.
The terminal 102 may run a client and a distributed file system web page for reading and writing data objects. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an Internet of Things device, or a portable wearable device. The Internet of Things device may be a smart speaker, a smart television, a smart air conditioner, a smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, a smart bracelet, a headset, or the like.
The server 104 may be an independent physical server, a server cluster composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
The terminal 102 and the server 104 may be connected via Bluetooth, USB (Universal Serial Bus), a network, or another communication connection, which is not limited herein.
In one embodiment, as shown in fig. 2, a data object storage method is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
S202, creating a temporary file of the metadata level under the temporary directory.
The temporary directory may be a directory under the root directory of the file system that temporarily holds metadata-level files. In a distributed file system, directories at every level and their files may be organized and managed as index nodes (inodes), which are connected to form a directory tree, such as a temporary directory tree. It should be noted that, besides the metadata-level temporary file, an inode may store other metadata, such as the file's size in bytes, its owner, its read, write, and execute permissions, and file timestamps, including a modification timestamp (mtime) and an access timestamp (atime), where mtime is the time the file content was last changed and atime is the time the file was last opened.
The temporary directory and the temporary files under it are part of the content stored by the inodes in the temporary directory tree. As shown in fig. 3, the two nodes /Tmp and /Tmp/tmp_file may constitute a temporary directory tree, where /Tmp represents the temporary directory and /Tmp/tmp_file represents the file tmp_file under the /Tmp directory.
The metadata level may be a level of metadata, and metadata is attribute information for describing data.
The temporary file may refer to temporarily saved file information of a metadata level, and may be, for example, an identification for representing a file or data, such as at least one of a file type, an icon, and a file name. In the distributed file system, the temporary files may be organized and managed in the manner of index nodes, such as temporary file nodes belonging to child nodes under a temporary directory, as shown in fig. 3.
In one embodiment, prior to S202, the server receives a storage request carrying the target data from the terminal and then invokes a file creation interface, through which a metadata-level temporary file is created under the temporary directory. For example, the server calls the CreateFile(pid, name) interface and, using the parent node identifier (pid) and node name (name) passed to it, creates a temporary file (tmp_file) under the /Tmp directory, as shown in fig. 3.
In one embodiment, prior to S202, the server creates a temporary directory and an object directory under the file system root directory; the temporary directory is configured to be invisible to a user object uploading the target data, namely the temporary directory and a temporary file under the temporary directory are not visually displayed on an operation page of the distributed file system, so that the temporary directory is invisible to the user object; the object catalog is configured to be visible to the user object uploading the target data, i.e. the object catalog and various levels of subdirectories and corresponding files under the object catalog are visually displayed on the operation page of the distributed file system, and thus are visible to the user object.
Specifically, the server invokes a directory creation interface to create the temporary directory and the object directory under the root directory of the file system. For example, the server invokes the MkDir(pid, name) interface to create the /Tmp directory and the /User directory under the file system root directory, as shown in fig. 3.
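The directory and file creation described above can be illustrated with a minimal in-memory sketch. It is an assumption-laden toy, not the metadata service of the application: the class MetaStore, the Inode fields, and the choice of 1 as the root identifier are hypothetical, and only the (pid, name) parameter shape of MkDir and CreateFile is taken from the text.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Inode:
    ino: int            # node identifier
    pid: int            # parent node identifier (0 for the root)
    name: str
    is_dir: bool
    children: dict = field(default_factory=dict)  # child name -> child ino


class MetaStore:
    """Toy in-memory directory tree (hypothetical, for illustration only)."""

    ROOT = 1  # assumed identifier of the file system root

    def __init__(self):
        self._ids = count(2)
        self.inodes = {self.ROOT: Inode(self.ROOT, 0, "/", True)}

    def _add(self, pid, name, is_dir):
        parent = self.inodes[pid]
        if not parent.is_dir:
            raise NotADirectoryError(parent.name)
        if name in parent.children:
            raise FileExistsError(name)
        ino = next(self._ids)
        self.inodes[ino] = Inode(ino, pid, name, is_dir)
        parent.children[name] = ino
        return ino

    def mkdir(self, pid, name):
        """Counterpart of MkDir(pid, name)."""
        return self._add(pid, name, True)

    def create_file(self, pid, name):
        """Counterpart of CreateFile(pid, name): metadata only, no data."""
        return self._add(pid, name, False)


store = MetaStore()
tmp = store.mkdir(MetaStore.ROOT, "Tmp")      # temporary directory, hidden from users
user = store.mkdir(MetaStore.ROOT, "User")    # object directory, visible to users
tmp_file = store.create_file(tmp, "tmp_file")
```

After these calls the tree contains /Tmp, /User, and /Tmp/tmp_file, which corresponds to the layout described for fig. 3.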
S204, uploading the data blocks of the target data to the storage container in parallel to obtain index information of the target data.
The target data may refer to various types of data that need to be uploaded, including text, images, video, audio, and the like. The data block may be a data set of small blocks in the target data. A storage container may refer to a container, such as a bucket, in an object store for storing data objects.
In one embodiment, the server splits the target data into at least two data blocks and uploads the resulting data blocks to the storage container in parallel to obtain index information of the target data. For example, the server splits an image sent by the terminal into multiple image blocks and then writes the image blocks to the bucket concurrently.
Specifically, the server splits the target data into data blocks, uploads the data blocks to the storage container in parallel, determines the block offset value corresponding to each data block during uploading, and determines index information of the target data based on the block offset value and the block size.
The block offset value may be the read offset (Offset) of a data block, for example, the offset of the first data block within the storage container when the data blocks are read.
For example, assume there are five data blocks, numbered 1 to 5, each 1 megabyte (MB) in size, and that data blocks 1 to 5 are written into the storage bucket in sequence. The index information of the target data is calculated from the block identifier, the block size, and the read offset value of the data block, for example: index information of the target data = block identifier of the data block × block size + read offset value of the data block (that is, the offset of the first data block in the storage bucket).
In one embodiment, the server stores the block offset value and the block size in a data block list, and may also store the index information in that list, so that when data needs to be read, the block offset value and block size, or the index information, can be fetched to read the corresponding target data file.
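The splitting, parallel upload, and index calculation can be sketched as follows. This is only an illustration under stated assumptions: the bucket is modeled as a plain dictionary, the function names are hypothetical, and block identifiers are taken to be zero-based so that the first block lands exactly at the base offset (the text numbers its blocks from 1 and does not say which convention applies).

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the target data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_parallel(data: bytes, bucket: dict, base_offset: int = 0,
                    block_size: int = BLOCK_SIZE):
    """Upload all blocks concurrently and return the data block list."""
    blocks = split_blocks(data, block_size)

    def put(block_id: int):
        # index = block identifier x block size + offset of the first block
        index = block_id * block_size + base_offset
        bucket[index] = blocks[block_id]          # stand-in for an object PUT
        return {"block_id": block_id, "block_size": block_size,
                "offset": base_offset, "index": index}

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(put, range(len(blocks))))


bucket = {}
block_list = upload_parallel(b"abcdefghij", bucket, base_offset=100)
```

The returned list plays the role of the data block list: it keeps the offset and block size next to the derived index so that a later read can use either.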
S206, associating the index information of the target data with the temporary file to obtain the temporary data file.
Wherein the temporary data file may be a data file containing target data and metadata, and the temporary data file may be in the form of a data object in a storage container.
In one embodiment, in the temporary directory tree, a temporary file node of the temporary directory tree is associated with index information of target data, so that association between the index information of the target data and the temporary file is realized, and a temporary data file is obtained. Wherein, since the temporary file is file information of metadata level, after the index information of the target data is associated with the temporary file, the metadata is associated with the target data, and a temporary data file containing the target data and the metadata is obtained.
In one embodiment, S206 may specifically include: when all data blocks of the target data have been uploaded to the storage container concurrently, the server associates the index information of the target data with the temporary file. When at least one data block of the target data encounters an abnormality during uploading, the server stops associating the index information of the target data with the temporary file and cleans up the temporary file created under the temporary directory.
Because the move operation (such as the Rename operation) in the present application is atomic, a file whose upload fails stays under the temporary directory; only files that were uploaded successfully reach the object directory, and they arrive complete. Files under the temporary directory whose upload failed are cleaned up periodically by a cleaning module.
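A minimal sketch of this success-or-stay-in-temp behavior is given below. The temporary directory is modeled as a dictionary, and the names upload_and_associate, sweep, and the "state" field are hypothetical; a real cleaning module would run on a timer rather than being called by hand.

```python
def upload_and_associate(tmp_dir: dict, name: str, blocks, upload_block) -> bool:
    """Create a temporary entry, upload blocks, and attach the index on success."""
    entry = {"index": None, "state": "uploading"}
    tmp_dir[name] = entry                      # metadata-level temporary file
    try:
        index = [upload_block(i, blk) for i, blk in enumerate(blocks)]
    except Exception:
        entry["state"] = "failed"              # stays under the temporary directory
        return False
    entry["index"] = index                     # association happens only on success
    entry["state"] = "ready"
    return True


def sweep(tmp_dir: dict):
    """One pass of a (hypothetical) cleaning module: drop failed uploads."""
    failed = [name for name, entry in tmp_dir.items() if entry["state"] == "failed"]
    for name in failed:
        del tmp_dir[name]
    return failed


def flaky_upload(block_id, block):
    """Simulated uploader that fails on the second block."""
    if block_id == 1:
        raise IOError("simulated network error")
    return block_id


tmp_dir = {}
good = upload_and_associate(tmp_dir, "good", [b"a", b"b"], lambda i, blk: i)
bad = upload_and_associate(tmp_dir, "bad", [b"a", b"b"], flaky_upload)
removed = sweep(tmp_dir)
```

Only the entry whose upload completed carries index information; the failed one never becomes eligible for the move and is removed by the sweep.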
S208, creating a first file directory under the object directory according to the file path corresponding to the target data.
The file path may refer to the path at which a user object stores data in the distributed file system. For example, if the user object wants to store a file under the directory /dir0/dir1, the file path /dir0/dir1/file is obtained.
In one embodiment, after receiving a storage request sent by a terminal through a client or a web page of a distributed file system, a server analyzes a file path corresponding to target data from the storage request, and creates a first file directory under an object directory according to the file path. The first file directory may refer to a directory of files (abbreviated as file directory), and includes file directories of each level created under the object directory.
It should be noted that in the distributed file system, the directories of each level and the corresponding files are organized and managed in the manner of index nodes, and these index nodes (including the object directory node and the file directory node of each level) are connected together to form a directory tree, so that the object directory and the index nodes corresponding to the file directory of each level under the object directory can be combined into an object directory tree, and each directory and file under the object directory tree are visible to the user object. When the object directory tree is combined with the temporary directory tree, a large directory tree may be formed.
Specifically, the server generates a parent node identifier and a node name according to the file path, invokes the directory creation interface with the parent node identifier and node name as parameters, and uses the interface to create file directory nodes recursively under the object directory node: it creates a first-level file directory node under the object directory node, then a second-level file directory node under the first-level node, and so on, until all file directory nodes have been created.
For example, when the file path /dir0/dir1/file is obtained, the server generates a pid and a node name from it. Because dir0 must be mounted on the object directory node, the parent node of dir0 is the object directory node; with the identifier of the object directory node denoted 1, dir0 has pid=1 and node name dir0. Because dir1 must be mounted on node dir0, dir1 has pid=2 (that is, the identifier of node dir0 is taken to be 2) and node name dir1. Once the pid and node name are obtained, the MkDir(pid, name) interface is invoked to create the file directory nodes recursively, as shown in fig. 4.
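The level-by-level directory creation can be sketched as below. The tree is modeled as a dictionary keyed by (pid, name), and the helper names and identifier allocator are hypothetical. The sketch walks the path with a loop rather than recursion, which creates the same nodes in the same order.

```python
from itertools import count


def mkdir(tree: dict, next_id, pid: int, name: str) -> int:
    """Counterpart of MkDir(pid, name); returns the existing node if present."""
    key = (pid, name)
    if key not in tree:
        tree[key] = next(next_id)
    return tree[key]


def make_dirs(tree: dict, next_id, object_dir_id: int, file_path: str):
    """Create every directory level of file_path under the object directory.

    Returns the identifier of the innermost directory and the file name.
    """
    *dir_names, file_name = [part for part in file_path.split("/") if part]
    pid = object_dir_id
    for name in dir_names:
        pid = mkdir(tree, next_id, pid, name)   # each new node becomes the next parent
    return pid, file_name


tree = {}
ids = count(2)            # assumed: identifier 1 is the object directory node
parent_id, file_name = make_dirs(tree, ids, 1, "/dir0/dir1/file")
```

With the object directory node numbered 1, dir0 is created with pid=1 and dir1 with pid=2, matching the example in the text.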
S210, moving the temporary data file into the first file directory to obtain the target data file under the first file directory.
The target data file may refer to a data file that is located under the first file directory and visible to the user object; it includes the target data and the corresponding metadata. The operation of moving the temporary data file is atomic, so whichever step an abnormality occurs in, the file stays under the temporary directory; only after the upload succeeds can the file be moved into the first file directory of the object directory, which ensures that any file moved there is complete. In addition, the data read operation is a strongly consistent read, that is, the target data file can be obtained immediately after it is written (uploaded), so strong consistency of the data is ensured.
In one embodiment, the server detects whether a target data file exists under the first file directory, and renames the temporary data file to obtain the target data file when the target data file does not exist under the first file directory; and moving the target data file to the first file directory to obtain the target data file under the first file directory.
Specifically, the server calls a renaming interface and passes it parameters comprising the parent node identifier and node name of the temporary data file and the target parent node identifier and target node name. The temporary data file is renamed based on these parameters to obtain the target data file, and the target data file is moved into the first file directory.
For example, the server calls the Rename(source_pid, source_name, destination_pid, destination_name) interface with the parent node identifier (source_pid) and node name (source_name) of the temporary data file and the target parent node identifier (destination_pid) and target node name (destination_name). Based on source_pid, source_name, destination_pid, and destination_name, the temporary data file is renamed from tmp_file to the target data file (file) and moved under the directory /User/dir0/dir1, as shown in fig. 5.
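One way to picture an atomic Rename is a single metadata update performed under a lock, as in the sketch below. The Namespace class and its dictionary layout are hypothetical, and a lock inside one process is only a stand-in for whatever transaction mechanism a real metadata service would use.

```python
import threading


class Namespace:
    """Toy namespace: maps (parent id, name) to a file record (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = {}

    def rename(self, source_pid, source_name, destination_pid, destination_name):
        """Move and rename in one step; readers see the old or the new state only."""
        with self._lock:
            src = (source_pid, source_name)
            dst = (destination_pid, destination_name)
            if src not in self.entries:
                raise FileNotFoundError(source_name)
            if dst in self.entries:
                raise FileExistsError(destination_name)
            self.entries[dst] = self.entries.pop(src)


ns = Namespace()
ns.entries[(10, "tmp_file")] = {"index": [100, 104, 108]}   # under the temporary directory
ns.rename(10, "tmp_file", 3, "file")                        # now under /User/dir0/dir1
```

The record itself, including its index information, is untouched by the move; only the key that locates it in the tree changes, which is why no data needs to be copied.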
In one embodiment, the server detects whether the first file directory contains a data file whose version is lower than the target version, where the target version is the version of the temporary data file. If such a data file exists, the server updates it based on the temporary data file to obtain the target data file under the first file directory. If the first file directory contains a corresponding data file whose version is higher than the target version, the server refuses to move the temporary data file into the first file directory.
For example, as shown in fig. 6, suppose tmp_file with version number (version) 99 is to be moved. The server checks whether a file with a version number lower than 99 exists under the directory /User/dir0/dir1. Because the version number of file under /User/dir0/dir1 is 100, no file with a version number lower than 99 exists, so the version-99 tmp_file is not renamed or moved into /User/dir0/dir1. If instead tmp_file with version number 101 is to be moved, the server checks whether a file with a version number lower than 101 exists under /User/dir0/dir1. Because the version number of file there is 100, such a file exists, so the version-101 tmp_file is renamed and moved into /User/dir0/dir1.
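The version check can be expressed in a few lines. The function name try_move and the record layout are hypothetical, and the handling of equal versions is an assumption, since the text only covers the lower and higher cases; here an equal version is rejected.

```python
def try_move(object_dir: dict, name: str, incoming: dict) -> bool:
    """Replace the file under the object directory only if the incoming version is newer."""
    current = object_dir.get(name)
    if current is not None and current["version"] >= incoming["version"]:
        return False          # existing file is as new or newer: refuse the move
    object_dir[name] = incoming
    return True


object_dir = {"file": {"version": 100}}
moved_99 = try_move(object_dir, "file", {"version": 99})     # stale upload
moved_101 = try_move(object_dir, "file", {"version": 101})   # newer upload
```

Run against the fig. 6 example, the version-99 file is refused and the version-101 file replaces version 100.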
In the above embodiment, a metadata-level temporary file is created under the temporary directory; the data blocks of the target data are uploaded to a storage container in parallel to obtain index information of the target data; and the index information is associated with the temporary file to obtain a temporary data file, so that if an abnormality occurs during uploading, the data stays under the temporary directory. In addition, a first file directory is created under the object directory according to the file path corresponding to the target data, and the temporary data file is moved into the first file directory to obtain the target data file under the first file directory. Because data affected by an upload abnormality stays under the temporary directory, the file data under the object directory is guaranteed to be complete; and because each individual interface operation of the cloud HDFS meets the requirement of strong data consistency, the integrity and accuracy of the data are ensured when it is read.
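The same write-to-temporary-then-move pattern is widely used on local file systems, and a local analogy may help make the flow concrete. The sketch below is not the cloud HDFS of the application: it uses Python's standard library on a local disk, where os.replace is atomic on POSIX systems as long as source and destination are on the same file system. The directory names Tmp and User mirror the text; the function name is hypothetical.

```python
import os
import tempfile


def store_atomically(root: str, rel_path: str, data: bytes) -> str:
    """Write data under root/Tmp, then move it to root/User/<rel_path> in one step."""
    tmp_dir = os.path.join(root, "Tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)        # temporary file
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)                          # "upload" the content
            handle.flush()
            os.fsync(handle.fileno())
        target = os.path.join(root, "User", rel_path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)   # first file directory
        os.replace(tmp_path, target)                    # atomic move into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)                         # failed writes never leave Tmp
        raise
    return target


root = tempfile.mkdtemp()
path = store_atomically(root, "/dir0/dir1/file", b"hello")
```

A reader that opens the path under User sees either no file or the complete file, never a partially written one, which is the property the embodiment relies on.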
In one embodiment, after S210, the method further comprises:
S702, receiving a read request for a target data file.
In one embodiment, the server receives a read request for a target data file sent by a client or receives the read request sent by a distributed file system web page.
S704, in response to the read request of the target data file, reading the block size and the block offset value of the data block in the target data file.
The block offset value may be the read offset value (Offset) of a data block, for example, the offset value in the storage container relative to the first data block when the data block is read.
In one embodiment, the server, upon receiving a read request sent by a client or a distributed file system web page, reads the block size and block offset value of a data block in the target data file from the data block list.
S706, determining index information of the target data file according to the block size and the block offset value of the data block.
In one embodiment, when the number of data blocks is greater than or equal to two, the server may determine the block identifier, block size and block offset value of each data block in the target data file, and then determine the index information of the target data file based on the block identifier, the block size and the block offset value of the data blocks. For example, assume that there are 5 data blocks 1 to 5, each with a size of 1 megabit, and that data blocks 1 to 5 are written into the storage bucket in sequence. The index information of the target data file is then calculated from the block identifier, the block size and the block offset value of the data block, for example, index information of the target data file = block identifier of the data block × block size + block offset value.
S708, reading the target data file in the storage container according to the index information of the target data file.
In one embodiment, the server reads the target data file from a storage container of the object storage device in accordance with index information of the target data file.
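The index calculation used in S706 can be sketched as follows. The function names `index_of` and `block_of`, the zero-based block identifiers and the block size in bytes are assumptions made for illustration; only the formula itself comes from the text.

```python
def index_of(block_id: int, block_size: int, block_offset: int) -> int:
    """Index information = block identifier x block size + block offset value."""
    return block_id * block_size + block_offset


def block_of(index: int, block_size: int) -> tuple[int, int]:
    """Inverse mapping: recover (block identifier, block offset value) from an index."""
    return divmod(index, block_size)


BLOCK_SIZE = 1024 * 1024  # assumed block size in bytes, for illustration only

# Reading at offset 100 inside the block with (zero-based) identifier 2.
index = index_of(block_id=2, block_size=BLOCK_SIZE, block_offset=100)
```

Because blocks are fixed in size and written in sequence, the mapping is invertible, so a read position can be translated back into a block and an offset inside that block.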
For a clearer understanding of the solution of the foregoing embodiment, the explanation is herein made with reference to fig. 8, specifically as follows:
The servers may include a metadata server for managing metadata and a data server for reading and writing data. As shown in fig. 8, when a user object wants to read a video file for playback, the client sends a read request for the video file to the data server on which the distributed file system is deployed. After receiving the read request, the data server generates an information acquisition request and sends it to the metadata server. In response to the information acquisition request, the metadata server acquires the block size and the block offset value of each data block of the video file and returns them to the data server. The data server calculates the index information of the video file based on the block identifier, the block size and the block offset value of the data blocks, and then either sends the index information together with the read request to the object storage device, or encapsulates the index information in the read request and sends the read request to the object storage device. The object storage device reads the corresponding video file from the storage bucket according to the index information and returns the video file to the data server. After receiving the video file, the data server returns it to the client, and the client plays the video.
In one embodiment, as shown in fig. 9, a block uploading manner may be adopted for large data, and the specific steps include:
S902, creating a temporary block file of a metadata level under the temporary directory.
The temporary block file may refer to a temporary multi-segment uploaded (Multi Part Upload, MPU) file, among others. In a distributed file system, temporary directories and corresponding temporary block files may be organized and managed in the form of index nodes that are connected together to form a temporary directory tree.
In one embodiment, prior to S902, the server receives a storage request carrying media data sent by the terminal, and then invokes a file creation interface through which a metadata-level temporary block file is created under the temporary directory. For example, the server calls the CreateFile (pid, name) interface and creates a temporary block file (tmp_file) under the /Tmp directory using the parent node identifier and node name passed to the CreateFile (pid, name) interface, see FIG. 10.
S904, generating an uploading identifier, and associating the uploading identifier and a file path of the media data with the temporary block file to obtain an associated file.
The upload identifier (upload_id) may be an identifier that is used to uniquely represent the temporary block file during the upload process. The media data includes multimedia data having a large data volume, such as a video file of 1G or more. The file path of the media data may refer to a path of the user object when storing data in the distributed file system, for example, the user object is to store file under the directory of/test, and the file path/test/file may be obtained.
For example, in addition to creating a temporary MPU file using the CreateFile (pid, name) interface, the server generates a globally unique upload_id, and then associates the upload_id and the file path of the media data with the temporary MPU file.
S906, when the media data is uploaded to other storage containers in the form of clip files, associating each clip file with the associated file to obtain a temporary block data file.
The clip file (part file) may be a secondary file obtained by dividing the media data into at least two pieces.
In one embodiment, the server invokes at least two interfaces for uploading the part file, uploads the part file based on the invoked interfaces, and associates each fragment file with an associated file when media data is uploaded to other storage containers in a fragment file manner, so as to obtain a temporary block data file.
In addition, when media data is concurrently uploaded to other storage containers in the form of clip files, the server creates a clip directory (Part directory) under the file system root directory for managing temporary block data files.
In one embodiment, S906 may specifically include: when all the fragment files of the media data have been uploaded to the storage container concurrently, the server associates each fragment file with the associated file. When at least one fragment file of the media data encounters an abnormality during uploading, the server stops associating the fragment files with the associated file and cleans up the temporary block file created under the temporary directory.
Because the move operation in the application is atomic, even if uploading fails, the file stays under the temporary directory; only a successfully uploaded file can reach the object directory, and it is a complete file. Files that failed to upload are periodically cleaned from the temporary directory by the cleaning module.
In one embodiment, when media data is uploaded to other storage containers concurrently in the form of clip files, the server determines a first file offset value for each clip file; a fragment directory for storing each fragment file is newly added under a root directory of a file system; a clip file table for storing the first file offset value is newly added.
S908, creating a second file directory under the object directory according to the file path corresponding to the temporary block data file.
In one embodiment, after receiving a storage request sent by a terminal through a client or a web page of a distributed file system, the server analyzes a file path corresponding to media data from the storage request, and creates a second file directory under an object directory according to the file path. Wherein the second file directory may include file directories of various levels created under the object directory.
It should be noted that in the distributed file system, the directories of each level and the corresponding files are organized and managed in the manner of index nodes, and these index nodes (including the object directory node and the file directory node of each level) are connected together to form a directory tree, so that the object directory and the index nodes corresponding to the file directory of each level under the object directory can be combined into an object directory tree, and each directory and file under the object directory tree are visible to the user object. When the object directory tree, temporary directory tree, and fragment directory tree are combined, a large directory tree may be formed.
S910, the temporary block data file is moved to the second file directory, and the target block data file in the second file directory is obtained.
The target block data file may refer to the target media data stored in the distributed file system in blocks.
In one embodiment, S910 may specifically include: the server moves the temporary block data file corresponding to the target fragment file to the second file directory to obtain the target block data file under the second file directory, where the target fragment file is at least one of the fragment files. In addition, the server determines the file offset value corresponding to the target block data file based on the first file offset value of the target fragment file.
In one embodiment, after S910, the target block data file may be read, with the following specific steps: the server receives a read request for the target block data file; in response to the read request, reads the first file offset value from the fragment file table and determines the second file offset value corresponding to the fragment file; determines the index information of the target block data file according to the first file offset value and the second file offset value; and reads the target block data file from the other storage containers according to the index information of the target block data file.
In an embodiment, the step of determining the second file offset value corresponding to the fragment file may specifically include: the server obtains the block size and the block offset value of each data block in the fragment file, and determines the second file offset value corresponding to the fragment file based on the block size and the block offset value of each data block in the fragment file.
For a clear and intuitive understanding of the above scheme, the following is specifically described with reference to fig. 10:
(1) A multi-segment upload initialization (InitMPU) phase.
The InitMPU stage corresponds to S902 to S904 described above. The server calls the CreateFile (pid, name) interface to create a temporary MPU file, such as tmp_file_0 in FIG. 10, which is attached under the temporary directory /Tmp. The index node identifiers corresponding to the file system root directory, /User, /Part, /Tmp and /Tmp/tmp_file_0 are 1 to 5 respectively, which yields the index node (inode) table at this point. A unique upload_id=xxxxx is also generated and then associated with the temporary MPU file along with the file path (path=/test/file). For example, a separate MPU table is used to associate the upload_id and the file path with the temporary MPU file; in addition, a Part table is used to associate the Part files of the MPU.
(2) A segment file upload (UploadPart) stage.
In the UploadPart stage, the server creates a subdirectory storing the Part file under the Part directory, as shown in fig. 10, creates a subdirectory named xxxxx under the/Part directory, and stores the Part file under the xxxxx subdirectory, as shown in fig. 10, storing file_1, file_2 and file_3, and storing the corresponding Part table and inode table, as shown in fig. 10.
(3) The block upload completion (CompleteMPU) stage.
The CompleteMPU stage corresponds to S910 described above. The user object can select some of the Part files to move; for example, file_2 and file_3 are selected to be moved to the /User/test directory, while file_1 is not moved. In this way, a subset of the Part files can be selectively uploaded to the distributed file system for storage, yielding a file with a size of 7M in the /User/test directory.
In the above embodiment, for uploading big data, a block uploading mode may be adopted for uploading, so that the data uploading efficiency may be effectively improved, and the uploading time is reduced. In addition, the user object can selectively upload the interested fragments, so that the uploading flexibility is improved.
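The three stages described with reference to fig. 10 can be sketched as follows. The `MpuStore` class, its method names and the Part sizes of 5, 4 and 3 MB are assumptions chosen so that keeping Parts 2 and 3 gives the 7M file mentioned in the text; ordering the kept Parts by part number is also an assumption.

```python
import uuid

MB = 1024 * 1024


class MpuStore:
    """Toy model of the MPU table and the Part table."""

    def __init__(self):
        self.mpu_table = {}   # upload_id -> {"path": str, "length": int}
        self.part_table = {}  # upload_id -> {part_number: {"size", "file_offset"}}
        self.user_dir = {}    # visible path -> file length

    def init_mpu(self, path):
        upload_id = uuid.uuid4().hex
        self.mpu_table[upload_id] = {"path": path, "length": 0}  # length defaults to 0
        self.part_table[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, size):
        # The file offset of every Part stays 0 until CompleteMPU.
        self.part_table[upload_id][part_number] = {"size": size, "file_offset": 0}

    def complete_mpu(self, upload_id, final_part_numbers):
        parts = self.part_table[upload_id]
        kept, offset = {}, 0
        for number in sorted(final_part_numbers):
            kept[number] = {"size": parts[number]["size"], "file_offset": offset}
            offset += parts[number]["size"]
        self.part_table[upload_id] = kept            # only the valid Parts remain
        entry = self.mpu_table[upload_id]
        entry["length"] = offset                     # final file length
        self.user_dir[entry["path"]] = offset        # the file becomes visible
        return offset


store = MpuStore()
uid = store.init_mpu("/User/test/file")
store.upload_part(uid, 1, 5 * MB)
store.upload_part(uid, 2, 4 * MB)
store.upload_part(uid, 3, 3 * MB)
length = store.complete_mpu(uid, [2, 3])  # Part 1 is left out
```

The file length and the per-Part offsets are only fixed at completion, which is what allows the user object to drop Parts after they have been uploaded.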
In one embodiment, the method further comprises: the server calls an object supplementing interface or an object cutting interface of the distributed file system; uploading augmentation data for the target data file on the storage container or performing truncation processing on the target data file in the storage container based on the object augmentation interface or the object truncation interface; alternatively, the augmentation data for the target block data file is uploaded in the other storage container or the target block data file is truncated in the other storage container based on the object augmentation interface or the object truncation interface.
The supplemental data may be data added to the target data file or the target block data file. For example, an editable data file such as a Word file may be supplemented with a passage of text, and that passage is the supplemental data.
In addition, the truncation process may refer to cropping the target data file or the target block data file into at least two segments, such as truncating the uploaded video or audio (e.g., music), to obtain at least two segments of truncated files.
It should be noted that both the supplementing operation and the truncation operation meet the requirement of strong data consistency, i.e., after the target data file or the target block data file has been supplemented or truncated, a read immediately returns the file content resulting from that operation.
For example, the server calls the AppendObject interface to append data to the editable file, or calls the TruncateObject interface to truncate the target data file.
In the above embodiment, the object supplementing interface is used to supplement data to the target data file or the target block data file, so that data can be supplemented directly in the distributed file system without downloading and re-uploading the complete target data file or target block data file, which reduces the complexity of data supplementation and improves data processing efficiency. In addition, the object truncation interface is used to truncate the target data file or the target block data file, so that data can be truncated directly in the distributed file system without downloading and re-uploading the complete target data file or target block data file, which reduces the complexity of data truncation and improves data processing efficiency.
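The reason supplementing and truncating do not require a full download can be sketched on a list of fixed-size blocks. This is an illustrative model only: `append_data` and `truncate_blocks` are hypothetical helpers, the block size of 4 bytes is arbitrary, and truncation is read here as shortening the file to a target size, which is an assumption.

```python
def append_data(blocks, data, block_size):
    """Append data; only the last, partially filled block is rewritten."""
    blocks = list(blocks)
    if blocks and len(blocks[-1]) < block_size:
        data = blocks.pop() + data
    for start in range(0, len(data), block_size):
        blocks.append(data[start:start + block_size])
    return blocks


def truncate_blocks(blocks, target_size, block_size):
    """Shorten the file to target_size; whole blocks past the cut are dropped."""
    total = sum(len(block) for block in blocks)
    if not 0 <= target_size <= total:
        raise ValueError("target_size must lie within the current file length")
    full, rest = divmod(target_size, block_size)
    kept = list(blocks[:full])
    if rest:
        kept.append(blocks[full][:rest])
    return kept


BLOCK_SIZE = 4  # arbitrary tiny block size for the example
original = append_data([], b"abcdefghij", BLOCK_SIZE)
extended = append_data(original, b"klm", BLOCK_SIZE)
shortened = truncate_blocks(extended, 6, BLOCK_SIZE)
```

In both operations the untouched leading blocks are reused as they are, so the cost depends on the size of the change rather than on the size of the file.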
In one embodiment, as shown in fig. 11, the method further comprises:
S1102, searching for a target file meeting the state transition condition based on the object directory tree.
Wherein the object directory tree is a directory tree formed based on the object directory and the file directories. The target file may be a file on which the state timing migration task is to perform task management operations, including the target data file and other uploaded data files. The state timing migration task includes one of an archiving task, a deletion task and a backheating task. The archiving task is used for migrating the target file from an initial state to an archiving state, such as from a standard state to the archiving state; the deletion task is used for migrating the target file from an initial state to a deletion state, such as from the archiving state to the deletion state; the backheating task is used for migrating the target file from an initial state to a backheating state, such as from the archiving state to the backheating state.
The state transition condition may mean that the last modified timestamp (mtime) and the read timestamp (atime) of the target file exceed a preset time threshold; for example, if the last modification of the target file was more than one week or one month ago, the target file satisfies the state transition condition.
In one embodiment, responsive to a state timing migration task, a target file path is obtained, and an object directory tree is scanned based on the target file path to obtain candidate files; and determining target files meeting the state transition conditions in the candidate files.
S1104, marking the target file as an intermediate state matched with the state transition condition.
The intermediate state is a state between the initial state and the target state to which the target file is to be migrated. For example, if the state transition condition is an archive migration condition, the target state matching it is the archiving state, and the intermediate state is the corresponding in-progress state (archiving in progress); if the state transition condition is a deletion migration condition, the target state matching it is the deletion state, and the intermediate state is the corresponding in-progress state (deletion in progress); if the state transition condition is a backheating migration condition, the target state matching it is the backheating state, and the intermediate state is the corresponding in-progress state (backheating in progress).
Specifically, after finding a target file that satisfies the state transition condition, the server determines an intermediate state that matches the state transition condition, and marks the state of the target file as the determined intermediate state. The process of marking the state of the target file as the determined intermediate state may be to acquire an initial state of the target file and modify the acquired initial state into the determined intermediate state.
S1106, after the marking is completed, a data list for each data block in the target file is obtained.
The data list may be a data block list or a data object list. The target file may contain at least one data block, each data block corresponds to at least one data segment, and each data segment may be a data object. That is, a target file contains at least one data object.
Specifically, after completing marking the target files as intermediate states matched with the state transition conditions, the server obtains a data object list of the data objects in each target file respectively. For example, when the target file is a file that can be subjected to archiving processing, a data object list corresponding to each file that can be subjected to archiving processing is acquired.
S1108, data processing is performed on the data blocks in the data list.
The data processing includes storage location migration processing for each data object, and specifically includes at least one of archiving processing, deletion processing and backheating processing. The archiving processing refers to migrating the data object from the initial storage location to the target storage location for storage; the backheating processing refers to creating a copy of the data object and storing the copy at the target storage location, where the storage duration of the copy is consistent with the storage duration specified by the backheating task; the deletion processing refers to removing the data object from the initial storage location.
S1110, after finishing data processing, the target file is migrated from the intermediate state to the target state.
The target state is a state to be reached by a data state migration target of the state timing migration task, and the target state comprises at least one of the following: an archiving state, a deleting state and a backheating state.
Specifically, after completing data processing on each data object in the data object list of the target file, the server determines a target state corresponding to the completed data processing, and modifies the state of the target file from an intermediate state to a target state.
In one embodiment, S1110 specifically includes: after at least one of the archiving processing, the deletion processing and the backheating processing is completed, updating the intermediate state to the target state in the index node corresponding to the target file.
According to the above data state migration method, the server searches for the target file meeting the state transition condition based on the object directory tree and marks the target file as the intermediate state matching the state transition condition. After the marking is completed, the server obtains the data object list of the data objects in the target file and performs data processing on the data objects in the list, which ensures that the data objects contained in the target file are processed together. After all the data objects have been processed, the target file is migrated from the intermediate state to the target state, so that the data state migration of all data objects of the target file proceeds in step and the correctness of the target file after the migration is ensured.
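Steps S1102 to S1110 can be sketched for the archiving case as follows. The state names, the dictionary layout of a file record and the helper names are assumptions made for this illustration; the sketch only shows the ordering of marking, processing and final migration.

```python
from enum import Enum


class State(Enum):
    STANDARD = "standard"    # initial state
    ARCHIVING = "archiving"  # intermediate state (assumed name)
    ARCHIVED = "archived"    # target state (assumed name)


def find_targets(files, now, max_idle):
    """S1102: files whose mtime and atime are both older than the threshold."""
    return [f for f in files
            if f["state"] is State.STANDARD
            and now - f["mtime"] > max_idle
            and now - f["atime"] > max_idle]


def archive(file, move_block):
    file["state"] = State.ARCHIVING     # S1104: mark the intermediate state
    for block in list(file["blocks"]):  # S1106: obtain the data list
        move_block(block)               # S1108: process each data block
    file["state"] = State.ARCHIVED      # S1110: migrate to the target state


WEEK = 7 * 24 * 3600
now = 100 * WEEK
files = [
    {"name": "old", "mtime": now - 2 * WEEK, "atime": now - 2 * WEEK,
     "state": State.STANDARD, "blocks": ["b0", "b1"]},
    {"name": "fresh", "mtime": now - 60, "atime": now - 60,
     "state": State.STANDARD, "blocks": ["b2"]},
]
moved = []
for target in find_targets(files, now, WEEK):
    archive(target, moved.append)
```

If `move_block` raises part-way through, the file is left in the intermediate state, which makes an interrupted migration distinguishable from one that never started or one that finished.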
In one embodiment, the method further comprises: the method comprises the steps that a server receives a deleting request aiming at a target file directory under an object directory; responding to the deleting request, and calling a catalog deleting interface of the distributed file system; and deleting the target file directory under the object directory based on the directory deletion interface.
The deletion operation meets the requirement of strong data consistency, i.e., once the deletion operation is performed, the target file directory and the target data files or target block data files under it are deleted immediately. There is no need to search for the directories to be deleted one by one and then delete them one by one, which improves directory deletion efficiency.
Taking cloud HDFS as an example for illustration, first, the existing interface capability and S3 interface to be implemented in cloud HDFS are introduced, then how to complete the logic of the normal upload based on the existing interface, then the core design of the MPU block upload is introduced, and finally some new features supported by cloud HDFS as the underlying storage that are helpful for big data scenarios and outside the S3 standard are introduced.
(1) Cloud HDFS and S3
The cloud HDFS organizes all metadata information of the file system with the inode as the entity. Write operations rely on a high-performance database, and read operations are accelerated in memory. An inode contains attributes such as inode_id, pid, name and file_mode, where inode_id uniquely identifies the current inode, pid is the inode_id of the parent node, name is the inode name, and file_mode indicates whether the inode is a file or a directory, so that a complete directory tree can be constructed from all inode entries. When looking up a metadata path, the target inode information can be found recursively by generating (pid, name) pairs from the file path, which is how the upper-layer file system interfaces are implemented. As a distributed file system in the true sense, it supports the following interfaces:
1. Metadata write operations
● CreateFile (pid, name): a file is created.
● MkDir (pid, name): a directory is created.
● DeleteInode (pid, name): the file or directory is deleted.
● Rename (source_pid, source_name, destination_pid, destination_name): renaming a file or directory.
● CommitFileBlocks (inode_id, file_block_array): the data block information of the file is modified (file length is unchanged or becomes larger).
● Truncate (inode_id, target_size): truncating the file (the file length becomes smaller).
2. Metadata read operations
● GetInode (path): file or directory attributes are obtained.
● ListChildInodes (pid): the child nodes under the directory are obtained.
● GetFileBlocks (inode_id): the data block information of the file is obtained.
It should be noted that the write operations described above are atomic and the read operations are all strongly consistent reads.
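The inode organisation and a few of the listed interfaces can be sketched with an in-memory table. This is a toy model under stated assumptions: the `InodeTable` class, its snake_case method names and the iterative path walk are illustrative, and the atomicity that the real system obtains from its database is not modelled.

```python
class InodeTable:
    ROOT_ID = 1

    def __init__(self):
        self.inodes = {self.ROOT_ID: {"pid": 0, "name": "", "is_dir": True}}
        self.children = {}  # (pid, name) -> inode_id
        self._next_id = self.ROOT_ID + 1

    def _create(self, pid, name, is_dir):
        if not self.inodes[pid]["is_dir"]:
            raise NotADirectoryError(pid)
        if (pid, name) in self.children:
            raise FileExistsError(name)
        inode_id = self._next_id
        self._next_id += 1
        self.inodes[inode_id] = {"pid": pid, "name": name, "is_dir": is_dir}
        self.children[(pid, name)] = inode_id
        return inode_id

    def mkdir(self, pid, name):
        return self._create(pid, name, is_dir=True)

    def create_file(self, pid, name):
        return self._create(pid, name, is_dir=False)

    def rename(self, src_pid, src_name, dst_pid, dst_name):
        if (dst_pid, dst_name) in self.children:
            raise FileExistsError(dst_name)
        inode_id = self.children.pop((src_pid, src_name))
        self.children[(dst_pid, dst_name)] = inode_id
        self.inodes[inode_id].update(pid=dst_pid, name=dst_name)

    def get_inode(self, path):
        # Walk the path one (pid, name) pair at a time, starting at the root.
        inode_id = self.ROOT_ID
        for name in filter(None, path.split("/")):
            inode_id = self.children[(inode_id, name)]
        return inode_id

    def list_child_inodes(self, pid):
        return sorted(name for (parent, name) in self.children if parent == pid)


table = InodeTable()
user = table.mkdir(InodeTable.ROOT_ID, "User")
part = table.mkdir(InodeTable.ROOT_ID, "Part")
tmp = table.mkdir(InodeTable.ROOT_ID, "Tmp")
tmp_file = table.create_file(tmp, "tmp_file_0")
table.rename(tmp, "tmp_file_0", user, "file")
```

A rename only rewrites the (pid, name) entry of one inode, so moving a file from /Tmp to /User does not touch its data blocks.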
Although S3 offers a rich set of interfaces, in a big data scenario only the following core data read/write interfaces need to be implemented:
3. Common uploading
● PutObject (key, object): uploading objects (keys can be understood as file path).
● GetObject (key): an object is acquired.
● DeleteObject (key): the object is deleted.
● CopyObject (source_key, destination_key) & DeleteObject (source_key): the Rename operation is simulated.
● HeadObject (key): the object metadata is viewed.
● GetBucket (key_prefix): i.e., ListObjects; obtains some of the objects in the bucket (those with the specified prefix) or all of them.
4. Block upload
● InitMPU (key): a block upload event is initiated.
● UploadPart (key, upload_id, part_number, part): the data is uploaded to the object block by block within the specified block upload event.
● CompleteMPU (key, upload_id, final_part_number_array): after all the blocks are uploaded, the entire block upload event is completed (it is also possible to complete only some of the Parts).
● ListParts (key, upload_id): the uploaded blocks in the specified block upload event are obtained.
● ListMPUs: an ongoing block upload event within the bucket is obtained.
(2) Common uploading
A strongly consistent common upload requires that, once PutObject on a file at the specified path succeeds, the file is immediately visible to GetObject, HeadObject and ListObjects alike.
The application draws on the Rename mechanism used in big data jobs and divides the file system root directory into a temporary Tmp directory and a User directory, where the User directory is visible to the user. PutObject is completed in 5 steps: CreateFile -> slice the data and upload it to COS concurrently -> CommitFileBlocks -> recursive MkDir -> Rename, as shown in FIG. 5.
Because the Rename operation is atomic, no matter which step fails, the file stays under the Tmp directory; only a successfully uploaded file can reach the User directory, and it is a complete file. Files that failed to upload are periodically cleaned from the Tmp directory by a cleaning module.
In addition to data integrity, concurrent uploading needs to be considered. For PutObject, concurrent uploads of files to the same path all succeed, and which file ends up as the final one depends on the order defined by the object storage bottom layer, which may rely on a physical clock or a logical clock at some step; in all cases the integrity of the data must be ensured. On top of Rename, the application adds a RenameOverwrite implementation (taking a source and a destination, like Rename) and introduces the concept of a version number. Similar to a logical clock, the version number always increases, is generated when the temporary file is created, can be stored in the metadata of the file, and is compared when RenameOverwrite is executed. If the target file already exists and its version number is smaller than that of the temporary file, the overwrite is performed; otherwise it is not, as shown in fig. 6. RenameOverwrite is also atomic and plays a key role in the MPU block upload logic.
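The five PutObject steps and the effect of a failure can be sketched as follows. The `FileSystem` class, the flat path keys, the sequential upload loop (standing in for the concurrent upload to COS) and the `fail_before_commit` switch are all assumptions introduced for this illustration.

```python
import itertools


class FileSystem:
    def __init__(self):
        self.tmp = {}      # files under /Tmp, not visible to the user
        self.user = {}     # files under /User, visible to the user
        self.dirs = {"/"}  # directories that exist under /User
        self.bucket = {}   # stands in for the object storage container
        self.ids = itertools.count()


def put_object(fs, path, data, block_size=4, fail_before_commit=False):
    tmp_name = f"tmp_file_{next(fs.ids)}"
    fs.tmp[tmp_name] = []                            # 1. CreateFile under /Tmp
    keys = []
    for start in range(0, len(data), block_size):    # 2. slice and upload
        key = f"{tmp_name}/block_{start // block_size}"
        fs.bucket[key] = data[start:start + block_size]
        keys.append(key)
    if fail_before_commit:
        raise RuntimeError("simulated upload failure")
    fs.tmp[tmp_name] = keys                          # 3. CommitFileBlocks
    parents = path.strip("/").split("/")[:-1]
    for depth in range(1, len(parents) + 1):         # 4. recursive MkDir
        fs.dirs.add("/" + "/".join(parents[:depth]))
    fs.user[path] = fs.tmp.pop(tmp_name)             # 5. Rename in a single step


def get_object(fs, path):
    return b"".join(fs.bucket[key] for key in fs.user[path])


fs = FileSystem()
put_object(fs, "/dir0/dir1/file", b"hello world")
try:
    put_object(fs, "/dir0/dir1/other", b"partial", fail_before_commit=True)
except RuntimeError:
    pass  # the failed upload leaves its temporary file behind for later cleaning
```

The visible namespace is only changed in the last step, so a reader either sees the complete file or does not see the file at all.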
In addition to the PutObject, other interfaces may be implemented through the cloud HDFS interface in the common upload interface, as shown in table 1:
TABLE 1
(3) Block upload
To accelerate the uploading of large files, cloud HDFS also has the concept of data blocks, but its block length is fixed, whereas S3 parts are variable in length and need not be equal to one another. For example, when an 11MB file is uploaded, cloud HDFS splits it into a 4MB block 0, a 4MB block 1 and a 3MB block 2, while S3 may use a 5MB part 1, a 5MB part 2 and a 1MB part 3, or a 5MB part 1 and a 6MB part 2. Such variable-length S3 block upload cannot be implemented with the existing fixed file system block size, so the present application adopts a new design, namely using variable-length files as the index, as shown in FIG. 12.
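The difference between the two splitting schemes in the 11MB example can be sketched as follows. The helper names `fixed_blocks` and `part_offsets` are hypothetical, and sizes are expressed in MB for readability.

```python
from itertools import accumulate


def fixed_blocks(length, block_size):
    """Fixed-length splitting: every block is block_size except possibly the last."""
    return [min(block_size, length - start) for start in range(0, length, block_size)]


def part_offsets(part_sizes):
    """Variable-length parts: the start offset of each part is a running sum."""
    return list(accumulate(part_sizes, initial=0))[:-1]


hdfs_blocks = fixed_blocks(11, 4)        # sizes in MB
offsets_a = part_offsets([5, 5, 1])
offsets_b = part_offsets([5, 6])
```

With fixed blocks the position of a block follows from its identifier alone, whereas variable-length parts need their start offsets stored explicitly, which is the role an index over variable-length files plays.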
Next, the block upload and the normal upload are compared, and the difference between them is shown in the following table 2:
TABLE 2
In addition, the present application makes a finer design on MPU partition upload, as shown in Table 3 below:
TABLE 3
It is emphasized that the extra logic of CreateFile or RenameOverwrite needs to be atomic with itself, and the underlying implementation is guaranteed by the database transaction.
The MPU file length calculation and how MPU file data is read is described next in connection with FIG. 10, as follows:
The application uses an MPU table to associate the upload_id and the file path with the MPU file in the InitMPU stage, uses a Part table to associate the Part files of the MPU in the UploadPart stage, and completes the final calculation of the file length in the CompleteMPU stage, as shown in FIG. 10.
As can be seen from fig. 10, in the InitMPU stage, the MPU file length defaults to 0; in the UploadPart stage, the MPU file length remains 0 and the MPU file offset corresponding to each Part defaults to 0; in the CompleteMPU stage, the MPU file length is calculated to be 7M, the valid Parts are retained (Parts 2 and 3), and the MPU file offset corresponding to each retained Part is updated.
For data reading, first, the file type, normal or MPU, is distinguished, and then different data indexes are acquired. If the file is of Normal type, the Block table is directly read and returned, and if the file is of MPU type, the Part table is read first, and then the Block table is read and returned. The calculation formula of the target data index is as follows:
Normal file read offset (i.e., index information of a Normal file) = block_id in the Block table × block size + block read offset;
MPU file read offset (i.e., index information of an MPU file) = file_offset in the Part table + Part file read offset;
Part file read offset = block_id in the Block table × block size + block read offset.
Because the Parts are independent of each other, the concurrency of the database can be exploited to speed up both the MPU file length calculation and the reading of file data.
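The read offset formulas for an MPU file can be sketched as a lookup that first selects the Part and then the block inside it. The `locate` function, the tuple layout of a Part table row and the 1 MB block size are assumptions; the Part sizes reuse the 4 MB and 3 MB values assumed for a 7M file.

```python
from bisect import bisect_right

MB = 1024 * 1024


def locate(parts, block_size, mpu_offset):
    """Map an MPU file read offset to (part_number, block_id, block read offset).

    `parts` is a list of (part_number, file_offset, size) sorted by file_offset.
    """
    starts = [file_offset for _, file_offset, _ in parts]
    row = bisect_right(starts, mpu_offset) - 1
    part_number, file_offset, size = parts[row]
    part_read_offset = mpu_offset - file_offset      # Part file read offset
    if not 0 <= part_read_offset < size:
        raise ValueError("offset is outside the file")
    block_id, block_read_offset = divmod(part_read_offset, block_size)
    return part_number, block_id, block_read_offset


# Part table after CompleteMPU: Part 2 starts at 0, Part 3 starts at 4 MB.
part_rows = [(2, 0, 4 * MB), (3, 4 * MB, 3 * MB)]
position = locate(part_rows, MB, 5 * MB + 10)
```

Reading the formulas in the other direction, the file offset of the Part plus block_id × block size plus the block read offset gives back the MPU file read offset that was requested.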
(3) Novel characteristics
Implementing S3 with cloud HDFS as the underlying storage, supports some new features that are very useful for big data scenarios, such as:
1. File Append and Truncate are supported. S3 has no AppendObject or TruncateObject interface; to simulate file Append and Truncate, the complete file has to be downloaded locally and then uploaded again, which is time-consuming. Because cloud HDFS supports random writes, AppendObject and TruncateObject are easy to implement.
2. File atime and mtime updates are supported. S3 has only mtime, which is updated on first upload or on overwrite and is not the atime/mtime concept of a file system. Cloud HDFS is a file system in the true sense, so its atime and mtime updates are accurate and useful as a reference.
3. Directories are deleted instantly. To delete a directory, S3 must first list all objects prefixed by the directory path and then delete them one by one, whereas cloud HDFS deletes a directory at the millisecond level, i.e., instantly.
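The difference in directory deletion cost can be sketched by counting operations. This is a simplified model: the text does not describe how cloud HDFS deletes a directory internally, so removing a single (pid, name) entry and reclaiming the detached subtree later is an assumption, and both function names are hypothetical.

```python
def delete_by_prefix(bucket, dir_path):
    """Flat key space: list every key under the prefix, then delete each one."""
    prefix = dir_path.rstrip("/") + "/"
    keys = [key for key in bucket if key.startswith(prefix)]
    for key in keys:
        del bucket[key]
    return len(keys)  # number of delete operations issued


def delete_directory_entry(children, pid, name):
    """Directory tree: detach the directory with one metadata operation."""
    del children[(pid, name)]
    return 1


bucket = {f"/User/dir0/file_{i}": b"" for i in range(1000)}
bucket["/User/keep"] = b""
flat_ops = delete_by_prefix(bucket, "/User/dir0")

children = {(1, "User"): 2, (2, "dir0"): 3, (2, "keep"): 4}
tree_ops = delete_directory_entry(children, 2, "dir0")
```

In the flat model the work grows with the number of objects under the prefix, while in the tree model the directory disappears from the namespace after a single entry is removed.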
(4) The embodiments of the application can produce the following beneficial effects:
The application solves the eventual-consistency problem of object storage, satisfies read-after-write application scenarios, and has further advantages, specifically:
1. Functional advantages: strong data consistency, a complete directory hierarchy, file Append and Truncate, and file atime and mtime updates are supported.
2. Performance advantages: millisecond-level atomic Rename operations, instantaneous directory deletion, and high-frequency directory listing are supported.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a data object storage device for implementing the data object storage method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations of the one or more data object storage device embodiments provided below, reference may be made to the limitations of the data object storage method above, which are not repeated here.
In one embodiment, as shown in FIG. 13, there is provided a data object storage device comprising: a first creation module 1302, an upload module 1304, an association module 1306, a second creation module 1308, and a movement module 1310, wherein:
a first creating module 1302 for creating a temporary file of a metadata level under a temporary directory;
an uploading module 1304, configured to upload the data blocks of the target data to the storage container in parallel, to obtain index information of the target data;
an association module 1306, configured to associate the index information of the target data with the temporary file, to obtain a temporary data file;
a second creating module 1308, configured to create a first file directory under the object directory according to the file path corresponding to the target data;
a moving module 1310, configured to move the temporary data file to the first file directory, to obtain the target data file under the first file directory.
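The cooperation of the five modules above may be sketched as follows. This is only an illustration on a local file system, not the cloud HDFS implementation: the function `store_object`, the directory names `tmp`, `objects` and `blocks`, the JSON index layout, and the tiny block size are all assumptions. The sketch relies on `os.replace`, which renames atomically when source and destination are on the same file system.

```python
import json
import os
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor

def store_object(root, file_path, data, block_size=4):
    tmp_dir = os.path.join(root, "tmp")       # hidden from the uploading user
    obj_dir = os.path.join(root, "objects")   # visible to the uploading user
    container = os.path.join(root, "blocks")  # stands in for the storage container
    for d in (tmp_dir, obj_dir, container):
        os.makedirs(d, exist_ok=True)

    # 1. Create the metadata-level temporary file.
    tmp_file = os.path.join(tmp_dir, uuid.uuid4().hex)
    open(tmp_file, "w").close()

    # 2. Upload the data blocks in parallel and collect index information.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    def upload(item):
        block_id, payload = item
        name = f"{uuid.uuid4().hex}.blk"
        with open(os.path.join(container, name), "wb") as f:
            f.write(payload)
        return {"block_id": block_id, "name": name, "size": len(payload)}

    try:
        with ThreadPoolExecutor() as pool:
            index = list(pool.map(upload, enumerate(blocks)))
    except Exception:
        os.remove(tmp_file)  # abnormal upload: clean up, nothing becomes visible
        raise

    # 3. Associate the index information with the temporary file.
    with open(tmp_file, "w") as f:
        json.dump(index, f)

    # 4. Create the file directory under the object directory.
    target = os.path.join(obj_dir, file_path.lstrip("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)

    # 5. Move (rename) the temporary data file into place.
    os.replace(tmp_file, target)
    return target

root = tempfile.mkdtemp()
target = store_object(root, "/a/b/hello.txt", b"hello world")
```

Because the file only appears under the object directory through the final rename, a reader either sees no file or sees one whose index is complete.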
In one embodiment, the apparatus further comprises:
the new adding module is used for creating a temporary directory and an object directory under the root directory of the file system; wherein the temporary directory is configured to be invisible to user objects uploading the target data, and the object directory is configured to be visible to user objects uploading the target data.
In one embodiment, the uploading module is further configured to cut the target data into blocks to obtain the data blocks; upload each data block to the storage container in parallel, and determine the block offset value corresponding to each data block during uploading; and determine the index information of the target data based on the block offset value and the block size.
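As a minimal sketch of the block cutting described above, the hypothetical helper below returns, for each data block, its block id, its starting offset in the target data, and its payload. The block size of 4 bytes is chosen only to keep the example small.

```python
def cut_into_blocks(data: bytes, block_size: int):
    """Return (block_id, start_offset, payload) for each block of the data."""
    return [
        (block_id, start, data[start:start + block_size])
        for block_id, start in enumerate(range(0, len(data), block_size))
    ]

blocks = cut_into_blocks(b"abcdefghij", 4)
```

Each start offset equals block id × block size, which is why the block id and the block size together are enough to locate a block when reading.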
In one embodiment, the apparatus further comprises:
the association module is further used for associating the index information of the target data with the temporary file when all the data blocks of the target data have been uploaded to the storage container in parallel;
the stopping module is used for stopping associating the index information of the target data with the temporary file when at least one data block of the target data is abnormal in the uploading process;
and the cleaning module is used for cleaning the temporary files created under the temporary directory.
In one embodiment, the moving module is further configured to rename the temporary data file to obtain the target data file when the target data file does not exist under the first file directory; and move the target data file to the first file directory to obtain the target data file under the first file directory.
In one embodiment, the apparatus further comprises:
the updating module is used for updating the data file under the first file directory based on the temporary data file to obtain the target data file under the first file directory when the data file which corresponds to the temporary data file and is lower than the target version exists under the first file directory;
the stopping module is used for refusing to move the target data file to the first file directory when the data file which corresponds to the target data file and is higher than the target version exists in the first file directory; wherein the target version is a version corresponding to the temporary data file.
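The version handling in the two embodiments above may be sketched as follows. The dictionary-based directory, the `version` field and the returned action strings are hypothetical. The case of equal versions is not addressed in the text and is treated here as a no-op purely by assumption.

```python
def move_with_version_check(directory, name, temp_file):
    """directory maps a file name to {"version": int, ...}; returns the action taken."""
    existing = directory.get(name)
    if existing is None:
        directory[name] = temp_file   # no such file yet: rename into place
        return "created"
    if existing["version"] < temp_file["version"]:
        directory[name] = temp_file   # older version present: update it
        return "updated"
    if existing["version"] > temp_file["version"]:
        return "rejected"             # newer version present: refuse the move
    return "unchanged"                # equal versions: assumed no-op

first_dir = {}
actions = [
    move_with_version_check(first_dir, "f", {"version": 2}),
    move_with_version_check(first_dir, "f", {"version": 3}),
    move_with_version_check(first_dir, "f", {"version": 1}),
]
```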
In one embodiment, the apparatus further comprises:
the receiving module is used for receiving a reading request aiming at the target data file;
the reading module is used for responding to the reading request of the target data file and reading the block size and the block offset value of the data block in the target data file;
The determining module is used for determining index information of the target data file according to the block size and the block offset value of the data block; and reading the target data file in the storage container according to the index information of the target data file.
In the above embodiment, a metadata-level temporary file is created under the temporary directory; the data blocks of the target data are uploaded to the storage container in parallel to obtain index information of the target data; and the index information of the target data is associated with the temporary file to obtain the temporary data file. In addition, a first file directory is created under the object directory according to the file path corresponding to the target data, and the temporary data file is moved to the first file directory to obtain the target data file under the first file directory. If an abnormality occurs during uploading, the data therefore stays under the temporary directory, which ensures that the file data under the object directory is complete. Because the interface operations of cloud HDFS satisfy strong data consistency, the integrity and accuracy of the data are effectively guaranteed when the data is read.
In one embodiment, the apparatus further comprises:
The creation module is used for creating a temporary block file of a metadata level under the temporary directory;
the generation module is used for generating an upload identifier, and associating the upload identifier and the file path of the media data with the temporary block file to obtain an associated file;
the association module is also used for associating each fragment file with the associated file to obtain a temporary block data file when the media data is uploaded to other storage containers in a fragment file mode;
the second creating module is further used for creating a second file directory under the object directory according to the file path corresponding to the temporary block data file;
and the moving module is also used for moving the temporary block data file to the second file directory to obtain the target block data file under the second file directory.
In one embodiment, the apparatus further comprises:
the determining module is used for determining a first file offset value of each fragment file when the media data is uploaded to other storage containers in a fragment file mode;
the new adding module is used for adding, under the root directory of the file system, a fragment directory for storing each fragment file, and for adding a fragment file table for storing the first file offset value.
In one embodiment, the moving module is further configured to move the temporary block data file corresponding to the target fragment file to the second file directory, where the target fragment file is at least one of the fragment files;
and the determining module is used for determining the file offset value corresponding to the target block data file based on the first file offset value of the target fragment file.
In one embodiment, the apparatus further comprises:
the receiving module is used for receiving a reading request aiming at the target block data file;
the reading module is used for responding to a reading request of the target block data file, reading a first file offset value from the fragment file table and determining a second file offset value corresponding to the fragment file;
the determining module is used for determining index information of the target block data file according to the first file offset value and the second file offset value;
and the reading module is also used for reading the target block data files in other storage containers according to the index information of the target block data files.
In one embodiment, the determining module is further configured to obtain the block size and the block offset value of each data block in the fragment file; and determine the second file offset value corresponding to the fragment file based on the block size and the block offset value of each data block in the fragment file.
In the above embodiment, for uploading big data, a block uploading mode may be adopted for uploading, so that the data uploading efficiency may be effectively improved, and the uploading time is reduced. In addition, the user object can selectively upload the interested fragments, so that the uploading flexibility is improved.
In one embodiment, the apparatus further comprises:
a calling module for calling an object augmentation interface or an object truncation interface of the distributed file system;
a processing module for, based on the object augmentation interface or the object truncation interface, uploading augmentation data for the target data file to the storage container or truncating the target data file in the storage container; or, based on the object augmentation interface or the object truncation interface, uploading augmentation data for the target block data file to the other storage containers or truncating the target block data file in the other storage containers.
In the above embodiment, the object augmentation interface is used to augment the target data file or the target block data file, so data can be appended directly in the distributed file system without downloading and re-uploading the complete target data file or target block data file, which reduces the complexity of data augmentation and improves data processing efficiency. Likewise, the object truncation interface is used to truncate the target data file or the target block data file directly in the distributed file system, without downloading and re-uploading the complete file, which reduces the complexity of data truncation and improves data processing efficiency.
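To illustrate why in-place augmentation and truncation avoid re-uploading the whole file, the following sketch operates on a file represented as a list of data blocks. It is not the interface of any real distributed file system: `append_blocks` and `truncate_blocks` are hypothetical names, and `truncate_blocks` assumes that the new length does not exceed the current file length.

```python
def append_blocks(blocks, data, block_size):
    """Append data by filling the last block, then adding new blocks.
    Existing full blocks are left untouched."""
    blocks = list(blocks)
    if blocks and len(blocks[-1]) < block_size:
        room = block_size - len(blocks[-1])
        blocks[-1] += data[:room]
        data = data[room:]
    blocks += [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return blocks

def truncate_blocks(blocks, new_length, block_size):
    """Keep the first new_length bytes; assumes new_length <= current length."""
    full, rest = divmod(new_length, block_size)
    kept = list(blocks[:full])
    if rest:
        kept.append(blocks[full][:rest])
    return kept

original = [b"abcd", b"ef"]
appended = append_blocks(original, b"ghijk", 4)
truncated = truncate_blocks(appended, 6, 4)
```

In both operations only the blocks at the tail of the file are written or dropped, which is the property that a flat object store without such interfaces cannot offer.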
In one embodiment, the apparatus further comprises:
the searching module is used for searching the target file meeting the state migration condition based on the object directory tree; the object directory tree is a directory tree formed based on an object directory and a file directory, wherein the file directory comprises a first file directory or a second file directory, and the target file comprises a target data file;
the marking module is used for marking the target file as an intermediate state matched with the state transition condition;
the acquisition module is used for acquiring a data list aiming at each data block in the target file after marking is completed;
the cleaning module is used for carrying out data processing on the data blocks in the data list;
and the migration module is used for migrating the target file from the intermediate state to the target state after the data processing is completed.
In the above embodiment, the server searches, based on the object directory tree, for a target file that meets the state migration condition and marks the target file as an intermediate state matching the state migration condition. After marking is completed, the data object list of the data objects in the target file is obtained and the data objects in the list are processed, which ensures that the data objects contained in the target file are processed together. After all data objects have been processed, the target file is migrated from the intermediate state to the target state, so that the state migration of the data objects of the target file proceeds synchronously and the target file is correct after the state migration.
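The mark, process, then migrate sequence described above may be sketched as follows. The search over the object directory tree is simplified to a scan of a list, and the state names, the `age` field and the function `migrate` are hypothetical values chosen for illustration.

```python
def migrate(files, condition, intermediate, target, process_block):
    """Mark each matching file, process all of its blocks, then switch its state."""
    migrated = []
    for f in files:
        if not condition(f):
            continue
        f["state"] = intermediate        # mark before touching any data
        for block in list(f["blocks"]):  # data list obtained after marking
            process_block(block)
        f["state"] = target              # switch only after every block is processed
        migrated.append(f["name"])
    return migrated

files = [
    {"name": "a", "state": "standard", "blocks": [1, 2], "age": 40},
    {"name": "b", "state": "standard", "blocks": [3], "age": 5},
]
processed = []
result = migrate(files, lambda f: f["age"] > 30,
                 "archiving", "archived", processed.append)
```

A file left in the intermediate state after a failure can be recognized as partially processed, which is the purpose of marking before processing.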
In one embodiment, the apparatus further comprises:
the receiving module is used for receiving a deletion request for a target file directory under the object directory;
the calling module is used for responding to the deletion request and calling a directory deletion interface of the distributed file system;
and the deleting module is used for deleting the target file directory under the object directory based on the directory deleting interface.
The deletion operation satisfies strong data consistency; that is, once a deletion operation is performed on the target data file or the target block data file, the file is deleted immediately. The contents of the directory to be deleted do not need to be looked up one by one and then deleted one by one, which improves directory deletion efficiency.
The modules in the above data object storage device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 14. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing data objects and data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data object storage method.
It will be appreciated by those skilled in the art that the structure shown in fig. 14 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements are applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory having a computer program stored therein and a processor implementing the steps of the data object storage method described above when the computer program is executed.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the data object storage method described above.
In an embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of the data object storage method described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of the technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (19)

1. A method of storing a data object, the method comprising:
creating a temporary file of a metadata level under the temporary directory;
uploading data blocks of target data to a storage container in parallel to obtain index information of the target data;
associating the index information of the target data with the temporary file to obtain a temporary data file;
creating a first file directory under an object directory according to a file path corresponding to the target data;
and moving the temporary data file to the first file directory to obtain a target data file under the first file directory.
2. The method of claim 1, wherein prior to creating the metadata-level temporary file under the temporary directory, the method further comprises:
newly building the temporary directory and the object directory under a root directory of a file system;
wherein the temporary directory is configured to be invisible to a user object uploading the target data, and the object directory is configured to be visible to a user object uploading the target data.
3. The method of claim 1, wherein the concurrently uploading the data blocks of the target data to the storage container, obtaining the index information of the target data comprises:
cutting the target data into blocks to obtain data blocks;
uploading each data block to a storage container, and determining a block offset value corresponding to each data block in the uploading process;
index information of the target data is determined based on the block offset value and the block size.
4. The method of claim 1, wherein the associating the index information of the target data with the temporary file comprises:
when all the data blocks of the target data have been uploaded to the storage container in parallel, associating the index information of the target data with the temporary file;
the method further comprises: when at least one data block of the target data is abnormal in the uploading process, stopping associating the index information of the target data with the temporary file;
and cleaning the temporary files created under the temporary directory.
5. The method of claim 1, wherein moving the temporary data file under the first file directory to obtain the target data file under the first file directory comprises:
when the target data file does not exist under the first file directory, renaming the temporary data file to obtain the target data file;
and moving the target data file to the first file directory to obtain the target data file under the first file directory.
6. The method of claim 5, wherein the method further comprises:
when a data file which corresponds to the temporary data file and is lower than a target version exists under the first file directory, updating the data file under the first file directory based on the temporary data file to obtain the target data file under the first file directory;
when a data file which corresponds to the target data file and is higher than the target version exists under the first file directory, refusing to move the target data file to the first file directory;
wherein the target version is a version corresponding to the temporary data file.
7. The method of claim 1, wherein after the moving the temporary data file to the first file directory, the method further comprises:
receiving a read request for the target data file;
responding to a reading request of the target data file, and reading the block size and the block offset value of a data block in the target data file;
determining index information of the target data file according to the block size and the block offset value of the data block;
and reading the target data file in the storage container according to the index information of the target data file.
8. The method according to claim 1, wherein the method further comprises:
creating a temporary block file of a metadata level under the temporary directory;
generating an upload identifier, and associating the upload identifier and the file path of the media data with the temporary block file to obtain an associated file;
when the media data is uploaded to other storage containers in the form of fragment files, associating each fragment file with the associated file to obtain a temporary block data file;
creating a second file directory under the object directory according to a file path corresponding to the temporary block data file;
and moving the temporary block data file to the second file directory to obtain a target block data file under the second file directory.
9. The method of claim 8, wherein the method further comprises:
when the media data is uploaded to other storage containers in the form of fragment files, determining a first file offset value of each fragment file;
newly adding a fragment directory for storing each fragment file under a root directory of a file system;
and newly adding a fragment file table for storing the first file offset value.
10. The method of claim 9, wherein said moving the temporary block data file under the second file directory comprises:
moving a temporary block data file corresponding to a target fragment file to the second file directory, wherein the target fragment file is at least one of the fragment files;
the method further comprises: determining a file offset value corresponding to the target block data file based on the first file offset value of the target fragment file.
11. The method of claim 8, wherein after the moving the temporary block data file under the second file directory, the method further comprises:
receiving a read request for the target block data file;
responding to the reading request of the target block data file, reading a first file offset value from the fragment file table, and determining a second file offset value corresponding to the fragment file;
determining index information of the target block data file according to the first file offset value and the second file offset value;
and reading the target block data files in the other storage containers according to the index information of the target block data files.
12. The method of claim 11, wherein determining the second file offset value corresponding to the fragment file comprises:
obtaining the block size and the block offset value of each data block in the fragment file;
and determining the second file offset value corresponding to the fragment file based on the block size and the block offset value of each data block in the fragment file.
13. The method according to any one of claims 1 to 12, further comprising:
calling an object augmentation interface or an object truncation interface of the distributed file system;
based on the object augmentation interface or the object truncation interface, uploading augmentation data for the target data file to the storage container or performing truncation processing on the target data file in the storage container; or,
based on the object augmentation interface or the object truncation interface, uploading augmentation data for the target block data file to other storage containers or performing truncation processing on the target block data file in the other storage containers.
14. The method according to any one of claims 1 to 12, further comprising:
searching a target file meeting the state migration condition based on the object directory tree; the object directory tree is a directory tree formed based on the object directory and a file directory, the file directory comprises the first file directory or the second file directory, and the target file comprises a target data file;
marking the target file as an intermediate state matched with the state transition condition;
After marking is completed, a data list aiming at each data block in the target file is obtained;
carrying out data processing on the data blocks in the data list;
and after finishing data processing, migrating the target file from the intermediate state to a target state.
15. The method according to any one of claims 1 to 12, further comprising:
receiving a deleting request aiming at a target file directory under the object directory;
responding to the deleting request, and calling a directory deletion interface of the distributed file system;
and deleting the target file directory under the object directory based on the directory deletion interface.
16. A data object storage device, the device comprising:
the first creating module is used for creating a temporary file of a metadata level under the temporary directory;
the uploading module is used for uploading the data blocks of the target data to the storage container in parallel to obtain index information of the target data;
the association module is used for associating the index information of the target data with the temporary file to obtain a temporary data file;
the second creating module is used for creating a first file directory under the object directory according to the file path corresponding to the target data;
and the moving module is used for moving the temporary data file to the first file directory to obtain a target data file under the first file directory.
17. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 15 when the computer program is executed.
18. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 15.
19. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 15.
CN202210621695.XA 2022-06-02 2022-06-02 Data object storage method, device, computer equipment and storage medium Pending CN117215477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210621695.XA CN117215477A (en) 2022-06-02 2022-06-02 Data object storage method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210621695.XA CN117215477A (en) 2022-06-02 2022-06-02 Data object storage method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117215477A true CN117215477A (en) 2023-12-12

Family

ID=89041239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210621695.XA Pending CN117215477A (en) 2022-06-02 2022-06-02 Data object storage method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117215477A (en)

Similar Documents

Publication Publication Date Title
US10346363B2 (en) Deduplicated file system
US9146930B2 (en) Method and apparatus for file storage
US9830324B2 (en) Content based organization of file systems
US11321192B2 (en) Restoration of specified content from an archive
CN102184211B (en) File system, and method and device for retrieving, writing, modifying or deleting file
US9547706B2 (en) Using colocation hints to facilitate accessing a distributed data storage system
CN103282899B (en) The storage method of data, access method and device in file system
CN103595797B (en) Caching method for distributed storage system
GB2439578A (en) Virtual file system with links between data streams
CN111090618B (en) Data reading method, system and equipment
WO2008001094A1 (en) Data processing
US20220083504A1 (en) Managing snapshotting of a dataset using an ordered set of b+ trees
US20220188267A1 (en) Embedded reference counts for file clones
US20160139980A1 (en) Erasure-coding extents in an append-only storage system
US20230394010A1 (en) File system metadata deduplication
US8612717B2 (en) Storage system
CN114610680A (en) Method, device and equipment for managing metadata of distributed file system and storage medium
CN104516945A (en) Hadoop distributed file system metadata storage method based on relational data base
CN117215477A (en) Data object storage method, device, computer equipment and storage medium
US8886656B2 (en) Data processing
US20170242882A1 (en) An overlay stream of objects
CN114491111B (en) Distributed metadata system for picture storage
US8290993B2 (en) Data processing
WO2023138788A1 (en) Method of backing up file-system onto object storgae system and data management module
CN116737659A (en) Metadata management method for file system, terminal device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination