CN113220211A - Data storage system, data access method and related device

Data storage system, data access method and related device

Info

Publication number
CN113220211A
Authority
CN
China
Prior art keywords
storage object
data
layer
cache
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010071340.9A
Other languages
Chinese (zh)
Inventor
王立鹏
叶松高
颜深根
成林峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202010071340.9A
Publication of CN113220211A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data storage system, a data access method and a related device. The system comprises a data layer, a cache data layer and a cache metadata layer, wherein the cache data layer is used for caching at least one storage object of the data layer, and the cache metadata layer is used for storing metadata information of the cache data layer. Because the reading rate of the cache metadata layer is higher than that of the cache data layer, the data reading rate can be improved.

Description

Data storage system, data access method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data storage system, a data access method, and a related apparatus.
Background
In artificial neural network training, a large number of storage objects are usually used as data sets. These storage objects are typically stored in a persistent storage medium and read out for use when needed. In the current scheme, when a storage object is read, both the metadata of the storage object and the storage object itself are searched for and read directly from the cache data layer, so the storage object cannot be read quickly.
Disclosure of Invention
The embodiment of the application provides a data storage system, a data access method and a related device, which can improve the speed of data reading.
A first aspect of embodiments of the present application provides a data storage system comprising a data tier, a cache data tier, and a cache metadata tier, wherein,
the cache data layer is used for caching at least one storage object of the data layer;
the cache metadata layer is used for storing metadata information of the cache data layer, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
In the embodiments of the present application, when a storage object is read, the metadata information of the storage object is first acquired quickly through the cache metadata layer, and the corresponding storage object is then acquired from the cache data layer according to that metadata information. Compared with the prior art, in which both the metadata information and the storage object are acquired through the cache data layer, the acquisition efficiency of the storage object can be improved because the reading rate of the cache metadata layer is higher than that of the cache data layer.
With reference to the first aspect, in a possible implementation manner, the cache metadata layer is further configured to store metadata information of the storage object.
With reference to the first aspect, in one possible implementation manner, the storage object includes a data set, and the data set includes at least one data block, and each data block includes at least one file.
With reference to the first aspect, in one possible implementation, the method further includes:
the service layer is used for acquiring a first storage object from the data layer according to a data access request from a client;
the cache data layer is also used for storing the first storage object acquired by the service layer from the data layer;
the cache metadata layer is further configured to store metadata information corresponding to the first storage object and/or update metadata information of the cache data layer.
With reference to the first aspect, in one possible implementation, the system further includes:
and the distributed lock is used for locking the metadata information stored in the cache metadata layer and/or locking the storage object of the cache data layer.
With reference to the first aspect, in a possible implementation manner, the service layer is further configured to acquire a first distributed lock of the first storage object, and store the first storage object in the cache data layer after the first distributed lock is successfully acquired; and/or
The service layer is further configured to acquire a second distributed lock of the metadata information corresponding to the first storage object, and store the metadata information of the first storage object in the cache metadata layer after the second distributed lock is successfully acquired.
With reference to the first aspect, in one possible implementation, the metadata information of the storage object is stored in a key-value manner, where a key in a first key-value includes identification information of a first data block, a value in the first key-value includes identification information of a data set to which the first data block belongs, a key in a second key-value includes file name encoding information of a file, and a value in the second key-value includes information of a data block to which the file belongs.
With reference to the first aspect, in one possible implementation, the data layer includes a storage cluster including at least one hard disk drive.
With reference to the first aspect, in one possible implementation, the cache data layer includes a storage cluster including at least one solid state drive.
A second aspect of an embodiment of the present application provides a data access method, where the method includes:
receiving a data access request sent by a client, wherein the data access request is used for requesting to access a first storage object;
according to the data access request, obtaining metadata information which is stored in a cache metadata layer and corresponds to the first storage object;
and acquiring at least one part of the first storage object from a cache data layer based on the metadata information of the first storage object, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
With reference to the second aspect, in one possible implementation, the first storage object includes a data set, the data set includes at least one data block, and each data block includes at least one file.
With reference to the second aspect, in a possible implementation manner, the acquiring at least a portion of the first storage object from a cache data layer based on metadata information of the first storage object includes:
acquiring identification information of the first storage object in the metadata information, and if the identification information is successfully acquired, acquiring the identification information of a data set to which the first storage object belongs according to the identification information;
and acquiring at least one part of the first storage object from a cache data layer according to the identification information of the data set to which the first storage object belongs.
With reference to the second aspect, in a possible implementation manner, the acquiring at least a portion of the first storage object from a cache data layer based on metadata information of the first storage object includes:
acquiring file name coding information of the first storage object in the metadata information, and acquiring information of a file block to which the first storage object belongs according to the file name coding information if the file name coding information is successfully acquired;
and acquiring at least one part of the first storage object from a cache data layer according to the information of the file block to which the first storage object belongs.
With reference to the second aspect, in one possible implementation, the method further includes:
acquiring a first distributed lock of the first storage object, and storing the first storage object to the cache data layer after the first distributed lock is successfully acquired; and/or
Acquiring a second distributed lock of metadata information corresponding to the first storage object, and storing the metadata information of the first storage object in the cache metadata layer after the second distributed lock is successfully acquired.
With reference to the second aspect, in one possible implementation manner, the cache data layer includes a storage cluster, and the storage cluster includes at least one solid state drive.
A third aspect of the embodiments of the present application provides an electronic device, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions in the second aspect of the embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform some or all of the steps as described in the second aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the second aspect of embodiments of the present application. The computer program product may be a software installation package.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1A is a schematic structural diagram of a data storage system according to an embodiment of the present application;
FIG. 1B is a schematic diagram of an alternative data storage system according to an embodiment of the present application;
FIG. 1C is a schematic diagram of an alternative data storage system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of another data storage system according to an embodiment of the present application;
fig. 3 is a schematic diagram of a cache data replacement method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a distributed lock provided by an embodiment of the present application;
fig. 5 is a schematic flowchart of a data reading method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1A, fig. 1A is a schematic structural diagram of a data storage system according to an embodiment of the present disclosure. As shown in fig. 1A, the data storage system includes: a data layer, a cache data layer, and a cache metadata layer, wherein,
the data layer is used for storing all storage objects in the storage system;
the cache data layer is used for caching at least one storage object of the data layer;
and the cache metadata layer is used for storing metadata information of the cache data layer, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
When the cache data layer caches at least one storage object of the data layer, a storage object is cached after it has been sent to the client, so that the cache data layer holds the at least one storage object. The storage object may be training data of an artificial neural network.
The storage object may be a file block (SuperBlock) or a data set (DataSet). A file block may be formed by aggregating a plurality of small files; a data set is used to partition data, so that files in different data sets do not affect and are not related to each other. A data set includes at least one data block, and each data block includes at least one file.
The metadata information includes state information. The state information may include state information of the system and state information of the data; the state information of the data may include the read time, storage location and the like of the data, and the state information of the system may include the size of the free memory of the cache data layer, memory space information, and the like.
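Purely as an illustration of the state information just described (the class and field names below are assumptions, not part of the disclosure), it could be modeled as follows:

```python
from dataclasses import dataclass

@dataclass
class DataState:
    """State information of the data (illustrative fields only)."""
    read_time: float       # when the data was last read
    storage_location: str  # where the cached copy lives in the cache data layer

@dataclass
class SystemState:
    """State information of the system (illustrative fields only)."""
    free_memory: int       # free memory of the cache data layer, in bytes
    total_memory: int      # overall memory space of the cache data layer, in bytes
```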
In one possible embodiment, the cache metadata layer is also used to cache metadata information of the storage object. Caching the metadata information of the storage object mainly serves the read path: when the storage object is read, its metadata information is acquired first, and the storage object is then acquired from the cache data layer according to that metadata information.
In one possible embodiment, as shown in fig. 1B, the data storage system further includes a service layer, the service layer is configured to obtain the first storage object from the data layer according to a data access request from the client;
the cache data layer is also used for storing a first storage object acquired from the data layer by the service layer;
the cache metadata layer is further used for storing metadata information corresponding to the first storage object and/or updating metadata information of the cache data layer.
The data access request sent by the client carries information of the first storage object, and the information is used for identifying the first storage object.
The cache data layer may directly cache the acquired first storage object, or may extract the first storage object from the data layer and store it. The service layer may include a plurality of servers.
One possible method for the cache data layer to cache the first storage object is as follows:
if the free memory of the cache data layer is larger than or equal to the size of the first storage object, caching the first storage object;
and if the free memory of the cache data layer is smaller than the size of the first storage object, deleting files from the cache data layer in the time order in which they were stored, and then caching the first storage object.
Free memory is understood to be memory that can be used for data storage.
When the free memory is larger than or equal to the size of the first storage object, the first storage object may be cached directly; when the free memory is smaller than the size of the first storage object, more free memory needs to be cleared before the first storage object can be cached.
One method of deleting files from the cache data layer in the time order in which they were stored is to start with the file that has the longest storage time. The storage time of a file is related to its access time: the longer ago a file was last accessed, the longer its storage time; the more recently it was accessed, the shorter its storage time. The access time here is the time at which the file was accessed, not the time consumed in accessing the file.
In one possible embodiment, deleting files from the cache data layer in the time order in which they were stored may be performed as follows:
deleting the files with the longest storage time, one by one, until the free memory of the cache data layer is larger than or equal to the size of the first storage object.
In one possible embodiment, the service layer obtains the first storage object from the cache data layer according to a data access request from the client.
In one possible embodiment, as shown in FIG. 1C, the data storage system further comprises a distributed lock for locking metadata information stored by the cache metadata layer and/or locking storage objects of the cache data layer.
When the metadata information in the cache metadata layer is locked, the metadata information can be locked in a key-value manner: when the metadata information is to be accessed, the key of the metadata information needs to be acquired first, and the metadata information can be acquired only after the key is acquired successfully.
The distributed lock may be a distributed lock implemented based on Etcd, a strongly consistent distributed database. When the distributed lock locks the metadata information, it ensures, among other things, that the metadata information stored in the cache metadata layer is consistent with the actual state of the data stored in the cache data layer.
In a possible embodiment, the service layer is further configured to acquire a first distributed lock of the first storage object, and store the first storage object to the cache data layer after the first distributed lock is successfully acquired; and/or
And the service layer is also used for acquiring a second distributed lock of the metadata information corresponding to the first storage object, and storing the metadata information of the first storage object to the cache metadata layer after the second distributed lock is successfully acquired.
When the service layer caches the first storage object into the cache data layer, it needs to first acquire the first distributed lock of the first storage object and, after the lock is acquired successfully, cache the first storage object into the cache data layer.
In one possible embodiment, the metadata information of the storage object is stored in a key-value manner, wherein a key in the first key-value comprises identification information of a first data block, a value in the first key-value comprises identification information of a data set to which the first data block belongs, a key in the second key-value comprises file name encoding information of a file, and a value in the second key-value comprises information of a data block to which the file belongs.
The identification information of a data set is used to uniquely identify the data set.
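For illustration only, the two key-value forms can be pictured as plain mappings; the block and data-set names reuse the examples given later in the detailed description, while the file-name encodings are hypothetical:

```python
# First key-value: identification of a data block -> identification of its data set.
first_kv = {
    "superBlock1": "dataSet1",
    "superBlock2": "dataSet1",
}

# Second key-value: file-name encoding (e.g. a hash of the name) -> owning data block.
second_kv = {
    "a1b2c3d4": "superBlock1",   # hypothetical file-name encodings
    "e5f6a7b8": "superBlock2",
}

def dataset_of(filename_code: str) -> str:
    """Resolve a file to the data set it belongs to via the two mappings."""
    block = second_kv[filename_code]
    return first_kv[block]
```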
In one possible embodiment, the data layer comprises a storage cluster including at least one hard disk drive.
In one possible embodiment, the cached data layer comprises a storage cluster comprising at least one solid state drive.
In one possible embodiment, the service layer acquires a first storage object from the cache data layer according to the file reading request;
and if the first storage object does not exist in the cache data layer, the service layer acquires the first storage object from the data layer.
In this example, the first storage object is first obtained from the cache data layer, and if the first storage object does not exist in the cache data layer, the first storage object is obtained from the data layer.
In artificial intelligence training, there is often a need to store and read large numbers of small data files. To meet the storage and reading requirements of massive files during training, a plurality of distributed object storage service clusters (Petrel-OSS) are used to store the files, with hard disk drives (HDD) as the underlying storage medium. However, every access to a data storage cluster that uses HDDs as its storage medium when reading a file suffers from problems such as slow speed and low performance. To improve system performance, small files are aggregated into large files for network transmission and storage, which introduces the concepts of large file blocks (SuperBlock) and data sets (DataSet). Multiple small files are aggregated into one large file block for storage and reading. A data set is similar to a disk partition and is used to partition data; files in different data sets do not affect and are not related to each other. However, although a Petrel-OSS cluster using HDDs as the underlying storage medium can meet the requirement of large-capacity, low-cost storage, it cannot provide fast file reading performance due to the limited read-write performance of HDDs.
The existing approach is generally to add a layer of distributed cache system to improve file reading performance. Existing distributed cache systems usually adopt Redis or Memcached, but they run in memory and have two drawbacks: 1) memory is expensive, making it difficult to build a large-capacity cluster; 2) data stored in memory cannot be persisted. Therefore, in the embodiments of the present application, solid state drives (SSD) are used as the storage medium, and a high-performance, large-capacity, persistent distributed cache layer is built on Petrel-OSS to improve file reading efficiency. As shown in fig. 2, an embodiment of the present application provides a schematic structural diagram of another data storage system. The data storage system comprises a service layer, a data layer, a cache data layer, a cache metadata layer and a distributed lock, wherein the service layer may comprise a plurality of servers, a client can read a storage object or a file from the servers through remote procedure calls (RPC), and the read storage object or file may be training data of an artificial neural network;
the service layer is used for acquiring a first storage object from the data layer or the cache data layer according to a data access request sent by the client;
the data layer is used for storing all storage objects in the storage system;
the cache data layer is used for caching at least one storage object of the data layer, and the at least one storage object can be understood as part or all of the storage objects in the data layer;
the cache metadata layer is used for storing metadata information of the cache data layer, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer;
and the distributed lock is used for locking the metadata information stored in the cache metadata layer and/or locking the storage object of the cache data layer.
The service layer is a cluster formed by a plurality of proxy servers; it receives RPC requests from external clients and internally calls the data layer and the cache data layer to read files. The data layer is a Petrel-OSS cluster composed of a plurality of HDDs and stores all files required during training. The cache data layer is a Petrel-OSS cluster composed of a plurality of SSDs and stores the files accessed most recently during training. The cache metadata layer is a TiKV database (distributed KV storage), and stores information such as the free space of the cache data layer (system state) and the storage locations of data (data state). The distributed lock is implemented based on Etcd and is used to keep the service layer's accesses to the cache data layer and the cache metadata layer consistent.
The client sends a file reading request to a SenseAgentServer in the service layer through an RPC call, and the SenseAgentServer connects to the back-end data storage cluster to read the file. Because there are a plurality of servers, Etcd is used for service registration and discovery; at the same time, in order to keep the cache system consistent when multiple servers access its data simultaneously, access control is performed with a distributed lock. The back-end data storage cluster consists of three parts: the cache data layer, the cache metadata layer and the data layer. The cache metadata layer is a TiKV database, and the data layer is a Petrel-OSS cluster composed of a plurality of HDDs. Generally speaking, the read-write speed of the cache data layer needs to be better than that of the data layer.
Regarding the free-space information of the cache system stored in the TiKV database of the cache metadata layer: the cache data layer comprises a plurality of Petrel-OSS clusters formed of SSDs, and the free space of each Petrel-OSS cluster is stored in TiKV. When new data needs to be cached in the cache system, the size of the free space of the cache cluster can be obtained quickly by querying TiKV. When the space is sufficient, the data is cached directly; when the space is insufficient, the data with the longest storage time is deleted to release space. The free-space information of the cache system stored in the TiKV database is shown in the following table:
(Table, shown as an image in the original: the Key column contains CacheName and the Value column contains FreeSpace.)
Here, Key is a key, Value is a value, CacheName is a storage object name, and FreeSpace is the free space.
The cache metadata information stored in the TiKV database in the cache metadata layer can cache large file blocks and also can cache data sets. When a large file block is cached, the superBlockId can be quickly found through the filename by using the TiKV database, and the file of the whole large file block is cached. When the data set is cached, the TiKV database can be inquired, and the mapping from the file name to the superBlockId and the mapping from the superBlockId to the data set can be quickly found, so that the whole data set is cached. The following table shows the metadata information stored in the TiKV database.
(Table, shown as an image in the original: one key-value maps a file name to its superBlockId, and another maps a superBlockId to the data set it belongs to.)
Here, Key is a key and Value is a value.
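A sketch of the two lookups just described, using a generic kv.get interface as a stand-in for the TiKV client; the key prefixes fileName: and superBlockId: are assumptions used only to keep the two mappings apart:

```python
from typing import Optional, Tuple

def find_super_block(kv, file_name: str) -> Optional[str]:
    """First mapping: file name -> superBlockId."""
    return kv.get("fileName:" + file_name)

def find_data_set(kv, super_block_id: str) -> Optional[str]:
    """Second mapping: superBlockId -> data set it belongs to."""
    return kv.get("superBlockId:" + super_block_id)

def cache_scope(kv, file_name: str) -> Tuple[Optional[str], Optional[str]]:
    """Resolve what to cache: the whole large file block, or the whole data set."""
    block = find_super_block(kv, file_name)
    data_set = find_data_set(kv, block) if block is not None else None
    return block, data_set
```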
Referring to fig. 3, fig. 3 is a schematic diagram of a cache data replacement method according to an embodiment of the present disclosure. As shown in fig. 3, when the service layer finds the first file in the cache data layer, the first file is read from the cache data layer, as shown in step S301; when the service layer does not find the first file in the cache data layer, the first file is read from the data layer, as shown in step S302, and the cache data layer then caches the first file, as shown in step S303. When the first file is stored into the cache data layer, if the free memory is smaller than the size of the first file, the storage times of the data in the cache data layer are determined from the sorted set (ZSET), data is deleted according to its storage time, and the first file is then cached, completing the update and elimination of cached data, as shown in step S304. The ZSET may be a ZSET in Redis and is used to store data in order; the stored data includes the data accessed in the data layer.
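The ordered storage of access times can be illustrated with a Redis sorted set; the redis-py calls and the key name cache:access_times are assumptions about one possible client, not part of the disclosure:

```python
import time
from typing import Optional

import redis

r = redis.Redis()  # assumed Redis instance holding the sorted set

def touch(file_name: str) -> None:
    """Record an access: the score is the access timestamp."""
    r.zadd("cache:access_times", {file_name: time.time()})

def evict_oldest() -> Optional[str]:
    """Return the member with the smallest score, i.e. the longest storage time."""
    oldest = r.zrange("cache:access_times", 0, 0)
    if not oldest:
        return None
    name = oldest[0].decode()
    r.zrem("cache:access_times", name)
    return name   # the caller then deletes this file from the cache data layer
```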
Referring to fig. 4, fig. 4 is a schematic diagram of a distributed lock according to an embodiment of the present application. As shown in fig. 4, the distributed lock is an Etcd-based distributed lock and is used for access control of large file blocks and data sets and for controlling updates of memory space information. The number of locks shown is merely illustrative and may be any number.
When the service layer reads files in the data layer or the cache data layer, multiple access operations are performed on the data and metadata information of the cache system. Before caching data, it must be determined whether the free space of the cache system is sufficient; while caching data, it must be guaranteed that the same data is not cached repeatedly; after the data is cached, the free space of the cache system needs to be modified. Since a plurality of servers in the service layer may access the same cache system and the TiKV database at the same time, inconsistent states such as repeated caching of data or repeated modification of the free space of the cache system may occur.
The design of the present application has a plurality of servers receiving concurrent RPC client requests and adopts distributed locks to guarantee data consistency. FIG. 4 illustrates the design architecture of the distributed lock of the caching system. The distributed lock of the cache system is implemented based on Etcd, a highly available key-value database whose Raft algorithm guarantees strong data consistency. When the service layer needs to access the cache layer, it first acquires from the Etcd cluster the lock whose key is a given value; if the lock is acquired successfully, the service layer performs the modification operation; if acquisition fails, the server waits for the lock with that key to be released, after which the service layer performs the modification operation. For example, when modifying the free space of the cache system stored in TiKV, the service layer needs to apply to Etcd for the lock whose key is updateSpaceInfo; if it is acquired successfully, the modification operation is performed, and if acquisition fails, the operation is performed after the lock with key updateSpaceInfo is released. If the data being cached is a superBlock, the service layer first needs to acquire from Etcd the lock whose key is the superBlockId (for example superBlock1 or superBlock2) and operates after success. If the data being cached is a dataSet, the service layer first needs to acquire from Etcd the lock whose key is the dataSetName (for example dataSet1 or dataSet2) and operates after success.
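A sketch of the lock-then-modify pattern described above, using the python-etcd3 client as one possible implementation; the connection parameters, TTL and helper functions are assumptions, while the key names updateSpaceInfo and superBlock1 come from the example in the text:

```python
import etcd3

# Connection parameters are placeholders; python-etcd3 is one possible client choice,
# not mandated by the disclosure.
etcd = etcd3.client(host="127.0.0.1", port=2379)

def write_space_info(new_free_space: int) -> None:
    """Hypothetical helper: persist the new FreeSpace value into TiKV."""
    pass

def do_cache(block_id: str) -> None:
    """Hypothetical helper: copy the large file block into the cache data layer."""
    pass

def update_free_space(new_free_space: int) -> None:
    # Acquire the lock whose key is updateSpaceInfo; this blocks until the lock is
    # free, performs the modification, and releases the lock on exit.
    with etcd.lock("updateSpaceInfo", ttl=10):
        write_space_info(new_free_space)

def cache_super_block(block_id: str) -> None:
    # The lock key equals the block identifier, e.g. "superBlock1".
    with etcd.lock(block_id, ttl=10):
        do_cache(block_id)
```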
Referring to fig. 5, fig. 5 is a schematic flowchart of a data access method according to an embodiment of the present application. As shown in fig. 5, the data access method includes steps 501-503, which are as follows:
501. Receiving a data access request sent by a client, wherein the data access request is used for requesting access to a first storage object.
502. Acquiring, according to the data access request, the metadata information corresponding to the first storage object stored in the cache metadata layer.
503. Acquiring at least a part of the first storage object from a cache data layer based on the metadata information of the first storage object, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
The data access request sent by the client carries information of the first storage object, and the information is used for identifying the first storage object.
In one possible embodiment, the first storage object comprises a data set containing at least one data block, each data block containing at least one file.
Based on the metadata information of the first storage object, acquiring at least one part of the first storage object from the cache data layer, comprising:
acquiring identification information of a first storage object in the metadata information, and if the identification information is successfully acquired, acquiring the identification information of a data set to which the first storage object belongs according to the identification information;
and acquiring at least one part of the first storage object from the cache data layer according to the identification information of the data set to which the first storage object belongs.
In one possible embodiment, obtaining at least a portion of the first storage object from the cache data layer based on the metadata information of the first storage object includes:
acquiring file name coding information of a first storage object in the metadata information, and acquiring information of a file block to which the first storage object belongs according to the file name coding information if the file name coding information is successfully acquired;
and acquiring at least one part of the first storage object from the cache data layer according to the information of the file block to which the first storage object belongs.
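A condensed sketch of steps 501-503 together with the two lookup branches above; meta_kv stands in for the cache metadata layer, cache for the cache data layer, and data_layer for the data layer (all interfaces and key prefixes are assumptions):

```python
from typing import Optional

def handle_access_request(meta_kv, cache, data_layer, file_name: str) -> Optional[bytes]:
    """Steps 501-503 for one request; the request (501) carries the file name."""
    # 502: obtain the metadata stored in the cache metadata layer.
    block_id = meta_kv.get("fileName:" + file_name)          # file name -> file block
    if block_id is not None:
        data_set = meta_kv.get("superBlockId:" + block_id)   # file block -> data set
        # 503: read at least part of the object from the faster cache data layer.
        scope = data_set if data_set is not None else block_id
        data = cache.read(scope, file_name)
        if data is not None:
            return data
    # Miss in the cache data layer: fall back to the HDD-backed data layer.
    return data_layer.read(file_name)
```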
In one possible embodiment, the method further comprises:
acquiring a first distributed lock of a first storage object, and storing the first storage object to a cache data layer after the first distributed lock is successfully acquired; and/or
And acquiring a second distributed lock of the metadata information corresponding to the first storage object, and storing the metadata information of the first storage object to a cache metadata layer after the second distributed lock is successfully acquired.
In one possible embodiment, the cache data layer includes a storage cluster including at least one solid state drive.
For brevity, terms that have already been explained in the foregoing embodiments, such as key-value and metadata information, as well as the methods for storing and acquiring the first storage object, are not described again in detail in this embodiment; please refer to the corresponding contents of the foregoing embodiments.
In accordance with the foregoing embodiments, please refer to fig. 6, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in the figure, the electronic device includes a processor, an input device, an output device and a memory, which are connected to each other, wherein the memory is used to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the following steps:
receiving a data access request sent by a client, wherein the data access request is used for requesting to access a first storage object;
according to the data access request, obtaining metadata information which is stored in a cache metadata layer and corresponds to a first storage object;
and acquiring at least one part of the first storage object from a cache data layer based on the metadata information of the first storage object, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the data access methods as described in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to execute part or all of the steps of any one of the data access methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has illustrated the principles and implementations of the present application; the above description of the embodiments is only provided to help understand the method and the core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (17)

1. A data storage system, comprising a data tier, a cache data tier, and a cache metadata tier, wherein,
the cache data layer is used for caching at least one storage object of the data layer;
the cache metadata layer is used for storing metadata information of the cache data layer, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
2. The system of claim 1, wherein the cache metadata layer is further configured to store metadata information of the storage object.
3. The system of claim 2, wherein the storage object comprises a data set, the data set comprising at least one data block, each data block comprising at least one file.
4. The system of any one of claims 1 to 3, further comprising:
the service layer is used for acquiring a first storage object from the data layer according to a data access request from a client;
the cache data layer is also used for storing the first storage object acquired by the service layer from the data layer;
the cache metadata layer is further configured to store metadata information corresponding to the first storage object and/or update metadata information of the cache data layer.
5. The system according to any one of claims 1 to 4, further comprising:
and the distributed lock is used for locking the metadata information stored in the cache metadata layer and/or locking the storage object of the cache data layer.
6. The system of claim 4, wherein the service layer is further configured to acquire a first distributed lock of the first storage object and store the first storage object to the cache data layer after the first distributed lock is successfully acquired; and/or
The service layer is further configured to acquire a second distributed lock of the metadata information corresponding to the first storage object, and store the metadata information of the first storage object in the cache metadata layer after the second distributed lock is successfully acquired.
7. The system according to any one of claims 1 to 6, wherein the metadata information of the storage object is stored in a key-value manner, wherein a key of a first key-value includes identification information of a first data block, a value of the first key-value includes identification information of a data set to which the first data block belongs, a key of a second key-value includes file name encoding information of a file, and a value of the second key-value includes information of a data block to which the file belongs.
8. The system of any of claims 1-7, wherein the data layer comprises a storage cluster comprising at least one hard disk drive.
9. The system of any of claims 1-8, wherein the cached data layer comprises a storage cluster comprising at least one solid state drive.
10. A method of data access, the method comprising:
receiving a data access request sent by a client, wherein the data access request is used for requesting to access a first storage object;
according to the data access request, obtaining metadata information which is stored in a cache metadata layer and corresponds to the first storage object;
and acquiring at least one part of the first storage object from a cache data layer based on the metadata information of the first storage object, wherein the reading rate of the cache metadata layer is higher than that of the cache data layer.
11. The method of claim 10, wherein the first storage object comprises a data set, wherein the data set comprises at least one data block, and wherein each data block comprises at least one file.
12. The method of claim 11, wherein the retrieving at least a portion of the first storage object from a cache data layer based on metadata information of the first storage object comprises:
acquiring identification information of the first storage object in the metadata information, and if the identification information is successfully acquired, acquiring the identification information of a data set to which the first storage object belongs according to the identification information;
and acquiring at least one part of the first storage object from a cache data layer according to the identification information of the data set to which the first storage object belongs.
13. The method of claim 11, wherein the retrieving at least a portion of the first storage object from a cache data layer based on metadata information of the first storage object comprises:
acquiring file name coding information of the first storage object in the metadata information, and acquiring information of a file block to which the first storage object belongs according to the file name coding information if the file name coding information is successfully acquired;
and acquiring at least one part of the first storage object from a cache data layer according to the information of the file block to which the first storage object belongs.
14. The method according to any one of claims 10 to 13, further comprising:
acquiring a first distributed lock of the first storage object, and storing the first storage object to the cache data layer after the first distributed lock is successfully acquired; and/or
And acquiring a second distributed lock of the metadata information corresponding to the first storage object, and storing the metadata information of the first storage object to the cache metadata layer after the second distributed lock is successfully acquired.
15. The method of any of claims 10 to 14, wherein the cached data layer comprises a storage cluster comprising at least one solid state drive.
16. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and wherein the processor is configured to invoke the program instructions to perform the method of any of claims 10-15.
17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 10-15.
CN202010071340.9A 2020-01-21 2020-01-21 Data storage system, data access method and related device Pending CN113220211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010071340.9A CN113220211A (en) 2020-01-21 2020-01-21 Data storage system, data access method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010071340.9A CN113220211A (en) 2020-01-21 2020-01-21 Data storage system, data access method and related device

Publications (1)

Publication Number Publication Date
CN113220211A true CN113220211A (en) 2021-08-06

Family

ID=77085522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010071340.9A Pending CN113220211A (en) 2020-01-21 2020-01-21 Data storage system, data access method and related device

Country Status (1)

Country Link
CN (1) CN113220211A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114428589A (en) * 2022-01-04 2022-05-03 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298641A (en) * 2011-09-14 2011-12-28 清华大学 Method for uniformly storing files and structured data based on key value bank
CN104580437A (en) * 2014-12-30 2015-04-29 创新科存储技术(深圳)有限公司 Cloud storage client and high-efficiency data access method thereof
CN106021381A (en) * 2016-05-11 2016-10-12 北京搜狐新媒体信息技术有限公司 Data access/storage method and device for cloud storage service system
CN108984130A (en) * 2018-07-25 2018-12-11 广东浪潮大数据研究有限公司 A kind of the caching read method and its device of distributed storage
CN110457265A (en) * 2019-08-20 2019-11-15 上海商汤智能科技有限公司 Data processing method, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298641A (en) * 2011-09-14 2011-12-28 清华大学 Method for uniformly storing files and structured data based on key value bank
CN104580437A (en) * 2014-12-30 2015-04-29 创新科存储技术(深圳)有限公司 Cloud storage client and high-efficiency data access method thereof
CN106021381A (en) * 2016-05-11 2016-10-12 北京搜狐新媒体信息技术有限公司 Data access/storage method and device for cloud storage service system
CN108984130A (en) * 2018-07-25 2018-12-11 广东浪潮大数据研究有限公司 A kind of the caching read method and its device of distributed storage
CN110457265A (en) * 2019-08-20 2019-11-15 上海商汤智能科技有限公司 Data processing method, device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114428589A (en) * 2022-01-04 2022-05-03 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN114428589B (en) * 2022-01-04 2024-05-28 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11010300B2 (en) Optimized record lookups
CN109254733B (en) Method, device and system for storing data
US11093454B2 (en) Speeding deduplication using a most wanted digest cache
US9779023B1 (en) Storing inline-compressed data in segments of contiguous physical blocks
US8219562B1 (en) Efficient storage and retrieval for large number of data objects
US20190235759A1 (en) Unaligned IO Cache for Inline Compression Optimization
US9535630B1 (en) Leveraging array operations at virtualized storage processor level
US9727479B1 (en) Compressing portions of a buffer cache using an LRU queue
US9807168B2 (en) Distributed shared log for modern storage servers
CN107046812A (en) A kind of data save method and device
US9430492B1 (en) Efficient scavenging of data and metadata file system blocks
US11372576B2 (en) Data processing apparatus, non-transitory computer-readable storage medium, and data processing method
CN109766318B (en) File reading method and device
US11947826B2 (en) Method for accelerating image storing and retrieving differential latency storage devices based on access rates
CN114610708A (en) Vector data processing method and device, electronic equipment and storage medium
CN116467275A (en) Shared remote storage method, apparatus, system, electronic device and storage medium
CN113220211A (en) Data storage system, data access method and related device
US8818970B2 (en) Partitioning a directory while accessing the directory
CN112667847A (en) Data caching method, data caching device and electronic equipment
CN115576947A (en) Data management method and device, combined library, electronic equipment and storage medium
CN114625695A (en) Data processing method and device
US11943294B1 (en) Storage medium and compression for object stores
US10970221B2 (en) Optimizing space utilization by retaining metadata in cache
CN113127717A (en) Key retrieval method and system
KR100785774B1 (en) Obeject based file system and method for inputting and outputting

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination