CN113760855A - Data storage method and device, electronic equipment and storage medium

Info

Publication number
CN113760855A
Authority
CN
China
Prior art keywords
data
metadata
name node
target metadata
query request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111063591.3A
Other languages
Chinese (zh)
Inventor
梁海昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd

Classifications

    • G06F16/182 Distributed file systems
    • G06F16/144 Query formulation (details of searching files based on file metadata)
    • G06F16/162 Delete operations
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G06F3/064 Management of blocks
    • G06F3/0643 Management of files
    • G06F3/0676 Magnetic disk device

Abstract

The application relates to a data storage method and device, an electronic device, and a storage medium, applied in the technical field of data processing. The method comprises: acquiring metadata stored in a name node of a distributed file system; determining cold data in the metadata; and storing the cold data in a preset external storage unit and deleting the cold data from the name node. The method and device address the problem in the prior art that the growing memory consumption of the name node degrades both its response speed and its startup speed.

Description

Data storage method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data storage method and apparatus, an electronic device, and a storage medium.
Background
Hadoop is a distributed cluster computing project led by the Apache Software Foundation. It mainly comprises two core modules: the MapReduce programming model and HDFS (Hadoop Distributed File System). HDFS achieves high data availability, cluster scalability, high-speed data reading and writing, and similar characteristics mainly through mechanisms such as multiple replicas of file data blocks and heartbeats. Because of these characteristics, most enterprises currently choose to build cloud storage on HDFS.
An HDFS cluster has two types of nodes and operates in manager-worker mode: one NameNode acts as the manager and multiple DataNodes act as workers. The NameNode (NN) is mainly responsible for managing the HDFS file system, and the DataNode (DN) is mainly used for storing data files.
In big data technology, when HDFS is used as the data storage system, the metadata of the stored data is indexed and recorded in the NN memory. As the number of directories and files in HDFS grows, the NN consumes more and more memory, which affects the response speed of the NN on the one hand and its startup speed on the other (at startup, the existing metadata must be loaded from disk into memory).
Disclosure of Invention
The application provides a data storage method, a data storage device, an electronic device and a storage medium, which are used for solving the prior-art problem that the growing memory consumption of the NN affects both the response speed and the startup speed of the NN.
In a first aspect, an embodiment of the present application provides a data storage method, including:
acquiring metadata stored in name nodes of a distributed file system;
determining cold data in the metadata;
and storing the cold data into a preset external storage unit, and deleting the cold data in the name node.
Optionally, after storing the cold data in a preset external storage unit and deleting the cold data in the name node, the method further includes:
acquiring a data query request;
judging whether target metadata corresponding to the data query request exists in the name node or not;
if the name node has target metadata corresponding to the data query request, returning storage data corresponding to the target metadata;
and if the target metadata corresponding to the data query request does not exist in the name node, obtaining storage data corresponding to the target metadata based on the preset external storage unit.
Optionally, the obtaining, based on the preset external storage unit, storage data corresponding to the target metadata includes:
judging whether the preset external storage unit has target metadata corresponding to the data query request or not;
and if the target metadata exists in the preset external storage unit, loading the target metadata into the name node, and returning the storage data corresponding to the target metadata.
Optionally, the number of the metadata is at least one, and the determining cold data in the metadata stored in the name node includes:
acquiring a respective use parameter value of each metadata, wherein the use parameter value refers to a parameter value generated in the calling process of the metadata;
and using the metadata corresponding to the use parameter value smaller than the preset parameter value as the cold data.
Optionally, the usage parameter includes a usage frequency and/or a usage interval duration.
Optionally, before obtaining the metadata stored in the name node of the distributed file system, the method further includes:
instructing a name node of the distributed file system to start working;
and after the name node starts to work, loading the metadata in the distributed file system to the name node.
In a second aspect, an embodiment of the present application provides a data storage device, including:
the first acquisition module is used for acquiring metadata stored in name nodes of the distributed file system;
a determination module to determine cold data in the metadata;
and the storage module is used for storing the cold data into a preset external storage unit and deleting the cold data in the name node.
Optionally, the device further includes:
the second acquisition module is used for acquiring the data query request;
the judging module is used for judging whether target metadata corresponding to the data query request exists in the name node;
the data returning module is used for returning the storage data corresponding to the target metadata if the judging module judges that the target metadata corresponding to the data query request exists in the name node;
and the data obtaining module is used for obtaining storage data corresponding to the target metadata based on the preset external storage unit if the judging module judges that the target metadata corresponding to the data query request does not exist in the name node.
Optionally, the data obtaining module is specifically configured to:
judging whether the preset external storage unit has target metadata corresponding to the data query request or not;
and if the preset external storage unit comprises the target metadata, loading the target metadata into the name node, and returning storage data corresponding to the target metadata.
Optionally, the determining module is specifically configured to:
obtaining a usage parameter value of each of the metadata;
and taking the metadata with the use parameter value smaller than a preset parameter value as the cold data.
Optionally, the usage parameter includes a usage frequency and/or a usage interval duration.
Optionally, the apparatus further includes:
the starting module is used for starting the name nodes of the distributed file system;
and the loading module is used for loading the metadata to the name node by the distributed file system after the name node is started.
In a third aspect, an embodiment of the present application provides an electronic device, including: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor is configured to execute the program stored in the memory, and implement the data storage method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the data storage method of the first aspect.
Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantages: the method acquires the metadata stored in the NameNode of an HDFS, determines the cold data in the metadata, stores the cold data in a preset external storage unit, and deletes the cold data from the NameNode. Deleting the cold data from the NameNode prevents the NameNode memory from being excessively occupied: part of the NameNode memory is freed, and the cold data no longer needs to be loaded into memory at startup, which improves the startup speed of the NameNode; the reduced memory footprint of the cold data also improves the response speed of the NameNode. In addition, because the cold data is stored in the external storage unit, it is not lost when it is deleted from memory.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is an architecture diagram of a data storage method according to an embodiment of the present application;
Fig. 2 is a flowchart of a data storage method according to an embodiment of the present application;
Fig. 3 is a data transmission diagram in a data storage method according to an embodiment of the present application;
Fig. 4 is a block diagram of a data storage device according to an embodiment of the present application;
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Before further detailed description of the embodiments of the present invention, terms and expressions referred to in the embodiments of the present invention are described, and the terms and expressions referred to in the embodiments of the present invention are applicable to the following explanations.
(1) HDFS (Hadoop Distributed File System): a distributed file system running on a cluster formed by a plurality of servers, commonly used as a storage system for big data;
(2) hot data: data that is frequently accessed or viewed by users;
(3) cold data: data that is infrequently accessed;
(4) NameNode: the name node, NN for short, a host role in HDFS that is mainly responsible for managing the HDFS cluster, receiving client requests, and allocating storage nodes;
(5) DataNode: the data node, DN for short, a host role in HDFS that is mainly responsible for storing data and receiving commands from the NameNode;
(6) HDFS metadata: maintained by the NameNode, it records the basic information of the files currently stored in HDFS and is similar to a directory index of HDFS. Each directory or file is one piece of metadata, which mainly comprises the following 3 parts by type:
1. attribute information of the file or directory itself, such as the file name, directory name, modification information, and the like;
2. information related to how the file content is stored, such as storage block information, how the file is divided into blocks, the number of replicas, and the like;
3. information about the DataNodes of the HDFS, recorded for managing the DataNodes.
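As a rough illustration of the three parts listed above, one piece of metadata might be modeled as follows. This is only a sketch: the class and field names (FileAttributes, BlockInfo, MetadataEntry and their fields) are assumptions chosen for illustration and are not the internal data structures of HDFS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileAttributes:
    # Part 1: attributes of the file or directory itself (hypothetical fields).
    name: str
    is_directory: bool
    modified_at: float  # modification time, in seconds since the epoch


@dataclass
class BlockInfo:
    # Part 2: storage-related information for one block (hypothetical fields).
    block_id: int
    length: int
    replication: int
    # Part 3 (simplified assumption): the DataNodes holding a replica.
    datanodes: List[str] = field(default_factory=list)


@dataclass
class MetadataEntry:
    # One directory or file corresponds to one piece of metadata.
    path: str
    attributes: FileAttributes
    blocks: List[BlockInfo] = field(default_factory=list)

    def total_length(self) -> int:
        # Sum of block lengths; a directory has no blocks, so this is 0.
        return sum(block.length for block in self.blocks)
```

Because every file and directory holds at least one such entry in NN memory, the total memory cost grows with the number of entries, which is the pressure the method aims to relieve.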
According to one embodiment of the application, a data storage method is provided. Optionally, in the embodiment of the present application, the data storage method may be applied to a hardware environment formed by the terminal 101 and the server 102 as shown in Fig. 1. As shown in Fig. 1, the server 102 is connected to the terminal 101 through a network and may be used to provide services (such as video services, application services, etc.) for the terminal or for a client installed on the terminal. A database may be provided on the server or separately from the server to provide data storage services for the server 102. The type of network is not limited, and the terminal 101 may be, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The data storage method according to the embodiment of the present application may be executed by the server 102, by the terminal 101, or by both the server 102 and the terminal 101 together. When the terminal 101 executes the method, the method may also be executed by a client installed on the terminal.
Taking execution by the terminal as an example, Fig. 2 is a schematic flowchart of an optional data storage method according to the embodiment of the present application. As shown in Fig. 2, the method may include the following steps:
step 201, obtaining metadata stored in the NameNode of the HDFS.
In some embodiments, when the HDFS is started, it instructs the NN and the DN to start and begin working. After the DN is started, it registers with the NN; after the NameNode is started, the HDFS loads the metadata on disk into the NameNode, so that the metadata is stored in the NN. Therefore, to acquire the metadata in the NN, an acquisition instruction may be sent directly to the NN.
A data block in the HDFS is stored as a file on the disk of a DN. The data block includes the original data (i.e. the data itself) and metadata (including the length of the data block, block data, the checksum of the block data, and a timestamp). After being started, the DN registers with the NN; once the registration succeeds, the DN periodically (for example, every 1 hour) reports all of its block information to the NN. A heartbeat occurs every 3 s, and each heartbeat exchange carries the commands given to the DN by the NN. If no heartbeat is received from a DN for a certain time (for example, 10 minutes), the NN considers that node unavailable, and data cannot be obtained from that DN. Further, when the NN receives a data request, it can send a signal to the DN to obtain the corresponding data.
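The registration and heartbeat behavior described above can be sketched as a small liveness tracker. This is a minimal illustration only: the class HeartbeatMonitor and its method names are hypothetical, and the 3 s interval and 10 minute timeout are simply the example values given in the text, not a reproduction of the HDFS implementation.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 3      # heartbeat period mentioned in the text
DEAD_TIMEOUT_S = 10 * 60      # example timeout mentioned in the text


class HeartbeatMonitor:
    """Tracks the time of the last heartbeat of each registered DataNode."""

    def __init__(self, timeout_s: float = DEAD_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def register(self, dn_id: str, now: float) -> None:
        # A DataNode registers with the NameNode after it starts.
        self.last_seen[dn_id] = now

    def heartbeat(self, dn_id: str, now: float) -> bool:
        # Heartbeats from DataNodes that never registered are rejected.
        if dn_id not in self.last_seen:
            return False
        self.last_seen[dn_id] = now
        return True

    def is_available(self, dn_id: str, now: float) -> bool:
        # A node is unavailable once its silence exceeds the timeout.
        seen = self.last_seen.get(dn_id)
        return seen is not None and (now - seen) <= self.timeout_s

    def unavailable_nodes(self, now: float) -> List[str]:
        return sorted(
            dn for dn in self.last_seen if not self.is_available(dn, now)
        )
```

Timestamps are passed in explicitly rather than read from a clock so that the timeout logic can be exercised deterministically.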
Step 202, determine cold data in the metadata.
In some embodiments, in actual use, not all data stored in the HDFS is accessed. Only hot data from the most recent period (the last 1 day, 1 month, 1 year, and so on) is accessed frequently, while older cold data is used with low probability or is never used again.
In this embodiment, the cold data in the metadata may first be determined and identified so that it can be processed further.
In an alternative embodiment, the determination of the cold data in the metadata stored by the NameNode may be performed by:
acquiring a usage parameter value of each piece of metadata, wherein the usage parameter value is a parameter value generated in the course of calling the metadata; and taking the metadata whose usage parameter value satisfies the preset condition as the cold data.
In some embodiments, based on the usage parameter value of each piece of metadata, metadata whose usage parameter value is smaller than a preset parameter value (for example, a usage frequency below a preset frequency threshold) is taken as the cold data. The usage parameter may be, but is not limited to, the usage frequency and/or the usage interval duration, which prevents frequently accessed data from being classified as cold data and improves the accuracy of data classification.
Specifically, when a plurality of pieces of metadata are obtained, the server determines for each piece of metadata whether it is cold data. During the calling of each piece of metadata, the server stores the parameter values generated, such as the usage interval duration (i.e., the time elapsed from the most recent call of the metadata to the current time of the device) and the usage frequency (i.e., the number of calls counted within a preset duration).
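One possible way to record these two usage parameters is sketched below. The class UsageTracker and its method names are hypothetical, and the one-week window is only an example of a preset duration; the text does not prescribe how the values are stored.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

WINDOW_S = 7 * 24 * 3600  # assumed preset duration: 1 week


class UsageTracker:
    """Records calls per metadata path and derives the usage parameters."""

    def __init__(self, window_s: float = WINDOW_S) -> None:
        self.window_s = window_s
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)
        self.last_call: Dict[str, float] = {}

    def record_call(self, path: str, now: float) -> None:
        self.calls[path].append(now)
        self.last_call[path] = now

    def usage_frequency(self, path: str, now: float) -> int:
        # Number of calls within the preset duration ending at `now`.
        timestamps = self.calls.get(path)
        if not timestamps:
            return 0
        # Discard calls that have fallen out of the window.
        while timestamps and now - timestamps[0] > self.window_s:
            timestamps.popleft()
        return len(timestamps)

    def usage_interval(self, path: str, now: float) -> Optional[float]:
        # Time elapsed since the most recent call; None if never called.
        last = self.last_call.get(path)
        return None if last is None else now - last
```

Keeping the most recent call time separately from the windowed call list means the interval stays available even after every call has aged out of the window.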
After the usage parameter value of each piece of metadata is obtained, it is compared with the preset parameter value. For example, when the usage parameter value is the usage interval duration and the preset condition is that the usage interval duration is greater than a duration threshold, the metadata whose usage interval duration is greater than the duration threshold is determined to be cold data; or, when the usage parameter value is the usage frequency and the preset condition is that the usage frequency is smaller than a preset frequency threshold, the metadata whose usage frequency is smaller than the preset frequency threshold is determined to be cold data.
The preset frequency threshold may be, but is not limited to, 3 times, the preset duration may be, but is not limited to, 1 week, and the duration threshold may be, but is not limited to, 1 day.
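The threshold comparison can be sketched as follows, using the example values given above (3 calls, 1 day). The names Usage, is_cold and find_cold are hypothetical. Requiring both conditions at once is an assumption made here for illustration; the text allows the frequency and the interval to be used separately or together.

```python
from typing import Dict, List, NamedTuple


class Usage(NamedTuple):
    frequency: int     # calls within the preset duration (e.g. 1 week)
    interval_s: float  # seconds since the most recent call


FREQUENCY_THRESHOLD = 3           # example value from the text: 3 times
INTERVAL_THRESHOLD_S = 24 * 3600  # example value from the text: 1 day


def is_cold(
    usage: Usage,
    freq_threshold: int = FREQUENCY_THRESHOLD,
    interval_threshold_s: float = INTERVAL_THRESHOLD_S,
) -> bool:
    # Assumption: metadata is cold only when it is both rarely used
    # and has not been used recently.
    return (
        usage.frequency < freq_threshold
        and usage.interval_s > interval_threshold_s
    )


def find_cold(usages: Dict[str, Usage]) -> List[str]:
    # Returns the paths of all metadata classified as cold, sorted by path.
    return sorted(path for path, usage in usages.items() if is_cold(usage))
```

Combining the two conditions keeps a newly created file with only one recent call from being classified as cold, at the cost of offloading less metadata than a single condition would.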
Step 203, storing the cold data into a preset external storage unit, and deleting the cold data in the NameNode.
In some embodiments, deleting the cold data from the NameNode frees part of the NameNode memory, and the cold data no longer needs to be loaded into memory at startup, which improves the startup speed of the NameNode. The reduced memory footprint of the cold data also improves the response speed of the NameNode. Because the metadata information corresponding to the cold data is dynamically removed from the NN memory, the memory pressure on the NN is reduced. In addition, because the cold data in the NameNode is stored in the external storage unit, it is not lost when it is deleted from memory.
The external storage unit may be, but is not limited to, an external file or an external database.
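Step 203 can be sketched with an external database as the storage unit. The example below uses SQLite from the Python standard library purely as a stand-in; the table name cold_metadata, the functions open_external_store and offload_cold, and the use of a plain dictionary for the NN memory are all assumptions for illustration.

```python
import json
import sqlite3
from typing import Dict, Iterable


def open_external_store(path: str = ":memory:") -> sqlite3.Connection:
    # Pass a file path for a persistent store; ":memory:" is for demos only.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cold_metadata ("
        "path TEXT PRIMARY KEY, entry TEXT NOT NULL)"
    )
    return conn


def offload_cold(
    nn_metadata: Dict[str, dict],
    cold_paths: Iterable[str],
    store: sqlite3.Connection,
) -> int:
    """Moves cold entries from the in-memory map to the external store."""
    moved = 0
    for path in list(cold_paths):
        entry = nn_metadata.get(path)
        if entry is None:
            continue  # already offloaded or never present
        # Persist first and commit, then delete from memory, so an entry
        # is never removed unless it has been safely written.
        with store:
            store.execute(
                "INSERT OR REPLACE INTO cold_metadata (path, entry) "
                "VALUES (?, ?)",
                (path, json.dumps(entry)),
            )
        del nn_metadata[path]
        moved += 1
    return moved
```

Writing before deleting reflects the stated goal that cold data must not be lost when it is removed from memory.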
In an optional embodiment, after storing the cold data in a preset external storage unit and deleting the cold data in the NameNode, the method further includes:
acquiring a data query request; judging whether target metadata corresponding to the data query request exists in the NameNode; if the name node has target metadata corresponding to the data query request, returning storage data corresponding to the target metadata; and if the target metadata corresponding to the data query request does not exist in the name node, obtaining storage data corresponding to the target metadata based on a preset external storage unit.
In some embodiments, because the cold data has been deleted from the NN and stored in the external storage unit, a data query request that targets cold data may not find the corresponding metadata in the NN. Therefore, after the data query request is acquired, it is judged whether the target metadata corresponding to the data query request exists in the NameNode. If the target metadata exists in the NN, the storage data corresponding to the target metadata is returned directly. If it does not exist in the NN, the target metadata is cold data that has been stored in the external storage unit, so the storage data corresponding to the target metadata can be obtained based on the preset external storage unit.
Specifically, when the target metadata is stored in the NN, the NN sends a query command to the DN, so that the DN queries, in its data storage unit, the target raw data corresponding to the target metadata, thereby returning the target raw data.
Further, obtaining storage data corresponding to the target metadata based on a preset external storage unit includes:
judging whether a preset external storage unit has target metadata corresponding to the data query request or not; and if the preset external storage unit comprises the target metadata, loading the target metadata into the NameNode, and returning the storage data corresponding to the target metadata.
In some embodiments, when the target metadata is not in the NN, the NN may send the data query request to the preset external storage unit to query whether the target metadata corresponding to the data query request exists there. If it does, the target metadata is loaded from the preset external storage unit into the NN, after which the NN can find the target metadata and return the corresponding storage data. In this way, when cold data needs to be queried, its metadata is automatically restored to the NN, which ensures that no metadata is lost.
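The query path described above (check the NN first, then fall back to the external storage unit and reload) can be sketched as follows. The class NameNodeMetadata, its methods, and the use of plain dictionaries for both the NN memory and the external storage unit are assumptions for illustration; the read_blocks callable stands in for fetching the stored data from a DN.

```python
from typing import Callable, Dict, Optional


class NameNodeMetadata:
    def __init__(
        self, in_memory: Dict[str, dict], external: Dict[str, dict]
    ) -> None:
        self.in_memory = in_memory
        self.external = external  # stand-in for the external storage unit

    def lookup(self, path: str) -> Optional[dict]:
        # 1. Target metadata present in the NN: use it directly.
        entry = self.in_memory.get(path)
        if entry is not None:
            return entry
        # 2. Otherwise consult the external storage unit.
        entry = self.external.get(path)
        if entry is None:
            return None  # unknown in both places
        # 3. Reload the metadata into the NN. Assumption: the external
        #    copy is kept, since the text does not say it is removed.
        self.in_memory[path] = entry
        return entry

    def query(
        self, path: str, read_blocks: Callable[[dict], bytes]
    ) -> Optional[bytes]:
        # Resolve the metadata, then fetch the stored data it points to.
        entry = self.lookup(path)
        return None if entry is None else read_blocks(entry)
```

After a cold entry has been reloaded, later queries for the same path are served from memory until the entry is classified as cold again.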
Fig. 3 shows a specific process for obtaining stored data provided in the embodiment of the present application. Referring to Fig. 3, the HDFS includes a plurality of DataNodes and a NameNode, where the DataNodes store the data in the HDFS and the NameNode stores the metadata. The DataNodes register with the NN, and the NN stores the cold data in the metadata into a preset external storage unit and deletes the cold data from itself. After a data query request is acquired, the corresponding target metadata is looked up in the NN. If the target metadata does not exist in the NN, it is acquired from the preset external storage unit and reloaded into the NN; if the target metadata exists in the NN, the corresponding storage data is acquired from the DN based on the target metadata.
According to the method and the device, the metadata information in the NN memory that corresponds to cold data in the HDFS is automatically removed, which releases NN memory and improves its utilization. When a user actually needs to access the cold data, the NN can reload the metadata back into memory, thereby achieving dynamic optimization of the memory.
Based on the same concept, embodiments of the present application provide a data storage device. For the specific implementation of the device, reference may be made to the description of the method embodiment, and repeated details are omitted. As shown in Fig. 4, the device mainly includes:
a first obtaining module 401, configured to obtain metadata stored in a NameNode of the HDFS;
a determining module 402 for determining cold data in the metadata;
the storage module 403 is configured to store the cold data in a preset external storage unit, and delete the cold data in the NameNode.
In an optional embodiment, the apparatus further comprises:
the second acquisition module is used for acquiring the data query request;
the judging module is used for judging whether target metadata corresponding to the data query request exists in the NameNode;
the data returning module is used for returning the storage data corresponding to the target metadata if the judging module judges that the target metadata corresponding to the data query request exists in the name node;
and the data obtaining module is used for obtaining storage data corresponding to the target metadata based on the preset external storage unit if the judging module judges that the target metadata corresponding to the data query request does not exist in the name node.
In an optional embodiment, the data obtaining module is specifically configured to:
judging whether the preset external storage unit has target metadata corresponding to the data query request or not;
and if the preset external storage unit comprises the target metadata, loading the target metadata into the NameNode, and returning storage data corresponding to the target metadata.
In an optional embodiment, the determining module is specifically configured to:
obtaining a usage parameter value of each of the metadata;
and taking the metadata with the use parameter value smaller than a preset parameter value as the cold data.
In an alternative embodiment, the usage parameters include frequency of use and/or duration of a usage interval.
In an optional embodiment, the apparatus further comprises:
the starting module is used for starting the NameNode of the HDFS;
and the loading module is used for loading the metadata to the NameNode by the HDFS after the NameNode is started.
Based on the same concept, an embodiment of the present application further provides an electronic device, as shown in fig. 5, the electronic device mainly includes: a processor 501, a memory 502 and a communication bus 503, wherein the processor 501 and the memory 502 communicate with each other through the communication bus 503. The memory 502 stores a program executable by the processor 501, and the processor 501 executes the program stored in the memory 502, so as to implement the following steps:
acquiring metadata stored in a NameNode of the HDFS;
determining cold data in the metadata;
and storing the cold data into a preset external storage unit, and deleting the cold data in the NameNode.
The communication bus 503 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 503 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 502 may include a Random Access Memory (RAM) or a non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the aforementioned processor 501.
The processor 501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc., and may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic devices, discrete gates or transistor logic devices, and discrete hardware components.
In still another embodiment of the present application, there is also provided a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the data storage method described in the above embodiments.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, microwave, etc.). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives), among others.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for storing data, comprising:
acquiring metadata stored in a name node of a distributed file system;
determining cold data in the metadata;
and storing the cold data in a preset external storage unit, and deleting the cold data from the name node.
2. The method for storing data according to claim 1, wherein after storing the cold data in a preset external storage unit and deleting the cold data in the name node, the method further comprises:
acquiring a data query request;
judging whether target metadata corresponding to the data query request exists in the name node or not;
if the name node has target metadata corresponding to the data query request, returning storage data corresponding to the target metadata;
and if the target metadata corresponding to the data query request does not exist in the name node, obtaining storage data corresponding to the target metadata based on the preset external storage unit.
3. The data storage method according to claim 2, wherein obtaining the storage data corresponding to the target metadata based on the preset external storage unit comprises:
judging whether the preset external storage unit has target metadata corresponding to the data query request or not;
and if the target metadata exists in the preset external storage unit, loading the target metadata into the name node, and returning the storage data corresponding to the target metadata.
4. The method according to claim 1, wherein there is at least one piece of metadata, and the determining cold data in the metadata stored in the name node comprises:
acquiring a respective usage parameter value of each piece of metadata, wherein the usage parameter value refers to a parameter value generated while the metadata is being called;
and taking, as the cold data, the metadata whose usage parameter value does not meet a preset condition.
5. The method according to claim 4, wherein the usage parameter comprises a usage frequency and/or a usage interval duration.
6. The data storage method according to claim 1, wherein before obtaining the metadata stored in the name node of the distributed file system, the method further comprises:
instructing the name node of the distributed file system to start working;
and, after the name node starts working, loading, by the distributed file system, the metadata into the name node.
7. An apparatus for storing data, comprising:
a first acquisition module, configured to acquire metadata stored in a name node of a distributed file system;
a determining module, configured to determine cold data in the metadata;
and a storage module, configured to store the cold data in a preset external storage unit and delete the cold data from the name node.
8. The apparatus for storing data according to claim 7, further comprising:
the second acquisition module is used for acquiring the data query request;
the judging module is used for judging whether target metadata corresponding to the data query request exists in the name node;
the data returning module is used for returning the storage data corresponding to the target metadata if the judging module judges that the target metadata corresponding to the data query request exists in the name node;
and the data obtaining module is used for obtaining storage data corresponding to the target metadata based on the preset external storage unit if the judging module judges that the target metadata corresponding to the data query request does not exist in the name node.
9. An electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory to implement the method for storing data according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for storing data according to any one of claims 1 to 6.
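
For illustration only, the query flow recited in claims 2 and 3 can be sketched as below. The sketch is not part of the claims and makes several assumptions: dictionaries stand in for the name node and the preset external storage unit, `read_blocks` is a hypothetical callable that turns metadata into the stored data, and raising `FileNotFoundError` when neither location holds the target metadata is a choice of the sketch, since the claims do not specify that case.

```python
def query(path, namenode, external_store, read_blocks):
    """Resolve a data query, falling back to the external store on a miss."""
    # Judge whether the target metadata exists in the name node.
    meta = namenode.get(path)
    if meta is None:
        # Miss: judge whether the external storage unit holds the metadata.
        meta = external_store.get(path)
        if meta is None:
            # Assumed behavior; the source does not define this case.
            raise FileNotFoundError(path)
        # Load the target metadata back into the name node.
        namenode[path] = meta
    # Return the storage data corresponding to the target metadata.
    return read_blocks(meta)
```

The sketch leaves the entry in the external store after reloading it; whether the external copy is kept or removed at that point is not stated in the source.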

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111063591.3A CN113760855A (en) 2021-09-10 2021-09-10 Data storage method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113760855A true CN113760855A (en) 2021-12-07

Family

ID=78794840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111063591.3A Pending CN113760855A (en) 2021-09-10 2021-09-10 Data storage method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113760855A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination