CN117493277A - Cold file searching method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117493277A
Authority
CN
China
Prior art keywords
file
target
cluster
cold
storage
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311436061.8A
Other languages
Chinese (zh)
Inventor
穆纯进
姜雨彤
王云朋
霍勇杰
李振豪
张逸明
郝树运
冯佳佳
茅矛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Application filed by China United Network Communications Group Co Ltd and Unicom Digital Technology Co Ltd
Priority to CN202311436061.8A
Publication of CN117493277A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a cold file searching method and device, electronic equipment, and a storage medium. The method comprises the following steps: receiving a file copy instruction; sending, according to the file copy instruction, a storage file of a cluster to be located to a target cluster, wherein the cluster to be located and the target cluster are each a set of servers in a distributed file system; controlling the target cluster to convert the data structure of the storage file to obtain a target file; constructing a mapping file according to the target file, wherein the mapping file stores the correspondence between the target file and the access time of the target file; determining a cold file list from the mapping file according to the access time of the target file; and obtaining the address of the cold file according to the cold file list. The method can quickly find the storage location of cold files without affecting file system data and gives the file system the location information it needs to process them, thereby improving file processing efficiency and the stability of the distributed file system.

Description

Cold file searching method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for searching a cold file, an electronic device, and a storage medium.
Background
In practical applications, a distributed file system may hold tens of millions or even hundreds of millions of files, and the accumulation of so many files degrades its stability. In particular, when there are many cold files that are no longer used or are rarely used, they occupy a large amount of the memory space of the distributed file system and further affect the stability of the storage system.
To address this, the prior art generally traverses file lists: it recursively acquires the lists of all files from the directory structure tree of the entire distributed file system, determines the cold file list among them, and thereby finds the storage locations of the cold files in the distributed file system.
However, traversing the file list to search for cold files interferes with the normal data services of the system, and the search efficiency is low.
Disclosure of Invention
The application provides a cold file searching method and device, electronic equipment, and a storage medium, which are used to solve the problems that the cold file searching process interferes with the normal data services of the system and that search efficiency is low.
In a first aspect, the present application provides a method for searching a cold file, including:
receiving a file copy instruction;
sending, according to the file copy instruction, a storage file of a cluster to be located to a target cluster, wherein the cluster to be located and the target cluster are each a set of servers in a distributed file system, the cluster to be located is a cluster in use by a target object, and the target cluster is a cluster that is in an idle state and has no use relation with the target object;
controlling the target cluster to convert the data structure of the storage file to obtain a target file;
constructing a mapping file according to the target file, wherein the mapping file is used for storing the corresponding relation between the target file and the access time of the target file;
determining a cold file list from the mapping file according to the access time of the target file;
and obtaining the address of the cold file according to the cold file list.
In an embodiment of the present application, before sending the storage file of the cluster to be located to the target cluster according to the file copy instruction, the method further includes:
receiving a file reading instruction sent by a target object;
determining, according to the file reading instruction, the cluster to be located that has a use relation with the target object;
determining the other clusters in the distributed file system according to the cluster to be located;
determining address information of the cluster to be located and address information of the other clusters;
and determining the target cluster among the other clusters according to the address information of the cluster to be located and the address information of the other clusters, wherein the address information of the target cluster and the address information of the cluster to be located meet a preset address requirement.
In an embodiment of the present application, according to a file copy instruction, sending a storage file of a cluster to be located to a target cluster includes:
determining the storage file of the cluster to be located according to the file copy instruction;
obtaining a copy of the storage file according to the storage file of the cluster to be located;
and sending the copy of the storage file of the cluster to be located to the target cluster according to the address information of the target cluster.
In this embodiment of the present application, the controlling the target cluster to convert the data structure of the storage file to obtain the target file includes:
determining file attribute information and file storage information of a storage file according to the storage file;
constructing an object input stream;
and reading file attribute information and file storage information through the object input stream to obtain the target file.
In the embodiment of the application, constructing the mapping file according to the target file includes:
determining the path of the target file, the access time of the target file, and the correspondence between the path and the access time;
and writing the correspondence between the path of the target file and the access time of the target file into a pre-constructed file mapping database to obtain the mapping file.
In an embodiment of the present application, determining a cold file list from the mapping file according to the access time of the target file includes:
determining a preset cold file access time;
and inquiring the mapping file according to the cold file access time to obtain a cold file list.
In the embodiment of the present application, according to the cold file access time, the mapping file is queried to obtain a cold file list, including:
constructing a mapping file inquiry statement according to the cold file access time;
inquiring the mapping file according to the mapping file inquiry statement, and determining cold file storage information;
and generating a cold file list according to the cold file storage information.
In a second aspect, the present application provides a cold file searching apparatus, including:
the receiving module is used for receiving a file copy instruction;
the sending module is used for sending the storage file of the cluster to be located to the target cluster according to the file copy instruction, wherein the cluster to be located and the target cluster are each a set of servers in the distributed file system, the cluster to be located is a cluster in use by the target object, and the target cluster is a cluster that is in an idle state and has no use relation with the target object;
the conversion module is used for controlling the target cluster to convert the data structure of the storage file to obtain the target file;
the first determining module is used for constructing a mapping file according to the target file, wherein the mapping file is used for storing the corresponding relation between the target file and the access time of the target file;
the second determining module is used for determining a cold file list from the mapping file according to the access time of the target file;
and the obtaining module is used for obtaining the address of the cold file according to the cold file list.
In a third aspect, the present application provides an electronic device, including: a processor, a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the cold file lookup method of the embodiments of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the cold file searching method of the embodiments of the present application.
The application provides a cold file searching method and device, electronic equipment, and a storage medium. A file copy instruction is received; according to the file copy instruction, a storage file of a cluster to be located is sent to a target cluster, wherein the cluster to be located and the target cluster are each a set of servers in a distributed file system, the cluster to be located is a cluster in use by a target object, and the target cluster is a cluster that is in an idle state and has no use relation with the target object; the target cluster is controlled to convert the data structure of the storage file to obtain a target file; a mapping file is constructed according to the target file, wherein the mapping file stores the correspondence between the target file and the access time of the target file; a cold file list is determined from the mapping file according to the access time of the target file; and the address of the cold file is obtained according to the cold file list. Because the search runs on the target cluster, the normal data services of the system are not affected and the stored data of the system is protected from damage during the search; at the same time, querying by access time makes it possible to find cold files quickly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart of a method for searching a cold file according to an embodiment of the present application;
Fig. 2 is a flow chart of another method for searching a cold file according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of another cold file searching method according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a cold file searching device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
In the prior art, searching for cold files involves recursively walking the directory structure tree of the entire distributed file system to acquire the lists of all files, determining the cold file list among them, and thereby finding the storage locations of the cold files in the distributed file system. However, traversing the file list directly in the cluster whose cold files are being searched puts data processing pressure on the distributed file system, and misoperations and similar incidents may occur during the search, so that service data being processed by the system is lost; the search efficiency is also low.
To solve the above problems, a target cluster can be selected from the distributed file system to execute the commands for searching for cold files. The target cluster has no relation, in the enterprise production environment, to the cluster that handles the service data, so the method is non-invasive. In addition, the cold files in the distributed file system are searched for by defining a cold file access time, which improves search efficiency and enables quick screening.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiments of the application provide a cold file searching method and device, electronic equipment, and a storage medium, which are applied to a distributed file system. A distributed file system is a file system that can deploy the file data of a service on a plurality of nodes. Each node implements a different sub-service of the system's overall service, and the nodes are unaware of one another. Each node has a plurality of replica nodes that perform the same service to speed up service processing, and a node together with its replica nodes forms a cluster. This embodiment does not particularly limit the type of distributed file system, as long as the following can be performed: receiving a file copy instruction; sending, according to the file copy instruction, a storage file of a cluster to be located to a target cluster, wherein the cluster to be located and the target cluster are each a set of servers in the distributed file system, the cluster to be located is a cluster in use by a target object, and the target cluster is a cluster that is in an idle state and has no use relation with the target object; controlling the target cluster to convert the data structure of the storage file to obtain a target file; constructing a mapping file according to the target file, wherein the mapping file stores the correspondence between the target file and the access time of the target file; determining a cold file list from the mapping file according to the access time of the target file; and obtaining the address of the cold file according to the cold file list.
Fig. 1 is a flow chart of a cold file searching method provided in an embodiment of the present application, where an execution body of the cold file searching method may be a server in a distributed file system, as shown in fig. 1, and the cold file searching method includes the following steps:
S101, receiving a file copy instruction.
For example, to find the addresses of the cold files associated with solution A in the distributed file system, the storage file in the cluster to be located, which participates in executing solution A, must be determined and copied to a target cluster unrelated to solution A. The received file copy instruction may be an instruction to transfer the storage file from the cluster to be located to the target cluster according to the IP addresses of the clusters.
A cluster may be a set of servers that, as a whole, provides a set of network resources for the target object; the individual servers are the nodes of the cluster.
S102, sending, according to the file copy instruction, a storage file of a cluster to be located to a target cluster, wherein the cluster to be located and the target cluster are each a set of servers in the distributed file system, the cluster to be located is a cluster in use by a target object, and the target cluster is a cluster that is in an idle state and has no use relation with the target object.
The target object may be an enterprise, a user, or another party that uses the distributed file system to store data. For example, distributed file system A includes cluster 1 and cluster 2; if the target object uses only cluster 1 when performing service processing, cluster 1 may be the cluster to be located and cluster 2 may be the target cluster.
A use relation may mean that the servers of the cluster to be located provide the service processing resources required by the target object.
In this embodiment of the present application, before sending the storage file of the cluster to be located to the target cluster according to the file copy instruction, the method for searching the cold file may further include:
receiving a file reading instruction sent by a target object;
determining, according to the file reading instruction, the cluster to be located that has a use relation with the target object;
determining the other clusters in the distributed file system according to the cluster to be located;
determining address information of the cluster to be located and address information of the other clusters;
and determining the target cluster among the other clusters according to the address information of the cluster to be located and the address information of the other clusters, wherein the address information of the target cluster and the address information of the cluster to be located meet a preset address requirement.
The preset address requirement may be that the IP addresses of the corresponding nodes of the clusters meet a shortest-path requirement.
For example, the file reading instruction may be an instruction, sent by the target object, requesting a cold file search. If the target object uses cluster 2 of distributed file system B for service processing, and distributed file system B includes clusters 1, 2, and 3, then the cluster to be located is cluster 2 and the other clusters are clusters 1 and 3. The IP address distances between cluster 2 and each of clusters 1 and 3 are determined, and the cluster at the shortest distance is selected as the target cluster; for example, if cluster 3 is closer to cluster 2 than cluster 1 is, the target cluster is cluster 3.
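The selection of the target cluster in the example above can be sketched as follows. This is only an illustrative sketch: the description does not say how the distance between IP addresses is computed, so the integer difference of IPv4 addresses used here is an assumption, as are the function names and the sample addresses. The requirement that the target cluster be idle is not modeled.

```python
import ipaddress


def address_distance(a: str, b: str) -> int:
    """Hypothetical distance: absolute difference of two IPv4 addresses as integers."""
    return abs(int(ipaddress.IPv4Address(a)) - int(ipaddress.IPv4Address(b)))


def select_target_cluster(to_locate: str, clusters: dict[str, str]) -> str:
    """Return the name of the other cluster whose address is closest to the cluster to be located."""
    source_addr = clusters[to_locate]
    others = {name: addr for name, addr in clusters.items() if name != to_locate}
    if not others:
        raise ValueError("no candidate target cluster")
    return min(others, key=lambda name: address_distance(source_addr, others[name]))


# Sample addresses are made up for illustration.
clusters = {
    "cluster1": "10.0.9.10",
    "cluster2": "10.0.2.10",
    "cluster3": "10.0.3.10",
}
target = select_target_cluster("cluster2", clusters)
print(target)
```

With these sample addresses, cluster 3 lies closer to cluster 2 than cluster 1 does, so it is the one selected, as in the example in the text.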
In an embodiment of the present application, according to a file copy instruction, a method for sending a storage file of a cluster to be located to a target cluster may include:
determining the storage file of the cluster to be located according to the file copy instruction;
obtaining a copy of the storage file according to the storage file of the cluster to be located;
and sending the copy of the storage file of the cluster to be located to the target cluster according to the address information of the target cluster.
S103, controlling the target cluster to convert the data structure of the storage file to obtain the target file.
The conversion of the data structure may convert the binary stream form of the storage file into the state form of an object, that is, deserialize the file.
The binary stream form is convenient for storing to disk, transmitting over a network, or persisting in memory, while the state form of an object allows the object to be passed between different computers, processes, or networks and restored when needed.
In this embodiment of the present application, the method for controlling the target cluster to convert the data structure of the storage file to obtain the target file may include:
determining file attribute information and file storage information of a storage file according to the storage file;
constructing an object input stream;
and reading file attribute information and file storage information through the object input stream to obtain the target file.
The file attribute information may include information such as file name, directory name, file size, creation time, modification time, etc.; the file storage information may include information such as file data block size, copy number, etc.
The object input stream may refer to the serialization interface provided by Java, with which a sequence of bytes is read from a source input stream, deserialized into a Java object, and returned to form the target file.
S104, constructing a mapping file according to the target file, wherein the mapping file is used for storing the corresponding relation between the target file and the access time of the target file.
In this embodiment of the present application, constructing a mapping file according to a target file includes:
determining the path of the target file, the access time of the target file, and the correspondence between the path and the access time;
and writing the corresponding relation between the path of the target file and the access time of the target file into a pre-constructed file mapping database to obtain a mapping file.
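As a rough illustration of the two steps above, the sketch below writes path and access-time pairs into a mapping table. It uses Python's built-in sqlite3 as a local stand-in for the file mapping database, which is an assumption made purely so the example runs anywhere; the table name `file_mapping`, the column names, and the sample records are hypothetical, and access times are assumed to be epoch milliseconds.

```python
import sqlite3


def build_mapping(conn: sqlite3.Connection, records) -> None:
    """Write (path, access_time) pairs into the mapping table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_mapping ("
        "path TEXT PRIMARY KEY, access_time INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO file_mapping (path, access_time) VALUES (?, ?)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the mapping database
target_files = [
    ("/data/logs/2021/app.log", 1609459200000),  # sample values, epoch milliseconds
    ("/data/reports/q3.csv", 1696118400000),
]
build_mapping(conn, target_files)
row_count = conn.execute("SELECT COUNT(*) FROM file_mapping").fetchone()[0]
print(row_count)
```

Keying the table on the path keeps one access time per file, so reloading a newer copy of the metadata overwrites the earlier entry rather than duplicating it.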
The pre-constructed file mapping database may be built with Hive on the distributed file system. The Hive model includes Hive tables, which resemble the tables of a relational database. No matter how many databases (data warehouses) a user has, or how many tables those databases contain, the tables themselves record only metadata such as data location, type, and attributes; the data actually stored for the tables resides in the distributed file system.
S105, determining a cold file list from the mapping file according to the access time of the target file.
In this embodiment of the present application, determining a cold file list from the mapping file according to the access time of the target file includes:
determining a preset cold file access time;
and inquiring the mapping file according to the cold file access time to obtain a cold file list.
The preset cold file access time may be set according to information such as file type, attribute, last modification time, and the like.
In this embodiment of the present application, according to the cold file access time, the mapping file is queried to obtain a cold file list, including:
constructing a mapping file inquiry statement according to the cold file access time;
inquiring the mapping file according to the mapping file inquiry statement, and determining cold file storage information;
and generating a cold file list according to the cold file storage information.
The mapping file query statement may be written in HQL (Hive Query Language), the SQL-like query language provided by Hive. The target object only needs to write the HQL statement, and Hive automatically converts it into a MapReduce program that processes the structured data on the distributed file system.
For example, if files accessed at time point T2 are defined as cold files, HQL can be used to generate a file query statement and query the mapping file for cold files: all files in the mapping file accessed at time point T2 are read and their storage information is recorded, thereby generating the cold file list.
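The query step can be sketched as below. The sketch again uses sqlite3 as a local stand-in for the mapping database rather than Hive, and it assumes that a cold file is one whose last access is earlier than a cutoff, in line with the `accesstime<T1` condition used in the second embodiment. The table name, column names, cutoff value, and sample rows are all hypothetical.

```python
import sqlite3


def build_cold_file_query(table: str = "file_mapping") -> str:
    """Build the query statement; the cutoff is bound as a parameter at execution time."""
    return f"SELECT path FROM {table} WHERE access_time < ? ORDER BY path"


def list_cold_files(conn: sqlite3.Connection, cutoff_ms: int) -> list[str]:
    """Return the paths whose last access time is earlier than the cutoff."""
    return [row[0] for row in conn.execute(build_cold_file_query(), (cutoff_ms,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_mapping (path TEXT PRIMARY KEY, access_time INTEGER)")
conn.executemany(
    "INSERT INTO file_mapping VALUES (?, ?)",
    [
        ("/data/logs/2021/app.log", 1609459200000),
        ("/data/tmp/old.bin", 1577836800000),
        ("/data/reports/q3.csv", 1696118400000),
    ],
)
cutoff_ms = 1640995200000  # hypothetical cold file access time, epoch milliseconds
cold_files = list_cold_files(conn, cutoff_ms)
print(cold_files)
```

Only the two files last accessed before the cutoff are returned; the recently accessed report is left out of the cold file list.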
S106, obtaining the address of the cold file according to the cold file list.
In particular, the storage location of the cold file may be determined in the distributed file system from the list of cold files.
After the address of the cold file is obtained, the cold file may be processed, and the processing method may be transferring the cold file to other file systems for independent storage, or deleting the cold file, so as to release the memory space of the distributed file system and ensure the stability of the system.
With this cold file searching method, the search for cold files runs on the target cluster and does not affect the normal data services of the system, so the stored data of the system is protected from damage during the search, which makes the method non-invasive. At the same time, the files matching the cold file access time are looked up directly in the mapping file to generate the cold file list, which improves the efficiency of the cold file search.
Fig. 2 is a flow chart of another cold file searching method provided in an embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
S201, acquiring a metadata file and pushing the metadata file to a cluster unrelated to the enterprise production environment.
The metadata file may be the storage file in the NameNode on which the cold file search is to be performed. Metadata exists in two forms, in-memory metadata and metadata files: the NameNode maintains a metadata image of the entire file system in memory for managing the HDFS (Hadoop Distributed File System), and the metadata file persists this metadata to storage.
HDFS may include a NameNode, DataNodes, and blocks. The NameNode is responsible for metadata management of the entire distributed file system, that is, information such as file path names and the IDs and storage locations of data blocks; it also records which nodes belong to the cluster and how many replicas each block has.
Distributed file systems and clusters are commonly used together to provide high availability, high performance, and scalability. Specifically, a distributed file system is a file system that distributes file data over multiple nodes; it connects multiple computers over a network so that the file systems on those computers appear to be a single file system. A cluster, by contrast, is a group of interconnected computers that share resources and workload to achieve high availability and high performance. Within a cluster, a distributed file system may be used to store and share data so that all nodes can access it.
A distributed file system may be made up of multiple clusters, each of which may contain multiple nodes. Different clusters may be located in different geographical places and communicate and cooperate over network connections. For example, an HDFS deployment is a cluster made up of multiple nodes, each of which can store and process data.
The cluster unrelated to the enterprise production environment may be another cluster in an idle state that shares no services with the cluster from which the metadata file is acquired.
S202, deserializing the metadata file.
Specifically, based on knowledge of the metadata structure, a Java deserialization program is written to deserialize the binary metadata file into a plaintext file with the following format:
hdfs_dir: file path and directory path of HDFS;
REPLICATION: the number of copies;
modification_time: modification time;
access_time: access time;
PREFERRED_block_size: preferred data block size;
BLOCKS_COUNT: the number of data blocks;
file_size: file size;
NSQUOTA: file number quota size;
DSQUOTA: disk space quota size;
PERMISION: permissions;
user_name: user name;
group_name: group name.
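To make the plaintext format concrete, the sketch below parses one line of such a file into a typed record. The description does not specify the field delimiter or the encoding of the time fields, so the tab separator, the epoch-millisecond timestamps, the Python attribute names, and the sample line are all assumptions; only the field order follows the list above.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Attribute names are illustrative; the order follows the listed plaintext format.
    hdfs_dir: str
    replication: int
    modification_time: int
    access_time: int
    preferred_block_size: int
    blocks_count: int
    file_size: int
    nsquota: int
    dsquota: int
    permission: str
    user_name: str
    group_name: str


def parse_record(line: str, sep: str = "\t") -> FileRecord:
    """Split one plaintext line into its 12 fields and convert the numeric ones."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != 12:
        raise ValueError(f"expected 12 fields, got {len(parts)}")
    numbers = [int(p) for p in parts[1:9]]  # fields 2 to 9 are numeric
    return FileRecord(parts[0], *numbers, *parts[9:12])


# Made-up sample line for illustration only.
sample = (
    "/data/logs/app.log\t3\t1609459200000\t1609459200000\t134217728"
    "\t1\t1024\t0\t0\trw-r--r--\thdfs\tsupergroup"
)
record = parse_record(sample)
print(record.hdfs_dir, record.access_time)
```

For the cold file search, only the path and the access time of each record are needed afterwards; the remaining fields are carried along in case they are used for filtering.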
When two processes communicate remotely, they may send various kinds of data, including text, pictures, audio, and video, all of which are transmitted over the network as binary sequences. Java is an object-oriented language in which data is represented as objects, so transmitting Java objects over a network relies on serialization and deserialization: the sender converts the Java object into a byte sequence before transmitting it, and the receiver, after receiving the byte sequence, restores it into a Java object through deserialization.
The process of saving a Java object to a disk file as a series of bytes, which can also be described as saving the state of the Java object, is called serialization. Serialization can store data permanently on a disk (typically in a file); this corresponds to the metadata file in the embodiments of the present application, that is, the metadata file that is read is a serialized file. Converting the bytes stored in the disk file back into a Java object is called deserialization.
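The round trip described above can be illustrated with a short sketch. The embodiment itself uses Java; the Python snippet below is only an analogy that uses the standard-library `pickle` module in place of Java object streams, and the metadata dictionary, file name, and helper names are hypothetical.

```python
import os
import pickle
import tempfile


def serialize_to_file(obj, filename):
    """Serialization: save the object's state to disk as a sequence of bytes."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def deserialize_from_file(filename):
    """Deserialization: rebuild the object from the bytes stored on disk."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for one metadata entry.
original = {"path": "/data/a.txt", "access_time": 1700000000}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "meta.bin")
    serialize_to_file(original, target)
    size_on_disk = os.path.getsize(target)
    restored = deserialize_from_file(target)

print(size_on_disk > 0, restored == original)
```

As with Java deserialization, `pickle.load` should only be applied to files from a trusted source.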
S203, creating a Hive table and loading the deserialized file.
Specifically, the statement for creating the Hive table is as follows:
Hive is open-source data warehouse software built on a distributed file system. It can map structured and semi-structured data files stored in the distributed file system onto database tables and provides an SQL-like query language based on those tables, the Hive Query Language (HQL), for accessing and analyzing large data sets stored in the distributed file system. At its core, Hive converts HQL into a MapReduce program and submits that program to a distributed cluster for execution.
S204, defining the cold file attributes and computing a cold file list with an HQL distributed job.
For example, if a file whose access time is earlier than a time point T1 is defined as a cold file, the query statement may be as follows:
select path
from hdfs_table
where accesstime<T1
Submitting this HQL distributed job computes the cold file list quickly and accurately.
After the cold file list is obtained, the cold files can be deleted and cleaned up according to the search results, which effectively ensures the stability of HDFS.
In the cold file searching method above, the metadata file is copied to a server cluster unrelated to HDFS and the cold file search is performed there, so the normal data services of HDFS are not affected and the data stored in HDFS is protected from damage during the search; this reflects the non-invasive nature of the method. In addition, because the files are looked up directly in the Hive table by access time, a cold file list can be generated and the efficiency of the cold file search is improved. Compared with scanning through the HDFS interface, the cold file searching method provided by the embodiment of the invention can accurately locate cold files within minutes, so the cold file list can be processed quickly and the stability of the distributed file system is ensured.
Fig. 3 is a specific flowchart of another cold file searching method according to an embodiment of the present application. As shown in fig. 3, the metadata file in the NameNode is collected and uploaded to a server that is different from the servers in operation; the metadata file is deserialized on that server to form a plaintext file; a Hive table is then created and the plaintext file is loaded into it, yielding the mapping relation of each file in the plaintext file; finally, the criteria for a cold file are defined, the cold files are computed in a distributed manner using HQL, and the results are organized to obtain the files in the cold file list, so that the cold files can be processed.
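The query above can be tried locally without a Hive deployment. The following Python sketch uses the standard-library `sqlite3` module purely as a stand-in for Hive; the table and column names follow the query above, while the cutoff `T1`, the sample paths, and the assumption that `accesstime` holds epoch seconds are hypothetical.

```python
import sqlite3

# Hypothetical cutoff: files last accessed before this instant count as cold.
T1 = 1_680_000_000  # epoch seconds (assumed unit)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hdfs_table (path TEXT, accesstime INTEGER)")
conn.executemany(
    "INSERT INTO hdfs_table (path, accesstime) VALUES (?, ?)",
    [
        ("/data/old/report_2019.csv", 1_560_000_000),
        ("/data/old/archive.tar", 1_600_000_000),
        ("/data/current/events.log", 1_699_000_000),
    ],
)

# Same predicate as the HQL statement: select path where accesstime < T1.
cold_files = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM hdfs_table WHERE accesstime < ? ORDER BY path", (T1,)
    )
]
conn.close()
print(cold_files)
```

In Hive the same statement would be compiled into a distributed job rather than run in-process, which is what lets it scale to the metadata of a large HDFS namespace.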
Fig. 4 is a schematic structural diagram of a cold file searching apparatus according to an embodiment of the present application. As shown in fig. 4, the cold file searching apparatus 40 includes: a receiving module 401, a sending module 402, a conversion module 403, a first determining module 404, a second determining module 405, and an obtaining module 406. Wherein:
a receiving module 401, configured to receive a file copy instruction;
a sending module 402, configured to send, according to a file copy instruction, a storage file of a cluster to be located to a target cluster, where the cluster to be located and the target cluster are a set of servers in a distributed file system, the cluster to be located is a cluster that is being used by a target object, and the target cluster is a cluster that is in an idle state and has no use relationship with the target object;
the conversion module 403 is configured to control the target cluster to convert the data structure of the storage file to obtain a target file;
a first determining module 404, configured to construct a mapping file according to the target file, where the mapping file is used to store a correspondence between the target file and access time of the target file;
a second determining module 405, configured to determine a cold file list from the mapping file according to the access time of the target file;
the obtaining module 406 is configured to obtain an address of the cold file according to the cold file list.
In the embodiment of the present application, the receiving module 401 may also be used to:
receiving a file reading instruction sent by a target object;
determining a cluster to be positioned, which has a use relation with a target object, according to the file reading instruction;
determining other clusters in the distributed file system according to the clusters to be positioned;
determining address information of a cluster to be positioned and address information of other clusters;
and determining target clusters in other clusters according to the address information of the clusters to be positioned and the address information of the other clusters, wherein the address information of the target clusters and the address information of the clusters to be positioned meet preset address requirements.
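The source does not say what the preset address requirement is. As one hedged illustration, the Python sketch below assumes the requirement is that the target cluster is idle and that its address lies outside the subnet of the cluster to be located; the `Cluster` class, the `/24` prefix, and all names and addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    address: str  # representative IPv4 address of the cluster
    idle: bool


def meets_address_requirement(located: str, candidate: str, prefix: int = 24) -> bool:
    """Assumed rule: the candidate must be outside the located cluster's subnet."""
    subnet = ipaddress.ip_network(f"{located}/{prefix}", strict=False)
    return ipaddress.ip_address(candidate) not in subnet


def choose_target(located: Cluster, clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the first idle cluster, other than the located one, that passes the rule."""
    for candidate in clusters:
        if candidate.name == located.name:
            continue
        if candidate.idle and meets_address_requirement(located.address, candidate.address):
            return candidate
    return None


prod = Cluster("prod", "10.0.1.5", idle=False)
clusters = [
    prod,
    Cluster("batch", "10.0.1.77", idle=True),   # idle, but in the same subnet as prod
    Cluster("busy", "10.0.5.1", idle=False),    # different subnet, but not idle
    Cluster("spare", "10.0.9.10", idle=True),   # idle and in a different subnet
]
target = choose_target(prod, clusters)
print(target.name if target else None)
```

Any other address rule (for example, a whitelist of data centers) could be substituted for `meets_address_requirement` without changing the selection loop.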
In the embodiment of the present application, the sending module 402 may also be configured to:
determining a storage file of the cluster to be positioned according to the file copying instruction;
obtaining a copy storage file according to the storage file of the cluster to be positioned;
and sending the copy storage file of the cluster to be positioned to the target cluster according to the address information of the target cluster.
In the embodiment of the present application, the conversion module 403 may be further configured to:
determining file attribute information and file storage information of a storage file according to the storage file;
constructing an object input stream;
and reading file attribute information and file storage information through the object input stream to obtain the target file.
In the embodiment of the present application, the first determining module 404 may also be configured to:
determining the path of the target file, the access time of the target file, and the corresponding relation between the path of the target file and the access time of the target file;
and writing the corresponding relation between the path of the target file and the access time of the target file into a pre-constructed file mapping database to obtain a mapping file.
In the embodiment of the present application, the second determining module 405 may further be configured to:
determining a preset cold file access time;
and querying the mapping file according to the cold file access time to obtain a cold file list.
In the embodiment of the present application, the second determining module 405 may further be configured to:
constructing a mapping file query statement according to the cold file access time;
querying the mapping file according to the mapping file query statement, and determining cold file storage information;
and generating a cold file list according to the cold file storage information.
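The step of constructing a query statement from the cold file access time can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the policy "not accessed for `idle_days` days", the epoch-millisecond unit for `accesstime`, and the helper names `cold_cutoff_ms` and `build_cold_file_query` are hypothetical, while the table and column names follow the example query in step S204.

```python
from datetime import datetime, timedelta, timezone


def cold_cutoff_ms(now: datetime, idle_days: int) -> int:
    """Return the cutoff T1 as epoch milliseconds: `now` minus `idle_days` days."""
    return int((now - timedelta(days=idle_days)).timestamp() * 1000)


def build_cold_file_query(table: str, cutoff_ms: int) -> str:
    """Build the query text; reject table names that are not plain identifiers."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return f"select path from {table} where accesstime < {cutoff_ms}"


# Hypothetical policy: a file not accessed for 180 days is cold.
now = datetime(2023, 10, 31, tzinfo=timezone.utc)
cutoff = cold_cutoff_ms(now, idle_days=180)
query = build_cold_file_query("hdfs_table", cutoff)
print(query)
```

Keeping the cutoff calculation separate from the statement builder makes it easy to change the cold file criteria without touching the query text.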
As can be seen from the above, the cold file searching apparatus in the embodiment of the present application has a receiving module 401, configured to receive a file copy instruction; a sending module 402, configured to send, according to the file copy instruction, a storage file of a cluster to be located to a target cluster, where the cluster to be located and the target cluster are each a set of servers in a distributed file system, the cluster to be located is a cluster that is being used by a target object, and the target cluster is a cluster that is in an idle state and has no use relationship with the target object; a conversion module 403, configured to control the target cluster to convert the data structure of the storage file to obtain a target file; a first determining module 404, configured to construct a mapping file according to the target file, where the mapping file is used to store a correspondence between the target file and the access time of the target file; a second determining module 405, configured to determine a cold file list from the mapping file according to the access time of the target file; and an obtaining module 406, configured to obtain an address of the cold file according to the cold file list. Therefore, with the cold file searching apparatus, the embodiment of the present application can use the target cluster to carry out the cold file search without affecting the normal data services of the system, so the data stored in the system is protected from damage during the search, which reflects the non-invasive nature of the method; at the same time, cold files are found quickly by means of their access time.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device 50 may include a processor 501 having one or more processing cores, a memory 502 comprising one or more computer-readable storage media, a network interface 503, and the like. The processor 501, the memory 502, and the network interface 503 are connected by a bus 504.
In a specific implementation, at least one processor 501 executes computer-executable instructions stored in memory 502, causing at least one processor 501 to perform the cold file lookup method as described above.
The specific implementation process of the processor 501 may refer to the above-mentioned method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In the embodiment shown in fig. 5, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may comprise high-speed random access memory (Random Access Memory, RAM) and may further comprise non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory.
The network interface may be a wireless network interface or a wired network interface, which is typically used to establish communication connections between the electronic device and other electronic devices. For example, a network interface is used to connect an electronic device with an external terminal through a network, establish a data transmission channel and a communication connection between the electronic device and the external terminal, and the like.
The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
In some embodiments, a computer program product is also presented, comprising a computer program or instructions which, when executed by a processor, implement the steps of any of the cold file lookup methods described above.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in any computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the cold file lookup methods provided by embodiments of the present application.
The storage medium may include: read-only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and the like.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
Because the instructions stored in the storage medium can execute the steps of any of the cold file searching methods provided in the embodiments of the present application, they can achieve the beneficial effects of any of those methods, which are detailed in the previous embodiments and are not repeated here.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for cold file lookup, characterized by being applied to a distributed file system, the method comprising:
receiving a file copy instruction;
according to the file copy instruction, a storage file of a cluster to be positioned is sent to a target cluster, wherein the cluster to be positioned and the target cluster are sets of all servers in the distributed file system, the cluster to be positioned is a cluster which is used by a target object, and the target cluster is a cluster which is in an idle state and has no use relation with the target object;
controlling the target cluster to convert the data structure of the storage file to obtain a target file;
constructing a mapping file according to the target file, wherein the mapping file is used for storing the corresponding relation between the target file and the access time of the target file;
determining a cold file list from the mapping file according to the access time of the target file;
and obtaining the address of the cold file according to the cold file list.
2. The method of claim 1, wherein prior to said sending the storage file of the cluster to be located to the target cluster according to the file copy instruction, the method further comprises:
receiving a file reading instruction sent by the target object;
determining a cluster to be positioned, which has the use relation with the target object, according to the file reading instruction;
determining other clusters in the distributed file system according to the cluster to be positioned;
determining address information of the cluster to be positioned and address information of other clusters;
determining target clusters in the other clusters according to the address information of the clusters to be positioned and the address information of the other clusters, wherein the address information of the target clusters and the address information of the clusters to be positioned meet preset address requirements.
3. The method according to claim 1, wherein the sending the storage file of the cluster to be located to the target cluster according to the file copy instruction includes:
determining a storage file of the cluster to be positioned according to the file copying instruction;
obtaining a copy storage file according to the storage file of the cluster to be positioned;
and sending the copy storage file of the cluster to be positioned to the target cluster according to the address information of the target cluster.
4. The method according to claim 1, wherein the controlling the target cluster to transform the data structure of the storage file to obtain the target file includes:
determining file attribute information and file storage information of the storage file according to the storage file;
constructing an object input stream;
and reading the file attribute information and the file storage information through the object input stream to obtain the target file.
5. The method of claim 1, wherein constructing a mapping file from the target file comprises:
determining the path of the target file, the access time of the target file and the corresponding relation between the path of the target file and the access time of the target file;
and writing the corresponding relation between the path of the target file and the access time of the target file into a pre-constructed file mapping database to obtain the mapping file.
6. The method of claim 1, wherein determining a cold file list from the mapping file according to the access time of the target file comprises:
determining a preset cold file access time;
and querying the mapping file according to the cold file access time to obtain the cold file list.
7. The method of claim 6, wherein querying the mapped file based on the cold file access time to obtain the cold file list comprises:
constructing a mapping file query statement according to the cold file access time;
querying the mapping file according to the mapping file query statement, and determining cold file storage information;
and generating the cold file list according to the cold file storage information.
8. A cold file search apparatus, comprising:
the receiving module is used for receiving a file copy instruction;
the sending module is used for sending the storage file of the cluster to be positioned to a target cluster according to the file copy instruction, wherein the cluster to be positioned and the target cluster are sets of all servers in a distributed file system, the cluster to be positioned is a cluster which is used by a target object, and the target cluster is a cluster which is in an idle state and has no use relation with the target object;
the conversion module is used for controlling the target cluster to convert the data structure of the storage file to obtain a target file;
the first determining module is used for constructing a mapping file according to the target file, wherein the mapping file is used for storing the corresponding relation between the target file and the access time of the target file;
the second determining module is used for determining a cold file list from the mapping file according to the access time of the target file;
and the obtaining module is used for obtaining the address of the cold file according to the cold file list.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the cold file lookup method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are for implementing the cold file search method of any of claims 1 to 7.
CN202311436061.8A 2023-10-31 2023-10-31 Cold file searching method and device, electronic equipment and storage medium Pending CN117493277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311436061.8A CN117493277A (en) 2023-10-31 2023-10-31 Cold file searching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117493277A true CN117493277A (en) 2024-02-02

Family

ID=89670055

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination