CN116048800A

CN116048800A - Data processing method and device, storage medium and electronic equipment

Info

Publication number: CN116048800A
Application number: CN202310035863.1A
Authority: CN
Inventors: 李勇; 程稳; 陈光; 朱世强; 曾令仿
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-01-10
Filing date: 2023-01-10
Publication date: 2023-05-02

Abstract

The specification discloses a data processing method, a data processing device, a storage medium and electronic equipment. The data processing method comprises: determining candidate computing nodes and acquiring a history access record of each candidate computing node; determining, according to the history access records, the access frequency of each candidate computing node to different namespaces, and taking namespaces whose access frequency meets a preset condition as target namespaces; selecting at least one target computing node from the candidate computing nodes according to the remaining storage space of each candidate computing node and the storage requirement corresponding to each target namespace; and sending at least part of the metadata under the target namespaces to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata.

Description

Data processing method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a storage medium, and an electronic device.

Background

In recent years, a large-scale distributed cluster system has been widely applied to fields requiring large-scale high-performance computation such as petroleum acquisition, natural gas acquisition, manufacturing, rich media, finance, and the like, by virtue of its characteristics of high performance, high scalability, high availability, and the like. The architecture of the distributed system is generally composed of three parts, namely a metadata node, a computing node and a storage node, so that the high performance and the high expansibility of the distributed system are realized by a method of separating storage from computing, and the running requirements of various applications are further met.

However, in the current process of data access through the distributed cluster system, the access amount of metadata is often too huge, and a large number of data access requests concurrent by computing nodes contend for network resources and storage resources of the metadata node cluster, so that system performance is affected, and even problems such as network congestion, I/O congestion and the like are caused.

Therefore, how to effectively reduce occupation of data access requests of the computing nodes to network resources and storage resources of the metadata node cluster, and avoid network congestion and I/O congestion in the process of data access is a problem to be solved.

Disclosure of Invention

The present disclosure provides a data processing method, apparatus, storage medium, and electronic device, so as to partially solve the foregoing problems in the prior art.

The technical scheme adopted in the specification is as follows:

the present specification provides a method of data processing, comprising:

determining each candidate computing node and acquiring a history access record of each candidate computing node;

determining the access frequency of each candidate computing node to different namespaces according to the historical access record, and taking namespaces with the access frequency meeting preset conditions as target namespaces;

selecting at least one computing node from the candidate computing nodes as a target computing node according to the residual storage space of each candidate computing node and the storage requirement corresponding to each target name space;

and sending at least part of the metadata under the target name space to the target computing node for storage, so that the candidate computing nodes other than the target computing node send a data acquisition request to the target computing node and perform data processing according to the acquired metadata.

Optionally, determining each candidate computing node specifically includes:

selecting, from the computing nodes, those computing nodes for which the distance between deployment positions meets a preset distance condition as the candidate computing nodes.

Optionally, sending at least part of metadata under the target namespace to the target computing node for storage, specifically including:

according to the history access record, determining history metadata accessed by each candidate computing node in a root directory of the target name space;

and sending the history metadata to the target computing node for storage, and taking the target computing node storing the history metadata as a local metadata node.

Optionally, the method further comprises:

in the data processing process, if the metadata corresponding to the data acquisition request is not stored in the target computing node, the metadata corresponding to the data acquisition request stored in the global metadata node is sent to the target computing node for storage.

Optionally, the method further comprises:

if the access frequency of the target name space corresponding to the local metadata node is monitored to be lower than a preset threshold value, acquiring a revocation request sent by the local metadata node;

and deleting the metadata stored in the target computing node according to the revocation request, and revoking the local metadata node.

Optionally, deleting the metadata stored in the target computing node according to the revocation request and revoking the local metadata node specifically includes:

if the candidate computing node is allowed to execute the writing operation of the metadata, deleting the metadata stored in the target computing node after the writing operation is synchronized in the global metadata node, and revoking the local metadata node.

Optionally, sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send a data acquisition request to the target computing node and perform data processing according to the acquired metadata, specifically includes:

and determining a preset data processing rule corresponding to the target computing node, so that the other candidate computing nodes perform data processing under the constraint of the data processing rule.

Optionally, determining a preset data processing rule corresponding to the target computing node, so that the other candidate computing nodes perform data processing under the constraint of the data processing rule, which specifically includes:

If the data processing rule only allows the candidate computing node to execute the reading operation on the metadata, deleting the metadata stored in the target computing node and releasing the storage space occupied by the target name space when the occurrence of the writing operation on the metadata is monitored.

Optionally, the method further comprises:

if it is detected that only a write operation on the file data occurs, and no write operation on the metadata occurs, the metadata stored in the target computing node is not deleted and the storage space occupied by the target namespace is not released.

if the data processing rule allows the candidate computing node to execute the reading operation and the writing operation of the metadata, judging whether the metadata corresponding to the reading and writing request is stored in the target computing node after the reading and writing request of the metadata is acquired;

if so, returning the metadata corresponding to the read-write request to the candidate computing node for sending the read-write request through the target computing node, otherwise, returning the metadata stored in the global metadata node to the candidate computing node.

Optionally, the method further comprises:

if computing nodes other than the candidate computing nodes perform a read-write operation on the metadata stored in the target computing node, deleting the metadata stored in the target computing node and releasing the storage space occupied by the target name space.

Optionally, returning, by the target computing node, the metadata corresponding to the read-write request to the candidate computing node that sent the read-write request specifically includes:

judging whether the target computing node stores object information of a storage object where the metadata is located;

if yes, returning metadata stored in the storage object to the candidate computing node through the target computing node, otherwise, creating the storage object in the target computing node, acquiring the object information from the storage node, and storing the object information in the target computing node.

Optionally, sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send a data acquisition request to the target computing node and perform data processing according to the acquired metadata, specifically includes:

transmitting metadata corresponding to the data acquisition request to the other candidate computing nodes through the target computing node, so that the other candidate computing nodes determine storage information of the data in the storage nodes according to the metadata;

and acquiring the data from the storage node according to the storage information.

Optionally, sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send a data acquisition request to the target computing node and perform data processing according to the acquired metadata, specifically includes:

judging whether a target computing node storing the metadata exists among the candidate computing nodes;

if yes, sending the metadata stored in the target computing node to the other candidate computing nodes through the target computing node, otherwise, sending the metadata stored in the global metadata node to the other candidate computing nodes.

Optionally, the method further comprises:

judging, through the target computing node, whether the data path prefix corresponding to the data acquisition request is consistent with the path prefix of the name space entrusted by the data acquisition request;

and if not, returning an error indication to the other candidate computing nodes.

The present specification provides an apparatus for data processing, comprising:

the acquisition module is used for determining each candidate computing node and acquiring a history access record of each candidate computing node;

the determining module is used for determining the access frequency of each candidate computing node to different namespaces according to the historical access record, and taking the namespaces with the access frequency meeting the preset conditions as target namespaces;

the selecting module is used for selecting at least one computing node from the candidate computing nodes according to the residual storage space of each candidate computing node and the storage requirement corresponding to each target name space, and taking the at least one computing node as a target computing node;

and the processing module is used for sending at least part of metadata under the target name space to the target computing node for storage so that other computing candidate computing nodes except the target computing node send a data acquisition request to the target computing node and perform data processing according to the acquired metadata.

The present specification provides a computer readable storage medium storing a computer program which when executed by a processor performs the method of data processing described above.

The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of data processing as described above when executing the program.

The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:

the method for processing data provided by the specification comprises the following steps: the global metadata node determines each candidate computing node and acquires a history access record of each candidate computing node; determines, according to the history access records, the access frequency of each candidate computing node to different namespaces, and takes namespaces whose access frequency meets a preset condition as target namespaces; selects at least one target computing node from the candidate computing nodes according to the remaining storage space of each candidate computing node and the storage requirement corresponding to each target namespace; and sends at least part of the metadata under the target namespaces to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata.

According to the method, the target computing node can be selected from the candidate computing nodes to store the metadata in the namespaces by taking the namespaces with the access frequency meeting the conditions as a unit, so that each candidate computing node can directly acquire the required metadata from the target computing node.

Drawings

The accompanying drawings described here are included to provide a further understanding of the specification and form a part of it. The exemplary embodiments of the present specification and their description are used to explain the specification and are not intended to limit the specification unduly. In the drawings:

FIG. 1 is a flow chart of a method of data processing provided in the present specification;

FIG. 2 is a schematic diagram of a construction process of a local metadata node provided in the present specification;

FIG. 3 is a schematic diagram of a data reading and writing process provided in the present specification;

FIG. 4 is a schematic diagram of a metadata acquisition process provided in the present specification;

FIG. 5 is a schematic diagram of one data processing rule provided in the present specification;

FIG. 6 is a schematic diagram of an apparatus for data processing provided in the present specification;

FIG. 7 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in the present specification.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.

Fig. 1 is a flow chart of a method of data processing provided in the present specification, including the following steps:

S101: determining each candidate computing node and acquiring a history access record of each candidate computing node.

In practical applications, the architecture of the large-scale distributed cluster system generally includes three parts, including a metadata cluster, a computing cluster and a storage cluster, so that high performance and high expansibility are realized through a storage and computation separation method, and the requirements of various applications running on systems with different scales, such as small-scale high-performance computer cluster (High Performance Computing, HPC) environments, supercomputers and the like, can be met.

In a large-scale distributed system, a metadata cluster generally includes a plurality of metadata nodes (metadata servers), which store and manage the metadata of the cluster file system and perform all namespace operations on the cluster file system. The metadata includes the access rights of files, file owners, distribution information of file data blocks, and the like. To operate on the data in a file, a user must first acquire the metadata corresponding to the file in order to locate the file data and obtain the content or related attributes of the data.

A storage cluster is made up of multiple storage nodes (storage servers), each of which has access to a storage volume of a set of storage objects. Clustered file systems often employ object storage techniques to fragment and store large file data across multiple storage objects.

A computing cluster is made up of several computing nodes (computing servers) on which applications can access and use data on the cluster file system through the interfaces of the cluster file system clients. The computing node firstly acquires metadata of the file from the metadata node cluster, then determines storage information of the file in the storage node according to the metadata, and further acquires required data from the storage node according to the storage information.

In the process of data processing or access, the number of metadata access requests is large (usually more than half of the file system I/O requests), and many metadata operations include multiple sub-operations. For example, opening a file requires path resolution to be performed multiple times, so that a single metadata operation triggers multiple network I/Os.

Based on the above, the present disclosure provides a data processing method applied to a distributed cluster system, so as to effectively reduce occupation of network resources and storage resources of a metadata node cluster by a data acquisition request of a computing node. Wherein for each metadata node in the distributed cluster system, the metadata node may be regarded as a global metadata node.

Before performing a read-write operation on the data of a file, a computing node acquires metadata from the global metadata node, and the global metadata node records history access records of the computing nodes, including the file name, access time, read-write size, access frequency, and the like. The global metadata node can therefore select, from the computing nodes, the computing nodes for which the distance between deployment positions meets a preset distance condition as candidate computing nodes, and then acquire the history access records, stored in the global metadata node, that correspond to those candidate computing nodes. The preset distance condition can be set according to actual conditions and is not specifically limited in this specification.

For example, a global metadata node may have computing nodes connected to the same switch or router, or computing nodes located within the same enclosure, as candidate computing nodes.
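By way of a non-limiting illustration, the grouping of computing nodes into candidate computing nodes may be sketched as follows. The sketch assumes that "attached to the same switch" stands in for the preset distance condition; the names `ComputeNode`, `switch_id`, and `group_candidates` are hypothetical and do not appear in this specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    node_id: str    # hypothetical identifier of a computing node
    switch_id: str  # hypothetical: the switch or router the node is attached to


def group_candidates(nodes):
    """Group computing nodes that satisfy the (assumed) distance condition.

    Assumption: nodes attached to the same switch are close enough to form
    one group of candidate computing nodes.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[node.switch_id].append(node.node_id)
    return dict(groups)


nodes = [
    ComputeNode("c1", "sw-A"),
    ComputeNode("c2", "sw-A"),
    ComputeNode("c3", "sw-B"),
]
candidate_groups = group_candidates(nodes)
```

Each resulting group would then be treated as one set of candidate computing nodes; a different distance condition (for example, same enclosure) would only change the grouping key.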

S102: determining the access frequency of each candidate computing node to different namespaces according to the historical access record, and taking the namespaces with the access frequency meeting the preset condition as target namespaces.

The global metadata node can count and identify the data access distribution condition of each computing node and the accessed hot spot data according to the historical access record corresponding to each candidate computing node, so as to determine the access frequency of each computing node to different name spaces.

If the data accessed by the candidate computing nodes is concentrated in one or more namespaces, the access frequencies corresponding to those namespaces meet the preset condition, so the namespaces whose access frequencies meet the preset condition can be taken as target namespaces.

In this specification, the preset condition may be that the access frequency corresponding to the namespace is greater than the preset access frequency, or may be other preset conditions, which is not limited in this specification.
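As a minimal sketch of step S102, the access frequency per namespace may be counted from the history access records and compared with a preset access frequency. The record layout, the rule that the first path component identifies a namespace, and the names `namespace_of` and `select_target_namespaces` are assumptions made only for illustration.

```python
from collections import Counter


def namespace_of(path, depth=1):
    """Hypothetical mapping: the first `depth` path components name the namespace."""
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts[:depth])


def select_target_namespaces(access_records, min_accesses):
    """Count accesses per namespace and keep those above the preset frequency."""
    counts = Counter(namespace_of(rec["path"]) for rec in access_records)
    return sorted(ns for ns, n in counts.items() if n > min_accesses)


# Hypothetical history access records kept by the global metadata node.
records = [
    {"node": "c1", "path": "/proj/a.dat"},
    {"node": "c2", "path": "/proj/b.dat"},
    {"node": "c1", "path": "/proj/sub/c.dat"},
    {"node": "c2", "path": "/tmp/x"},
]
targets = select_target_namespaces(records, min_accesses=2)
```

Here only the namespace that is accessed more than twice is selected; any other preset condition could be substituted in the filter.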

S103: selecting at least one computing node from the candidate computing nodes as a target computing node according to the residual storage space of each candidate computing node and the storage requirement corresponding to each target name space.

S104: transmitting at least part of the metadata under the target name space to the target computing node for storage, so that the candidate computing nodes other than the target computing node transmit a data acquisition request to the target computing node and perform data processing according to the acquired metadata.

Specifically, the global metadata node may first detect storage space information of each computing node to determine whether a current remaining storage space in the persistent storage device of the candidate computing node can meet a storage requirement corresponding to the target namespace. Wherein the storage requirement may be a requirement of storage space occupied by metadata in the target namespace.

In addition, to ensure that a candidate computing node can still store and compute normally during data processing after the metadata is stored on it, the global metadata node may further determine whether the remaining storage space of the candidate computing node would be greater than a preset storage space after the metadata is stored on the candidate computing node and the candidate computing node is used as a local metadata node. If so, the candidate computing node satisfies all the conditions and may be used as a target computing node.

For example, for each candidate computing node, the global metadata node may first determine whether the remaining storage space on the candidate computing node would be higher than 20% of the total storage space after the metadata is stored in the candidate computing node and the candidate computing node is used as the local metadata node, and if so, may use the candidate computing node as the target computing node.

The global metadata node may select only one target computing node from the candidate computing nodes for storing metadata corresponding to the target name space, for example, select the candidate computing node with the largest remaining storage space as the target computing node, and of course, may also select two or more candidate computing nodes as the target computing node.
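The selection in step S103 may be sketched as follows, under stated assumptions: each candidate reports its free and total storage in bytes, the reserve after placement is the 20% figure from the example above, and the candidate with the largest remaining storage space is preferred. The function name `select_target_node` and the data layout are hypothetical.

```python
def select_target_node(candidates, required_bytes, reserve_ratio=0.2):
    """Pick the candidate with the most free space that still keeps a reserve.

    candidates: mapping node_id -> (free_bytes, total_bytes), a hypothetical layout.
    Returns the chosen node id, or None when no candidate qualifies.
    """
    eligible = []
    for node_id, (free, total) in candidates.items():
        free_after = free - required_bytes
        # The metadata must fit, and the space left afterwards must exceed the reserve.
        if free_after >= 0 and free_after > reserve_ratio * total:
            eligible.append((free, node_id))
    if not eligible:
        return None
    return max(eligible)[1]


candidates = {
    "c1": (500, 1000),
    "c2": (250, 1000),
    "c3": (900, 1000),
}
chosen = select_target_node(candidates, required_bytes=100)
```

In this example `c2` is rejected because only 15% of its storage would remain free, and `c3` is preferred over `c1` because it has more free space. Selecting two or more target computing nodes would amount to returning several of the eligible entries instead of one.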

After determining the target computing nodes, the global metadata node may send at least part of the metadata under the target namespaces to each target computing node for storage, so as to use each target computing node as a local metadata node. The local metadata node is used for receiving the data acquisition requests sent by the candidate computing nodes and returning the stored metadata.

In this specification, the global metadata node may determine, according to the history access record, the history metadata accessed by each candidate computing node in the root directory of the target namespace, and then send the history metadata to the target computing node for storage, so as to use the target computing node as the local metadata node, so that if the required metadata is stored in the target computing node during the actual data processing process, the required metadata may be directly obtained from the target computing node.

In addition, according to the access frequency of each candidate computing node to different metadata in the history access record, the global metadata node may store, in the target computing node, the metadata whose access frequency is greater than a preset frequency. The preset frequency may be set according to the actual situation, which is not specifically limited in this specification.

Of course, the global metadata node may also send all metadata under the target namespace to the target computing node for storage, if the target computing node storage space permits.

In the actual data processing process, if the target computing node does not store the metadata corresponding to a data acquisition request, the metadata corresponding to the data acquisition request that is stored in the global metadata node is sent to the target computing node for storage. The local metadata node can thus acquire metadata on demand, which prevents the metadata in the target computing node from occupying a large storage space from the beginning.
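The on-demand acquisition described above resembles a pull-through cache and may be sketched as follows. This is an illustrative sketch only: the class `LocalMetadataNode`, the dictionary standing in for the global metadata node, and the `global_fetches` counter are hypothetical.

```python
class LocalMetadataNode:
    """Hypothetical local metadata store hosted on a target computing node."""

    def __init__(self, global_store, preload=()):
        self._global = global_store  # stands in for the global metadata node
        # Only the preloaded entries (e.g. the namespace root) are stored initially.
        self._local = {path: global_store[path] for path in preload}
        self.global_fetches = 0

    def get(self, path):
        if path not in self._local:
            # Miss: fetch from the global metadata node and keep a local copy.
            self._local[path] = self._global[path]
            self.global_fetches += 1
        return self._local[path]


global_store = {"/proj": {"type": "dir"}, "/proj/a.dat": {"size": 42}}
local = LocalMetadataNode(global_store, preload=["/proj"])
first = local.get("/proj/a.dat")   # fetched from the global node
second = local.get("/proj/a.dat")  # served from the local copy
```

Only the first request for a given path reaches the global metadata node; later requests are answered locally.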

It should be noted that, when the target computing node serves as the local metadata node, it retains its function of processing (reading and writing) data as a computing node in addition to the function of storing metadata.

To ensure the consistency of the data in the local metadata node (target computing node) and the global metadata node, the global metadata node may determine the recent read-write condition of the target namespace and then set an appropriate data processing rule for the target computing node according to that read-write condition. In this specification, the data processing rules may include a read-only policy and a readable-writable policy and, of course, may also include other data processing rules. The specific application of the different data processing rules is described in detail below and is not repeated here.

In the present specification, the process of using each target computing node as a local metadata node may be regarded as the construction process of the local metadata node. The global metadata node may first obtain the metadata of the root directory of each target namespace and send this metadata information to the target computing node. After receiving the metadata information, the target computing node can take the target namespace as a construction unit and build a local metadata node that stores the metadata corresponding to the target namespace.

In addition, metadata in files and directories in namespaces other than the target namespace may not be initially stored in the target computing node, but may be retrieved from the global metadata node and stored in the local metadata node during actual data access to avoid the local metadata node initially occupying excessive storage resources.

In this way, when the candidate computing node performs data processing, the corresponding metadata can be directly obtained from the local metadata node (target computing node), metadata does not need to be obtained from the remote global metadata node, resource loss in the metadata transmission process is reduced, and occupation of system resources in the data obtaining and accessing processes is further reduced. For ease of understanding, the present disclosure provides a schematic diagram of a construction process of a local metadata node, as shown in fig. 2.

Fig. 2 is a schematic diagram of a construction process of a local metadata node provided in the present specification.

The global metadata node obtains the historical access records of the candidate computing nodes, determines a target name space with a higher access frequency according to the access information, and then selects the target computing node from the candidate computing nodes. Taking the target name space as the construction unit, it stores the metadata of the root directory held in the global metadata node locally on the target computing node, thereby constructing a corresponding local metadata node, and sets a data processing rule.

After the local metadata node is constructed, a candidate node located within the connection area of the local metadata node preferentially sends its data acquisition requests to the local metadata node (target computing node) and acquires the metadata stored in the target computing node.

In the actual data access process, the metadata stored in the target computing node may be changed so that it becomes inconsistent with the data in the global metadata node, causing read-write errors. To avoid this, the global metadata node can preset a data processing rule. Under the read-only policy, a candidate computing node can only perform read operations on the metadata stored in the target computing node; as soon as a write operation such as updating, creating, or deleting is performed on the metadata or the target name space in the target computing node, the local metadata node is revoked, the metadata stored in the target computing node is deleted, and the storage space occupied by the target name space is released.

For the readable and writable strategy, each candidate computing node whose deployment position meets the preset distance condition can execute the read operation and the write operation on the metadata stored in the target computing node, but once other computing nodes except each candidate computing node execute the metadata write operation, the local metadata node is withdrawn, the metadata stored in the target computing node is deleted, and the storage space occupied by the target name space is released.

Of course, other data processing rules, such as a dual-active policy, may be formulated according to the actual situation. Such a policy may allow the candidate computing nodes in the local connection area and the computing nodes in other connection areas to perform read-write operations in the local metadata node and the global metadata node at the same time. An application may implement different dual-active policies according to its own performance requirements and deploy them in the local metadata node framework, so as to synchronize the data in the local metadata node and the global metadata node.

Each candidate computing node can then process data under the constraint of the data processing rule preset by the global metadata node.

It should be noted that, in the process of data processing, not only may the candidate nodes other than the target computing node acquire metadata from the local metadata node, but the target computing node may also acquire metadata from itself (the local metadata node). In that case, because the target computing node already stores the required metadata, no additional acquisition request needs to be sent.

Specifically, if the data processing rule is a read-only policy, that is, computing nodes are only allowed to perform read operations on the metadata stored in the target computing node and are not allowed to perform write operations on it, then when a computing node (a candidate computing node or any other computing node) performs a write operation on the metadata in the target computing node, the local metadata node sends a revocation request to the global metadata node. After receiving the revocation request, the global metadata node revokes the local metadata node, deletes the metadata stored in the target computing node, and releases the storage space occupied by the target name space, thereby ensuring the consistency of the data.

It should be noted that a write operation on file data only updates the data of the file and does not update its metadata; only a write operation on the metadata (such as modifying the file name, size, or other file metadata) can cause a data inconsistency conflict. Therefore, if a candidate computing node only performs a write operation on file data through the local metadata node and does not perform a write operation on file metadata, the local metadata node is not revoked. Only when a write operation is performed on the metadata is the local metadata node revoked, the metadata stored in the target computing node deleted, and the storage space occupied by the target namespace released.

Further, after a candidate computing node performs a write operation on the metadata and the global metadata node receives the revocation request, an error indication is returned to the candidate computing node. The candidate computing node then performs the metadata write operation again on the global metadata node, and the global metadata node updates the metadata after receiving the write operation.

If a computing node other than the candidate computing nodes performs a metadata write operation on the target computing node, the global metadata node may stop and revoke the local metadata node.

If the data processing rule is a readable and writable policy, the candidate computing nodes are allowed to perform both read operations and write operations on the metadata stored in the target computing node. However, once a computing node other than the candidate computing nodes performs a read or write operation on the metadata stored in the target computing node, the local metadata node is also revoked.
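The revocation conditions of the two policies described above can be summarized as a small decision function. The following is only a minimal sketch under stated assumptions: the names `Policy`, `Op` and `must_revoke` are hypothetical, and treating file-data writes as harmless under both policies is an assumption extended from the read-only case.

```python
from enum import Enum


class Policy(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"


class Op(Enum):
    METADATA_READ = "metadata_read"
    METADATA_WRITE = "metadata_write"
    FILE_DATA_WRITE = "file_data_write"


def must_revoke(policy: Policy, op: Op, from_candidate: bool) -> bool:
    """Return True if the operation should trigger revocation of the local metadata node."""
    if op is Op.FILE_DATA_WRITE:
        # Writing file data does not change file metadata, so no conflict arises
        # (assumed here to hold under both policies).
        return False
    if policy is Policy.READ_ONLY:
        # Any metadata write, from a candidate node or any other node, triggers revocation.
        return op is Op.METADATA_WRITE
    # Readable and writable policy: candidate nodes may read and write metadata,
    # but a metadata read or write from any other computing node triggers revocation.
    return not from_candidate
```

For example, `must_revoke(Policy.READ_WRITE, Op.METADATA_WRITE, from_candidate=True)` is `False`, while the same write under `Policy.READ_ONLY` is `True`.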

The candidate computing node may first determine, according to the recorded namespace information, whether metadata corresponding to the namespace is stored in a target computing node, that is, whether a local metadata node exists. If one exists, the candidate computing node requests the metadata of the file corresponding to the acquisition request from the local metadata node. If the metadata of the file does not exist in the local metadata node, the candidate computing node sends the acquisition request to the global metadata node to acquire the metadata of the file there, and the metadata of the file and the metadata of its parent directory are pulled and stored on a persistent storage device of the local metadata node (the target computing node). If it is determined from the recorded namespace information that no local metadata node exists among the candidate computing nodes, the candidate computing node can directly access the global metadata node to obtain the corresponding metadata.
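The lookup order described above (local metadata node first, global metadata node on a miss, with the pulled metadata kept locally) can be sketched as follows. This is an illustrative sketch only: the class names, the `lookup` function, and the use of in-memory dictionaries in place of a persistent storage device are all assumptions.

```python
import posixpath


class GlobalMetadataNode:
    def __init__(self, metadata):
        self.metadata = dict(metadata)  # path -> metadata record

    def get(self, path):
        return self.metadata[path]


class LocalMetadataNode:
    def __init__(self, namespace):
        self.namespace = namespace
        self.store = {}  # stands in for the persistent storage device


def lookup(path, namespace, local_nodes, global_node):
    """Resolve metadata for `path`; `local_nodes` maps namespace -> LocalMetadataNode."""
    local = local_nodes.get(namespace)
    if local is None:
        # No local metadata node for this namespace: access the global node directly.
        return global_node.get(path), "global"
    if path in local.store:
        return local.store[path], "local"
    # Miss: pull the file's metadata and its parent directory's metadata, then keep both.
    local.store[path] = global_node.get(path)
    parent = posixpath.dirname(path)
    if parent in global_node.metadata:
        local.store[parent] = global_node.get(parent)
    return local.store[path], "pulled"
```

The second element of the returned tuple only records which path the request took; it is included to make the three cases easy to observe.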

In addition, if the metadata of the file corresponding to the data acquisition request is stored in the target computing node, it may further be determined whether the target computing node holds the object information of the storage object in which the metadata is located. If so, the local metadata node may directly return the metadata held in the storage object (such as the address information in the storage node) to the candidate computing node.

If the object information does not exist, the storage object has not yet been set up in the target computing node. The storage object can then be created in the target computing node, and its object information can be acquired from the storage node and stored in the target computing node.
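The on-demand creation of storage objects can be pictured as a simple fill-on-miss table. The sketch below is an assumption-laden illustration: `StorageNode`, `TargetNode`, `resolve` and the `fetches` counter are hypothetical names introduced only to show that the storage node is contacted once per object.

```python
class StorageNode:
    def __init__(self, objects):
        self.objects = dict(objects)  # object id -> object information
        self.fetches = 0              # counts remote requests, for illustration only

    def get_object_info(self, object_id):
        self.fetches += 1
        return self.objects[object_id]


class TargetNode:
    def __init__(self):
        self.object_info = {}  # storage objects already created on this node

    def resolve(self, object_id, storage_node):
        if object_id not in self.object_info:
            # The storage object is missing locally: create it and fill it
            # with the object information obtained from the storage node.
            self.object_info[object_id] = storage_node.get_object_info(object_id)
        return self.object_info[object_id]
```

Repeated calls to `resolve` for the same object are then served from the target computing node without another request to the storage node.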

In this way, the network overhead of file data access can be further reduced. On-demand reading effectively reduces the storage requirement of the local metadata node and makes its deployment more flexible: the local storage node and the local metadata node can be deployed on different target computing nodes in the local area.

It should be noted that the above process covers the data processing corresponding to both data read operations and data write operations; that is, data processing can be implemented by the above method under either the read-only policy or the readable and writable policy. The difference is that the read-only policy does not allow write operations on file metadata, whereas the readable and writable policy allows both read operations and write operations on file metadata. For ease of understanding, the present disclosure provides a schematic diagram of a data reading and writing process, as shown in fig. 3.

Fig. 3 is a schematic diagram of a data reading and writing process provided in the present specification.

When a candidate computing node sends a data acquisition request, it first obtains the recorded namespace and determines from the namespace whether a local metadata node exists for the data. If so, it further determines whether the required metadata is stored locally; if it is, the metadata is acquired directly from the local metadata node, and if not, the metadata is pulled from the global metadata node to the local metadata node.

After the metadata is acquired, the local metadata node may further determine whether the object information of the storage object is stored locally. If so, the data in the local storage object is accessed according to the metadata in order to read and write the data; if the object information is not stored, the information of the storage object is pulled from the storage node to the local metadata node.

If the local metadata node does not exist, the computing node can directly access the global metadata node so as to acquire the file metadata and execute data reading and writing.

Further, in the process of reading and writing data, the data acquisition request may be a read-write request for reading and writing data, and the candidate computing node may determine whether a local metadata node exists according to the namespace recorded in its metadata client (MDC). If a local metadata node exists, the MDC of the computing node may first send the read-write request to the local metadata node. If no local metadata node exists, the read-write request is sent to the global metadata node.

The global metadata node first determines whether the namespace targeted by the read-write request has been delegated to a local metadata node on another computing node. If it has been delegated, a conflict may occur, in which case the data processing rules described above may be applied. If it has not been delegated, the metadata of the file accessed by the read-write request is returned.
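The delegation check made by the global metadata node can be sketched as a table lookup. This is a minimal sketch, assuming a hypothetical `delegations` table that maps each delegated namespace to the identifier of the computing node hosting its local metadata node; the return values are illustrative and are not taken from the specification.

```python
def global_handle(namespace, path, requester, delegations, metadata):
    """Serve a read-write request that arrives at the global metadata node.

    `delegations` maps namespace -> id of the node hosting its local metadata node.
    """
    holder = delegations.get(namespace)
    if holder is not None and holder != requester:
        # The namespace is delegated elsewhere: a conflict is possible, so the
        # caller must apply the configured data processing rule.
        return ("conflict", holder)
    # Not delegated: return the metadata of the requested file.
    return ("ok", metadata[path])
```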

After receiving the read-write request, the local metadata node can check whether the file path is correct, i.e. determine whether the file path prefix is consistent with the path prefix of the namespace delegated by the data acquisition request (read-write request). If not, an error indication is returned to the candidate computing node, at which point the candidate computing node may request metadata from the global metadata node while reporting the error.
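The prefix consistency check can be illustrated with a short function. The sketch below assumes POSIX-style absolute paths and a hypothetical function name `path_in_namespace`; matching on whole path components (so that `/proj` does not match `/project`) is a design choice of this sketch rather than something the specification states.

```python
def path_in_namespace(file_path: str, namespace_prefix: str) -> bool:
    """Return True if `file_path` lies under the delegated namespace prefix."""
    prefix = namespace_prefix.rstrip("/")
    if prefix == "":
        # The namespace is the root directory: every absolute path is inside it.
        return file_path.startswith("/")
    # Compare whole path components, not raw characters.
    return file_path == prefix or file_path.startswith(prefix + "/")
```

A request for which this check fails would receive an error indication and fall back to the global metadata node, as described above.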

If the path prefixes are consistent, the file path is a correct path, and the local metadata node can process the file path level by level, starting from the root directory of the namespace, to check whether the metadata of each level of directory or file exists in the local metadata node. For each directory level, if the local metadata node does not hold the metadata of that level, it may pull the metadata of the directory from the global metadata node and store it in a storage device local to the target computing node on which the local metadata node resides, where it remains until it is released when the local metadata node is revoked. For ease of understanding, the present disclosure provides a schematic diagram of a metadata acquisition process, as shown in fig. 4.
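The level-by-level processing of a file path can be sketched as a walk from the namespace root down to the file, pulling whatever levels are missing. The function name `walk_and_fill` and the use of plain dictionaries for the local and global metadata stores are assumptions made for illustration; the path is assumed to have already passed the prefix check.

```python
def walk_and_fill(file_path, namespace_root, local_store, global_store):
    """Walk from the namespace root to the file, pulling missing levels.

    Returns the list of paths whose metadata had to be pulled from the global store.
    """
    root = namespace_root.rstrip("/")
    relative = file_path[len(root):].strip("/")
    parts = relative.split("/") if relative else []

    # Build the chain of levels: namespace root, each directory, then the file.
    current = root or "/"
    levels = [current]
    for part in parts:
        current = current.rstrip("/") + "/" + part
        levels.append(current)

    pulled = []
    for level in levels:
        if level not in local_store:
            # This level is missing locally: pull it and keep it until revocation.
            local_store[level] = global_store[level]
            pulled.append(level)
    return pulled
```

A second request for the same path pulls nothing, because every level is already held locally.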

Fig. 4 is a schematic diagram of a metadata acquisition process provided in the present specification.

After the candidate computing node sends the data acquisition request, it can be determined whether a local metadata node exists for the namespace corresponding to the access request. If so, the local metadata node is accessed: the local metadata node checks the namespace path prefix for consistency, processes the metadata of the directories and the file level by level, handles any conflict according to the preset data processing rule, and then returns the metadata.

If no local metadata node holds the namespace, the candidate computing node may send the access request directly to the global metadata node, and the global metadata node then returns the metadata.

The local metadata node can then process the request according to the read or write type of the data acquisition request. If the data acquisition request is a write request, the local metadata node, after receiving the request, first determines whether the currently adopted data processing rule allows the write operation to be performed locally (that is, whether the data processing rule is a read-only policy or a readable and writable policy). If the write operation is allowed, the current data processing rule is a readable and writable policy, and the metadata update is performed directly on the local metadata node.

If the write operation is not allowed, the current data processing rule is a read-only policy. In this case the local metadata node can be revoked, after which the global metadata node is responsible for the write operation.

When the data processing rule is a readable and writable policy, the local metadata node can synchronize each write operation to the global metadata node in the form of a log when the write operation is performed, thereby ensuring the reliability of the metadata. For a read operation, the local metadata node returns the metadata stored locally. For ease of understanding, the present description provides a schematic diagram of a data processing rule, as shown in fig. 5.
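The handling of read and write requests under the two policies can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, the policy is represented by a plain string, and the revocation request, the error indication, and the retry against the global metadata node are collapsed into a single call for brevity.

```python
class GlobalNode:
    def __init__(self):
        self.metadata = {}
        self.log = []  # write log entries received from local metadata nodes

    def write(self, path, record):
        self.metadata[path] = record

    def apply_log_entry(self, entry):
        # Replay one logged write so the global copy stays up to date.
        self.log.append(entry)
        path, record = entry
        self.metadata[path] = record

    def revoke_local(self, local):
        local.store.clear()   # delete the metadata held by the local node
        local.active = False  # the local metadata node no longer serves requests


class LocalNode:
    def __init__(self, policy, global_node):
        self.policy = policy  # "read_only" or "read_write"
        self.global_node = global_node
        self.store = {}
        self.active = True

    def handle(self, kind, path, record=None):
        if kind == "read":
            return self.store.get(path)
        if self.policy == "read_write":
            # Update locally, then synchronize the write to the global node as a log entry.
            self.store[path] = record
            self.global_node.apply_log_entry((path, record))
            return "written-locally"
        # Read-only policy: revoke this node and let the global node perform the write.
        self.global_node.revoke_local(self)
        self.global_node.write(path, record)
        return "written-globally"
```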

Fig. 5 is a schematic diagram of a data processing rule provided in the present specification.

The user can perform custom setting on the data processing rules, then select a target data processing rule from the custom data processing rules and the data processing rules of the distributed cluster system according to the actual data access requirements of all the computing nodes, and further guarantee the consistency of metadata in the local metadata nodes and the global metadata nodes according to the target data processing rules.

After the candidate computing node obtains the metadata of the files such as address information, object ID, offset information and the like in the data storage node from the local metadata node, the candidate computing node can obtain the data of the files from the corresponding storage node according to the metadata, so that the data access is completed.

In addition, in practical applications, some local metadata nodes may be used less and less frequently over time, or may no longer be used at all, so that they lose their value. Therefore, when the global metadata node detects that the access frequency of the target namespace corresponding to a local metadata node is lower than a preset threshold, it can obtain the revocation request corresponding to that local metadata node, revoke the local metadata node according to the revocation request, and release the storage space occupied by the local metadata node in the target computing node.

It should be noted that, if the data processing rule corresponding to the global metadata node is a readable and writable policy, that is, write operations on metadata are allowed, the global metadata node may first synchronize the updated metadata. After the write operations on file metadata that candidate computing nodes performed on the local metadata node have been synchronized, the global metadata node revokes the local metadata node, deletes the metadata stored in the target computing node, and releases the storage space occupied by the target namespace.
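Frequency-based revocation, including the synchronization step required under a readable and writable policy, can be sketched as below. All names are hypothetical, access frequency is assumed to be measured as accesses per second over a monitoring window, and each local metadata node is represented by a plain dictionary.

```python
def revoke_idle_nodes(local_nodes, global_metadata, window_seconds, threshold):
    """Revoke local metadata nodes whose access frequency fell below the threshold.

    `local_nodes` maps namespace -> dict with keys "accesses", "policy",
    "store" and "unsynced" (metadata writes not yet known to the global node).
    """
    revoked = []
    for namespace in sorted(local_nodes):
        node = local_nodes[namespace]
        if node["accesses"] / window_seconds >= threshold:
            continue  # still used often enough to keep
        if node["policy"] == "read_write":
            # Synchronize local metadata writes before anything is deleted.
            global_metadata.update(node["unsynced"])
        node["store"].clear()       # delete the metadata on the target computing node
        del local_nodes[namespace]  # release the space held for the namespace
        revoked.append(namespace)
    return revoked
```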

The foregoing describes one or more data processing methods of the present disclosure. Based on the same concept, the present disclosure also provides a corresponding data processing apparatus, as shown in fig. 6.

Fig. 6 is a schematic diagram of an apparatus for data processing provided in the present specification, including:

the acquiring module 601 is configured to determine each candidate computing node, and acquire a history access record of each candidate computing node;

the determining module 602 is configured to determine, according to the history access record, access frequencies of each candidate computing node to different namespaces, and take namespaces with access frequencies meeting preset conditions as target namespaces;

the selecting module 603 is configured to select at least one computing node from the candidate computing nodes as a target computing node according to the remaining storage space of each candidate computing node and the storage requirement corresponding to each target namespace;

and the processing module 604 is configured to send at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata.
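The cooperation of the acquiring, determining and selecting modules can be sketched with three small functions. This is only an illustrative sketch: the function names are hypothetical, the preset condition is assumed to be a minimum access count, and choosing the fitting candidate with the most remaining space is one possible selection rule, not one stated in the specification.

```python
from collections import Counter


def access_frequencies(history):
    """Count accesses per namespace; `history` is an iterable of (node_id, namespace)."""
    return Counter(namespace for _node, namespace in history)


def target_namespaces(freqs, min_accesses):
    """Namespaces whose access count meets the (assumed) preset condition."""
    return [ns for ns, count in freqs.items() if count >= min_accesses]


def select_target_node(remaining_space, required_space):
    """Pick a candidate that can hold the metadata, preferring the most free space."""
    fitting = {node: space for node, space in remaining_space.items()
               if space >= required_space}
    if not fitting:
        return None  # no candidate can satisfy the storage requirement
    return max(fitting, key=fitting.get)
```

The processing module would then send the metadata of each selected namespace to the node returned by `select_target_node`.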

Optionally, the obtaining module 601 is specifically configured to select, from the computing nodes, a computing node whose distance between deployment locations satisfies a preset distance condition as a candidate computing node.

Optionally, the processing module 604 is specifically configured to determine, according to the history access record, history metadata that has been accessed by each candidate computing node in the root directory of the target namespace; and sending the history metadata to the target computing node for storage, and taking the target computing node storing the history metadata as a local metadata node.

Optionally, the processing module 604 is further configured to send, in a data processing process, metadata corresponding to the data acquisition request stored in the global metadata node to the target computing node for storage if metadata corresponding to the data acquisition request is not stored in the target computing node.

Optionally, the processing module 604 is further configured to obtain a revocation request sent by the local metadata node if it is detected that the access frequency of the target namespace corresponding to the local metadata node is lower than a preset threshold; and to delete the metadata stored in the target computing node and revoke the local metadata node according to the revocation request.

Optionally, the processing module 604 is specifically configured to, if the candidate computing node is allowed to perform write operations on the metadata, delete the metadata stored in the target computing node and revoke the local metadata node after the write operations have been synchronized to the global metadata node.

Optionally, the processing module 604 is specifically configured to determine a preset data processing rule corresponding to the target computing node, so that the other candidate computing nodes perform data processing under the constraint of the data processing rule.

Optionally, the processing module 604 is specifically configured to, if the data processing rule only allows the candidate computing node to perform a read operation on metadata, delete metadata stored in the target computing node and release a storage space occupied by the target namespace when it is detected that a write operation on metadata occurs.

Optionally, the processing module 604 is further specifically configured to, if it is detected that only a write operation on file data occurs and no write operation on metadata occurs, neither delete the metadata stored in the target computing node nor release the storage space occupied by the target namespace.

Optionally, the processing module 604 is specifically configured to, if the data processing rule allows the candidate computing node to perform a read operation and a write operation on metadata, determine whether metadata corresponding to the read-write request is stored in the target computing node after the read-write request of the metadata is obtained; if so, returning the metadata corresponding to the read-write request to the candidate computing node for sending the read-write request through the target computing node, otherwise, returning the metadata stored in the global metadata node to the candidate computing node.

Optionally, the processing module 604 is further configured to delete the metadata stored in the target computing node and release the storage space occupied by the target namespace if there are other computing nodes except each candidate computing node that perform a read-write operation on the metadata stored in the target computing node.

Optionally, the processing module 604 is specifically configured to determine whether the target computing node stores object information of a storage object where the metadata is located; if yes, returning metadata stored in the storage object to the candidate computing node through the target computing node, otherwise, creating the storage object in the target computing node, acquiring the object information from the storage node, and storing the object information in the target computing node.

Optionally, the processing module 604 is specifically configured to send, by the target computing node, metadata corresponding to the data acquisition request to the other candidate computing nodes, so that the other candidate computing nodes determine storage information of the data in a storage node according to the metadata;

Optionally, the processing module 604 is specifically configured to determine whether a target computing node storing the metadata exists in each candidate computing node; if yes, sending the metadata stored in the target computing node to the other candidate computing nodes through the target computing node, otherwise, sending the metadata stored in the global metadata node to the other candidate computing nodes.

Optionally, the processing module 604 is further configured to determine, by the target computing node, whether a data path prefix corresponding to the data acquisition request is consistent with a path prefix of a namespace delegated by the data acquisition request; and if not, returning an error indication to the other candidate computing nodes.

The present specification also provides a computer readable storage medium storing a computer program operable to perform a method of data processing as provided in figure 1 above.

The present specification also provides, in fig. 7, a schematic structural diagram of an electronic device corresponding to fig. 1. At the hardware level, as shown in fig. 7, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the data processing method described above with respect to fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.

An improvement to a technology could formerly be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to current method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can readily be obtained by simply describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Indeed, the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims

1. A method of data processing, the method being applied to a global metadata node of a distributed cluster system, comprising:

and transmitting at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node transmit data acquisition requests to the target computing node and perform data processing according to the acquired metadata.

2. The method of claim 1, wherein determining each candidate computing node comprises:

3. The method of claim 1, wherein sending at least a portion of metadata under the target namespace to the target computing node for storage, comprises:

and sending the history metadata to the target computing node for storage, and taking the target computing node storing the history metadata as a local metadata node.

4. The method of claim 1, wherein the method further comprises:

5. The method of claim 3, wherein the method further comprises:

6. The method of claim 5, wherein deleting the metadata stored in the target computing node and revoking the local metadata node according to the revocation request, comprises:

7. The method of claim 1, wherein sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata, specifically comprises:

8. The method of claim 7, wherein determining a preset data processing rule corresponding to the target computing node, so that the other candidate computing nodes perform data processing under the constraint of the data processing rule, specifically includes:

9. The method of claim 8, wherein the method further comprises:

10. The method of claim 7, wherein determining a preset data processing rule corresponding to the target computing node, so that the other candidate computing nodes perform data processing under the constraint of the data processing rule, specifically includes:

11. The method of claim 10, wherein the method further comprises:

12. The method of claim 10, wherein returning, by the target computing node, metadata corresponding to the read-write request to the candidate computing node that sent the read-write request, specifically comprises:

13. The method of claim 12, wherein sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata, specifically comprises:

transmitting metadata corresponding to the data acquisition request to the other candidate computing nodes through the target computing node, so that the other candidate computing nodes determine storage information of the data in the storage node according to the metadata;

14. The method of claim 1, wherein sending at least part of the metadata under the target namespace to the target computing node for storage, so that the candidate computing nodes other than the target computing node send data acquisition requests to the target computing node and perform data processing according to the acquired metadata, specifically comprises:

determining whether a target computing node storing the metadata exists among the candidate computing nodes;

15. The method of claim 1, wherein the method further comprises:

determining, by the target computing node, whether the data path prefix corresponding to the data acquisition request is consistent with the path prefix of the namespace entrusted by the data acquisition request;

16. An apparatus for data processing, comprising:

17. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-15.

18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-15 when executing the program.
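
As an illustration only, the path-prefix consistency check recited in claim 15 could be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `is_under_namespace`, the POSIX-style paths, and the component-by-component comparison are all hypothetical choices made for the example. What the target computing node does after the check is not shown here.

```python
from pathlib import PurePosixPath


def is_under_namespace(request_path: str, namespace_prefix: str) -> bool:
    """Return True if request_path lies inside the entrusted namespace.

    Assumption: "consistent" means the namespace prefix matches the leading
    path components of the requested data path. Comparing whole components
    (rather than raw strings) keeps "/data/set1" from matching
    "/data/set10/file".
    """
    request_parts = PurePosixPath(request_path).parts
    prefix_parts = PurePosixPath(namespace_prefix).parts
    return request_parts[:len(prefix_parts)] == prefix_parts
```

A target computing node holding metadata for the hypothetical namespace `/data/set1` would accept a request for `/data/set1/a.bin` under this check and reject one for `/data/set10/file`.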