CN112015696A

CN112015696A - Data access method, data relationship setting method, data access device, data relationship setting device and storage medium

Info

Publication number: CN112015696A
Application number: CN202010852473.XA
Authority: CN
Inventors: 张俊帆
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2020-12-01

Abstract

The embodiments of the invention provide a data access method, a data relationship setting method, corresponding devices, an electronic device and a computer-readable storage medium. The data access method comprises the following steps: acquiring a data acquisition request for target data, the request carrying a target access path; determining the target storage path corresponding to the target access path according to a preset path correspondence between access paths and storage paths; forwarding the data acquisition request to the target data node in the data cluster that corresponds to the target storage path; and receiving the target data returned by the target data node in response to the data acquisition request. Because every data access to the cluster uses a fixed access path, the real storage path of the target data can always be determined from that access path. A user therefore only needs the access path to obtain data and does not need to know where the data is actually stored in the cluster.

Description

Data access method, data relationship setting method, data access device, data relationship setting device and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for data access and data relationship setting, an electronic device, and a storage medium.

Background

With the rise of big data and artificial intelligence, data is increasingly valued and large data clusters are increasingly used. Based on the scheduling capacity of Hadoop and the load capacity of a machine room, a single cluster is limited to roughly 2000 nodes, so computing and storage resources end up dispersed across multiple clusters.

At present, to run tasks after a cluster has been split, a synchronization tool is first used to synchronize Hadoop Distributed File System (HDFS) data among the clusters, and an upper-layer scheduling platform then assigns each task to a designated cluster. In this scheme the user deals directly with a designated cluster, which has several disadvantages. On the one hand, scheduling requires a large amount of data maintenance on the designated cluster. For example, when a cluster is migrated or taken offline, the user must migrate data and tasks manually and must perform extensive data synchronization to keep data consistent among the clusters, which is costly, risky and time-consuming. On the other hand, existing scheduling treats a single cluster as its object and is unaware of the working state of the other clusters. Because computing and storage resources are dispersed among the clusters, task scheduling across clusters is severely isolated. For example, at a given moment cluster A may be busy while cluster B is idle, so a large number of tasks on cluster A are blocked, resulting in poor scheduling efficiency and low resource utilization.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a device and a storage medium for data access and data relationship setting, so as to solve the problem of data access among multiple clusters. The specific technical scheme is as follows:

In a first aspect, the present application provides a data access method, including:

acquiring a data acquisition request of target data; the data acquisition request carries a target access path of target data;

determining a target storage path corresponding to the target access path according to a preset path corresponding relation between the access path and the storage path;

forwarding the data acquisition request to a target data node which is positioned in a data cluster and corresponds to the target storage path;

and receiving target data returned by the target data node in response to the data acquisition request.

Optionally, acquiring the data acquisition request of the target data includes:

receiving a data identifier of target data;

searching for the target access path corresponding to the data identifier according to a preset second correspondence between data identifiers and access paths;

and generating a data acquisition request carrying the target access path for acquiring target data.

Optionally, the method further comprises:

determining the API corresponding to the preset path correspondence;

and calling a pre-stored preset path corresponding relation by utilizing the API.

Optionally, the method further comprises:

after a cluster configuration instruction is received and cluster configuration is performed, detecting data to be updated, namely data stored in the data cluster whose storage location has changed, where the cluster configuration instruction includes at least one of a cluster offline instruction, a cluster migration instruction and a cluster addition instruction;

and updating the path corresponding relation of the data to be updated according to the cluster configuration instruction.

In a second aspect, the present application provides a data relationship setting method, including:

acquiring a data identifier and a storage path of data stored in a data node in a data cluster, and establishing a first corresponding relation between the data identifier and the storage path;

generating an access path of data stored by data nodes in the data cluster according to a preset format, and establishing a second corresponding relation between the data identification and the access path;

and establishing a path corresponding relation between the access path corresponding to the same data identifier and the corresponding storage path according to the first corresponding relation and the second corresponding relation.
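As a rough illustration of the second aspect, the sketch below joins the first correspondence (data identifier to storage path) and the second correspondence (data identifier to access path) on the shared data identifier. It is a minimal Python sketch assuming both correspondences fit in in-memory dictionaries; the function name `build_path_correspondence` is hypothetical, and a real deployment would persist the result rather than keep it in memory.

```python
from typing import Dict


def build_path_correspondence(
    id_to_storage: Dict[str, str],  # first correspondence: data identifier -> storage path
    id_to_access: Dict[str, str],   # second correspondence: data identifier -> access path
) -> Dict[str, str]:
    """Join both correspondences on the data identifier: access path -> storage path."""
    correspondence: Dict[str, str] = {}
    for data_id, access_path in id_to_access.items():
        storage_path = id_to_storage.get(data_id)
        if storage_path is None:
            # No stored data is known for this identifier, so there is nothing to map.
            continue
        if access_path in correspondence:
            # Each access path must resolve to exactly one storage path.
            raise ValueError(f"access path {access_path!r} is already mapped")
        correspondence[access_path] = storage_path
    return correspondence
```

Identifiers that appear in only one of the two correspondences are skipped, so the result contains only data that both has a storage location and has been assigned an access path.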

Optionally, the method comprises:

In a third aspect, the present application provides a data access apparatus, the apparatus comprising:

the request acquisition module is used for acquiring a data acquisition request of the target data; the data acquisition request carries a target access path of target data;

the path determining module is used for determining a target storage path corresponding to the target access path according to the preset path corresponding relation between the access path and the storage path;

the request forwarding module is used for forwarding the data acquisition request to a target data node which is positioned in the data cluster and corresponds to the target storage path;

and the data receiving module is used for receiving the target data returned by the target data node responding to the data acquisition request.

In a fourth aspect, the present application provides a data relationship setting apparatus, the apparatus comprising:

the first acquisition module is used for acquiring a data identifier and a storage path of data stored in a data node in a data cluster;

the first establishing module is used for establishing a first corresponding relation between the data identifier and the storage path;

the generating module is used for generating an access path of data stored by the data nodes in the data cluster according to a preset format;

a second establishing module, configured to establish a second corresponding relationship between the data identifier and the access path;

and a third establishing module, configured to establish a path correspondence between an access path and a corresponding storage path corresponding to the same data identifier according to the first correspondence and the second correspondence, where the access path and the storage path in the path correspondence are in one-to-one correspondence.

In a fifth aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

a processor, configured to implement the steps of the data access method according to any embodiment of the first aspect when executing the program stored in the memory.

In a sixth aspect, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

and the processor is used for implementing the steps of the data relationship setting method in any embodiment of the second aspect when executing the program stored in the memory.

In a seventh aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the data access method described in any of the embodiments of the first aspect.

In an eighth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the data relationship setting method of any of the embodiments of the second aspect.

The data access method provided by the embodiment of the application comprises the following steps: acquiring a data acquisition request of target data; the data acquisition request carries a target access path of target data; determining a target storage path corresponding to the target access path according to a preset path corresponding relation between the access path and the storage path; forwarding the data acquisition request to a target data node which is positioned in a data cluster and corresponds to the target storage path; and receiving target data returned by the target data node in response to the data acquisition request.

The method provided by the embodiments of the application implements a unified distributed file system based on HDFS in which all data accesses in the cluster are made through access paths, so the user does not need to know the storage path, that is, the real storage location, of data in the cluster. After a data acquisition request carrying a target access path is acquired, the target storage path corresponding to that access path is determined using the preset path correspondence, the request is forwarded to the target data node corresponding to the target storage path, and the target data returned by that node is received, which completes the data access.

The user does not need to know how data is actually stored in the cluster; it is only necessary to keep the preset path correspondence between access paths and storage paths up to date. When the cluster changes, for example when data is brought online, taken offline or migrated, only the preset path correspondence needs to be changed, and the original access path can still be used to access the corresponding data.

In addition, the data relationship setting method implements a unified distributed file system based on HDFS in which all data accesses in the cluster are expressed uniformly as access paths rather than as the real storage paths of the data. This unifies the form of access: the access path for a given piece of data is fixed, and the real storage path of the target data is determined from the access path. This is what allows a user to obtain data directly with an access path, without knowing the actual storage path of the data in the cluster.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a schematic diagram of a data system according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data access method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of another data access method according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a further data access method according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a further data access method according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of a further data access method according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of a further data access method according to an embodiment of the present application;

FIG. 8 is a schematic diagram of another embodiment of a data system;

FIG. 9 is a schematic diagram of another structure of a data system according to an embodiment of the present invention;

FIG. 10 is a schematic flowchart of a further data access method according to an embodiment of the present application;

FIG. 11 is a schematic diagram of a dictionary tree according to an embodiment of the present application;

FIG. 12 is a schematic flowchart of a data relationship setting method according to an embodiment of the present application;

FIG. 13 is a schematic flowchart of another data relationship setting method according to an embodiment of the present application;

FIG. 14 is a schematic structural diagram of a data access device according to an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a data relationship setting apparatus according to an embodiment of the present application;

FIG. 16 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions of the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

The scheme provided by the embodiments of the application is applied to HDFS clusters. The data system contains a plurality of clusters, and the application adopts a unified distributed file system on top of HDFS. As shown in FIG. 1, the HDFS data cluster 100 includes cluster 101, cluster 102, cluster 103, and so on, and the unified distributed file system of HDFS is denoted 200. The unified distributed file system 200 includes a BFS module 201. In the embodiment of the present application, the BFS module 201 is implemented based on the File System SPI (Service Provider Interface) mechanism of Hadoop and needs to be deployed to each cluster (101, 102, 103 … …) of the Hadoop cluster 100. In addition, all BFS modules share one client 300 through which users access data, and 400 is an access path input module that receives an access path input by the user and forwards it to the client 300. In one case, the unified distributed file system 200 may further include a path correspondence storage module 202, which stores the mapping between access paths and storage paths. The path correspondence storage module 202 may be a database (also referred to as a persistent storage medium), or may be deployed on a web server and exposed as an API for the BFS module 201, so that the BFS module reads the path correspondence by calling the API (Application Programming Interface).

As shown in FIG. 1, the system further includes a path mapping system 500, which comprises a path generating module 501, a path storing module 502 and a path mapping module 503. The path generating module 501 generates access paths in a uniform format; the path storing module 502 stores the access paths and the storage paths; and the path mapping module 503 establishes the mapping between access paths and storage paths and stores it in the path correspondence storage module 202.

FIG. 2 shows a data access method provided in an embodiment of the present application. The method may be applied to the unified distributed file system in the data system shown in FIG. 1 and, as shown in FIG. 2, may include the following steps:

S101, acquiring a data acquisition request of target data.

The data acquisition request may be input directly by the user or forwarded by the client 300.

In an embodiment of the application, the data acquisition request carries a target access path of the target data, and the target access path may be input by a user. For example, the user may type the full target access path into a data search bar. Alternatively, each piece of data that can be accessed may be given a corresponding virtual file identifier or virtual icon that stores the access path as a file attribute, so that when the user clicks the virtual file identifier or virtual icon of the target data, the corresponding target access path is input.

In the embodiment of the present application, an example of an access path is bfs://t1/user, where "bfs", "t1" and "user" are the access path nodes of the access path. Each access path node may be set freely and need not have any mapping or correspondence with the data nodes in the cluster; the data stored in the data nodes of the cluster therefore cannot be located from the access path alone.

S102, determining a target storage path corresponding to the target access path according to the preset path corresponding relation between the access path and the storage path.

In the embodiment of the present application, a storage path refers to the real storage location of data in a data node of the cluster, and the data stored at that location can be read directly through the storage path. An example of a storage path is hdfs://hadoop1-namenode/user/cloud. A storage path comprises at least one storage path node, which may be a cluster identifier corresponding to the cluster, a node identifier corresponding to the data node, or a path identifier of the corresponding data. Taking this storage path as an example, the node identifier is hadoop1-namenode and the path identifier is user/cloud. There is no separate cluster identifier in this storage path; instead, the cluster identifier is embedded in the node identifier, for example hadoop1.
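To make the structure of a storage path concrete, the following sketch splits an hdfs:// storage path into its node identifier, cluster identifier and path identifier using Python's standard `urllib.parse.urlsplit`. The rule that the cluster identifier is the prefix of the node identifier before the first hyphen is an assumption drawn from the hadoop1-namenode example, not a documented convention; the `StoragePath` type and `parse_storage_path` function are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class StoragePath(NamedTuple):
    node_id: str     # e.g. "hadoop1-namenode"
    cluster_id: str  # e.g. "hadoop1", embedded in the node identifier
    path_id: str     # e.g. "user/cloud"


def parse_storage_path(storage_path: str) -> StoragePath:
    """Split an hdfs:// storage path into its storage path nodes."""
    parts = urlsplit(storage_path)
    if parts.scheme != "hdfs" or not parts.netloc:
        raise ValueError(f"not an hdfs:// storage path: {storage_path!r}")
    node_id = parts.netloc
    # Assumption: the cluster identifier is the node identifier's prefix before the first "-".
    cluster_id = node_id.split("-", 1)[0]
    path_id = parts.path.lstrip("/")
    return StoragePath(node_id, cluster_id, path_id)
```

An access path such as bfs://t1/user is rejected by this function, which mirrors the point above that an access path by itself does not identify a data node.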

In the embodiment of the present application, a preset path corresponding relationship between an access path and a storage path is pre-established, where the preset path corresponding relationship may exist in a form of a relationship table or a data field, and in the preset path corresponding relationship, each access path corresponds to one storage path.

In one embodiment, access paths and storage paths may be in one-to-one correspondence, that is, each storage path corresponds to exactly one access path, so a user can access the corresponding data through only one access path.

For example, the correspondence may be as shown in Table 1.

TABLE 1

In another embodiment, one storage path may correspond to two or more access paths. When data is displayed at the front end as virtual file identifiers or virtual icons, it may appear in several display positions, so several access paths may correspond to one storage path, and the user can access the same data, stored at the same storage path, through different virtual file identifiers or virtual icons.

In the embodiment of the application, after the access path is obtained, a target storage path corresponding to the target access path may be searched from a preset path corresponding relationship.

In a specific application, the preset path correspondence may be stored locally and read directly in this step. In other embodiments, it may be stored on a local or remote server, in which case it must first be obtained from that server in this step.

S103, forwarding the data acquisition request to a target data node which is positioned in the data cluster and corresponds to the target storage path.

Because the storage path contains the node identification, after the target storage path corresponding to the target access path is determined, the node identification of the target data node is extracted from the target storage path.

In an embodiment of the present application, the data acquisition request may be forwarded directly in this step to the target data node corresponding to the node identifier, together with the target storage path, so that the target data node can read the corresponding target data directly according to the target storage path.

S104, receiving the target data returned by the target data node in response to the data acquisition request.

After receiving the data acquisition request, the target data node may directly search for data corresponding to the target storage path, use the searched data as target data corresponding to the data acquisition request, and return the target data to the unified distributed file system 200. The target data is then returned by the unified distributed file system 200 to the client 300 for presentation to the user.
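Steps S101 to S104 can be sketched as a single lookup-and-forward routine. This is a minimal in-process illustration, not the BFS module itself: `DataNode` and `UnifiedFileSystem` are hypothetical classes, data nodes are simulated by dictionaries, and "forwarding" is a direct method call rather than a network request. The usage in the test also shows two access paths resolving to the same storage path, as in the embodiment where one storage path has several access paths.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node: storage path -> file contents."""
    files: Dict[str, bytes] = field(default_factory=dict)

    def read(self, storage_path: str) -> bytes:
        return self.files[storage_path]


@dataclass
class UnifiedFileSystem:
    path_correspondence: Dict[str, str]  # preset correspondence: access path -> storage path
    nodes: Dict[str, DataNode]           # node identifier -> data node

    def get(self, access_path: str) -> bytes:
        # S101: the caller supplies the target access path carried by the request.
        # S102: resolve the access path to the real storage path.
        try:
            storage_path = self.path_correspondence[access_path]
        except KeyError:
            raise FileNotFoundError(access_path) from None
        # S103: take the node identifier from hdfs://<node identifier>/<path identifier>
        # and forward the request, together with the storage path, to that node.
        node_id = storage_path.split("://", 1)[1].split("/", 1)[0]
        node = self.nodes[node_id]
        # S104: return whatever the target data node reads at the storage path.
        return node.read(storage_path)
```

Because callers only ever hold access paths, moving data amounts to editing `path_correspondence`; no caller code changes.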

In an embodiment of the present application, as shown in fig. 3, the foregoing step S101 may include the following steps:

S1011, receiving a data identifier of the target data.

The data identifier refers to identification information for indicating or representing the target data, and may be a name corresponding to the target data, for example: when the target data is a folder, the data identifier may be a name of the folder, and when the target data is a file, the data identifier may be a name of the file.

A user usually accesses data with a specific target in mind. In a specific operation, this step may acquire the name of the data that the user clicks or selects in the presentation interface.

S1012, searching for the target access path corresponding to the data identifier according to a preset second correspondence between data identifiers and access paths.

In the embodiment of the application, a second corresponding relationship between each data identifier and the access path may be pre-established, and the target access path may be obtained by directly searching the second corresponding relationship in this step.

S1013, generating a data acquisition request that carries the target access path and is used for acquiring the target data.

In the foregoing embodiment, the BFS module only receives data acquisition requests sent by other modules. Referring to FIG. 1, the data acquisition request is typically generated by the client 300, in which case the client 300 must store the access path of the data.

In this scheme, the client does not need to generate the data acquisition request; it only forwards a data identifier, and the BFS module generates the data acquisition request directly from the data identifier.
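Steps S1011 to S1013 amount to one extra lookup before the request is built. The sketch below assumes the second correspondence is available as an in-memory dictionary; `DataAcquisitionRequest` and `make_request` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataAcquisitionRequest:
    access_path: str  # the target access path carried by the request


def make_request(data_id: str, id_to_access: Dict[str, str]) -> DataAcquisitionRequest:
    """S1011-S1013: turn a data identifier into a data acquisition request."""
    # S1012: look up the target access path in the second correspondence.
    access_path = id_to_access.get(data_id)
    if access_path is None:
        raise KeyError(f"no access path registered for {data_id!r}")
    # S1013: build the request carrying the target access path.
    return DataAcquisitionRequest(access_path)
```

Keeping this lookup on the BFS side is what lets the client forward only a name such as a folder or file name.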

In an embodiment of the present application, as shown in fig. 4, the method in the embodiment shown in fig. 1 may further include the following steps:

S401, after receiving the cluster configuration instruction and performing cluster configuration, detecting data to be updated, namely data in the data cluster whose position has changed.

During normal use of the clusters, technicians may migrate data according to the working state of the clusters or how data is stored in them. Examples include data migration within a cluster, in which data is transferred from one data node to another data node of the same cluster, and data migration between clusters, in which data is transferred from a data node in cluster A to a data node in cluster B.

The data to be updated refers to data that changes during the cluster configuration process.

In this embodiment of the present application, the cluster configuration instruction includes at least one of a cluster offline instruction, a cluster migration instruction, and a cluster addition instruction.

S402, updating the path correspondence of the data to be updated according to the cluster configuration instruction.

In an embodiment of the present application, the cluster configuration instruction includes a cluster migration instruction, and the data to be updated is first data. As shown in FIG. 5, step S402 in FIG. 4 may include the following steps:

S501, acquiring the current storage path of the first data after its storage location has moved.

The current storage path refers to the latest storage path after data migration.

S502, determining an access path corresponding to the first data in the preset path corresponding relation.

During data migration, the access path corresponding to the first data does not change, so the access path already recorded in the path correspondence continues to be used.

S503, updating, in the preset path correspondence, the storage path corresponding to the access path of the first data to the current storage path.

In an embodiment of the present application, the cluster configuration instruction includes a cluster offline instruction, and the data to be updated is second data. As shown in FIG. 6, step S402 in FIG. 4 may include the following steps:

S601, acquiring the storage path of the second data.
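A migration update touches only the storage side of the correspondence. In the minimal sketch below, which assumes the preset path correspondence is an in-memory dictionary from access path to storage path, every access path that pointed at the old storage path is repointed at the current one; the access paths themselves are never rewritten. `apply_migration` is a hypothetical helper.

```python
from typing import Dict


def apply_migration(correspondence: Dict[str, str], old_storage: str, new_storage: str) -> int:
    """S501-S503: repoint every access path of the migrated data to its current storage path.

    Returns how many access paths were updated (more than one if several
    access paths share the same storage path).
    """
    updated = 0
    for access_path, storage_path in correspondence.items():
        if storage_path == old_storage:
            # Only the value changes; the dict's size is unchanged, so this is safe mid-iteration.
            correspondence[access_path] = new_storage
            updated += 1
    return updated
```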

S602, determining an access path corresponding to the second data in the preset path corresponding relation.

S603, deleting the correspondence between the access path and the storage path of the second data from the preset path correspondence.

In an embodiment of the present application, the cluster configuration instruction includes a cluster addition instruction, and the data to be updated is third data. As shown in FIG. 7, step S402 in FIG. 4 may include the following steps:

S701, acquiring the storage path of the data corresponding to the third data identifier.
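For the offline case, the corresponding sketch removes every entry whose storage path belongs to the data taken offline. As before, the dictionary representation and the `apply_offline` helper are assumptions for illustration; the keys are collected first so the dictionary is not resized while it is being iterated.

```python
from typing import Dict, Iterable, List


def apply_offline(correspondence: Dict[str, str], offline_storage_paths: Iterable[str]) -> List[str]:
    """S601-S603: drop the correspondence entries of data that has gone offline.

    Returns the access paths that were removed.
    """
    offline = set(offline_storage_paths)
    # Collect the keys first; deleting while iterating over the dict would raise RuntimeError.
    removed = [access for access, storage in correspondence.items() if storage in offline]
    for access in removed:
        del correspondence[access]
    return removed
```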

S702, generating an access path corresponding to the third data identifier.

S703, establishing a path correspondence between the access path and the storage path of the third data identifier.

S704, storing the established path correspondence of the third data identifier in the preset path correspondence.
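Steps S701 to S704 can be sketched the same way. The document does not specify the preset access path format, so the sketch takes the formatting rule as a parameter; the `bfs://t1/<identifier>` rule used in the usage lines is purely hypothetical, as is the `apply_addition` helper.

```python
from typing import Callable, Dict


def apply_addition(
    correspondence: Dict[str, str],
    data_id: str,
    storage_path: str,
    make_access_path: Callable[[str], str],
) -> str:
    """S701-S704: register newly added data under a freshly generated access path."""
    access_path = make_access_path(data_id)  # S702: generate the access path
    if access_path in correspondence:
        raise ValueError(f"access path {access_path!r} already exists")
    correspondence[access_path] = storage_path  # S703/S704: record the new pair
    return access_path


# Usage with a hypothetical access path format.
corr: Dict[str, str] = {}
new_path = apply_addition(
    corr, "reports", "hdfs://hadoop3-namenode/user/reports", lambda d: f"bfs://t1/{d}"
)
```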

The embodiments are described in detail below with reference to specific examples:

Example one

An embodiment of the present application provides a data access method, which may be applied to the unified distributed file system in the data system shown in FIG. 1. The method may include the following steps:

S11, the client 300 receives the data acquisition request input by the user.

In the embodiment of the present application, an example of an access path is bfs://t1/user. Each character in the access path can be set freely and need not have any mapping or correspondence with the data nodes in the cluster, so the data stored in the data nodes cannot be located from the access path alone.

S12, the client 300 forwards the data acquisition request to the BFS module 201 in the unified distributed file system 200.

S13, the BFS module 201 obtains a preset path mapping relationship between the access path and the storage path.

In the embodiment of the present application, a preset path corresponding relationship between an access path and a storage path is pre-established, where the preset path corresponding relationship may exist in a form of a relationship table or a data field, and in the preset path corresponding relationship, each access path corresponds to one storage path. The pre-established preset path correspondence may be stored in the path correspondence storage module 202.

In a specific application, when the preset path correspondence is provided by a server rather than stored in the path correspondence storage module 202, step S13 may include:

S131, determining the API corresponding to the preset path correspondence;

S132, calling the API to retrieve the pre-stored preset path correspondence.

S14, the BFS module 201 determines a target storage path corresponding to the target access path according to the preset path correspondence between the access path and the storage path.

S15, the BFS module 201 forwards the data obtaining request to a target data node located in the data cluster and corresponding to the target storage path.

In the specific embodiment of the present application, in the step of forwarding, both the data acquisition request and the target storage path may be forwarded to the target data node at the same time, so that the target data node searches for corresponding target data through the target storage path.

S16, the BFS module 201 receives the target data returned by the target data node in response to the data acquisition request.

S17, the BFS module 201 sends the target data to the client 300 as response data to the data acquisition request.

S18, the client 300 displays the received target data according to the data obtaining request.
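Steps S131 and S132 describe fetching the correspondence through an API. The sketch below assumes, hypothetically, an HTTP endpoint that returns the correspondence as a JSON object mapping access paths to storage paths; neither the URL nor the response shape is given in the document. Parsing is kept separate from the network call so it can be checked without a server, and only standard-library calls (`json.loads`, `urllib.request.urlopen`) are used.

```python
import json
import urllib.request
from typing import Dict


def parse_path_correspondence(raw: bytes) -> Dict[str, str]:
    """Validate an API response body as a mapping of access path -> storage path."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in payload.items()
    ):
        raise ValueError("expected a JSON object mapping access paths to storage paths")
    return payload


def load_path_correspondence(api_url: str, timeout: float = 5.0) -> Dict[str, str]:
    """S131-S132: call the (hypothetical) correspondence API and parse its response."""
    with urllib.request.urlopen(api_url, timeout=timeout) as resp:
        return parse_path_correspondence(resp.read())
```

A production BFS module would likely cache the result rather than call the API on every request; that caching policy is not described in the source.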

In other embodiments of the present application, in step S103 or step S15, the BFS module 201 may generate, from the target storage path it has found, a data read request corresponding to the data acquisition request. The data read request carries the target storage path and asks the data node corresponding to the target storage path to read the target data at that path.

In addition to the embodiment shown in fig. 1, in other embodiments of the present application, the method may also be applied to the unified distributed file system 200 in the embodiment shown in fig. 8, and in fig. 8, a single cluster is taken as an example for description.

In fig. 8, there is only one cluster 101, and in the cluster 101, there are a plurality of data nodes, for example: data node a1011, data node b1012, data node c1013, etc. The BFS module 201 may directly forward the data acquisition request to the corresponding data node.

Example two

The embodiment of the present application provides a data access method, which may also be used in the unified distributed file system 200 in the embodiment shown in fig. 9, and in fig. 9, a description is given by taking a multi-cluster as an example.

In fig. 9, there are a plurality of clusters including: a cluster 102, clusters 103, … …, and a cluster 104, where the cluster 102 includes data node a1021, data node b1022, data node c1023, etc., the cluster 103 includes data node a1031, data node b1032, data node c1033, etc., and the cluster 104 includes data node a1041, data node b1042, data node c1043, etc.

In this embodiment of the present application, the BFS module is divided into two parts: one part is located in the unified distributed file system 200, and the other part is distributed across the clusters. In fig. 9, the BFS module includes a first BFS module 206, a second BFS module 203, a third BFS module 204, and a fourth BFS module 205, where the first BFS module 206 is located in the unified distributed file system 200, the second BFS module 203 in cluster 102, the third BFS module 204 in cluster 103, and the fourth BFS module 205 in cluster 104.

The first BFS module 206 performs upper-layer scheduling, while the second BFS module 203, the third BFS module 204, and the fourth BFS module 205, distributed among the clusters, read data from the data nodes.

In an embodiment of the present application, the method may include the steps of:

S21, the client 300 receives the data acquisition request input by the user.

S22, the client 300 forwards the data acquisition request to the first BFS module 206 in the unified distributed file system 200.

S23, the first BFS module 206 obtains the preset path correspondence between access paths and storage paths.

In a specific application, when the preset path correspondence is not stored in the path correspondence storage module 202, as shown in fig. 10, step S23 may include:

S231, determine the API interface corresponding to the preset path correspondence;

S232, call the pre-stored preset path correspondence through the API.
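Steps S231 and S232 amount to a cache-or-fetch pattern. The sketch below is illustrative only: `PathCorrespondenceProvider` is a hypothetical name, a dictionary stands in for the path correspondence storage module 202, and a hypothetical `fetch_via_api` callable stands in for the API interface.

```python
from typing import Callable, Dict, Optional


class PathCorrespondenceProvider:
    """Hypothetical provider of the preset path correspondence.

    local_store plays the role of the path correspondence storage module 202;
    fetch_via_api plays the role of the API interface (S231/S232).
    """

    def __init__(self, local_store: Optional[Dict[str, str]],
                 fetch_via_api: Callable[[], Dict[str, str]]):
        self._local = local_store
        self._fetch = fetch_via_api

    def get(self) -> Dict[str, str]:
        # Use the local copy when a non-empty one is available
        if self._local:
            return self._local
        # Otherwise call the pre-stored correspondence through the API and cache it
        self._local = dict(self._fetch())
        return self._local
```

Caching the fetched table means the API is called only once per provider; how a real deployment invalidates that cache after cluster changes is not specified in the text.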

S24, the first BFS module 206 determines a target storage path corresponding to the target access path according to the preset path correspondence between the access path and the storage path.

S25, the first BFS module 206 determines the target cluster according to the target storage path.

For example, suppose the target storage path found through the target access path is hdfs://hadoop1-name/user/cluster, where hadoop1 is the cluster identifier and user is the data node identifier. If the identifier of cluster 102 is hadoop1, the target cluster is determined to be hadoop1, i.e., cluster 102 in fig. 9. If the identifier of data node a1021 in cluster 102 is user, the target data node is user, so data node a1021 in fig. 9 can be located through the target storage path.
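Extracting the two identifiers from a storage path like the one in this example can be sketched with Python's standard `urllib.parse.urlparse`. The rule that the cluster identifier is the part of the host before the first "-" and the data node identifier is the first path segment is an assumption generalized from this single example; `locate` is a hypothetical name.

```python
from typing import Tuple
from urllib.parse import urlparse


def locate(storage_path: str) -> Tuple[str, str]:
    """Split an hdfs:// storage path into (cluster id, data node id).

    Assumed rule: cluster id = host part before the first '-',
    data node id = first path segment.
    """
    parsed = urlparse(storage_path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs path: {storage_path}")
    cluster_id = parsed.netloc.split("-", 1)[0]
    segments = [s for s in parsed.path.split("/") if s]
    if not cluster_id or not segments:
        raise ValueError(f"cannot locate {storage_path}")
    return cluster_id, segments[0]
```

Applied to hdfs://hadoop1-name/user/cluster, this returns ("hadoop1", "user").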

S26, the first BFS module 206 sends the data acquisition request and the target storage path to the second BFS module 203 in the target cluster.

S27, the second BFS module 203 receives the data acquisition request and the target storage path, and determines from the target storage path that the target data node is data node a1021.

S28, the second BFS module 203 forwards the data acquisition request and the target storage path to data node a1021.

S29, the second BFS module 203 receives, as the target data, the data that data node a1021 found at the target storage path in response to the data acquisition request, and returns the target data to the first BFS module 206.

S30, the first BFS module 206 sends the target data to the client 300 as response data of the data acquisition request.

S31, the client 300 displays the received target data in response to the data acquisition request.

In this embodiment of the present application, the first BFS module 206 acts as the upper-layer scheduler: it determines the target cluster (cluster 102 in the example above) from the target storage path and forwards the data acquisition request and the target storage path to the second BFS module 203 in that cluster. The second BFS module 203 then acts as the lower-layer scheduler and acquires the target data from the target data node within the target cluster.
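The split between upper-layer and lower-layer scheduling can be sketched as two small classes. This is an illustrative model only: `UnifiedBFS`, `ClusterBFS`, and the modelling of a data node as a function from storage path to bytes are hypothetical. Identifier extraction follows the assumed rule from the hdfs://hadoop1-name/user/cluster example (cluster id before the first "-", data node id as the first path segment).

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical model of a data node: storage path -> stored bytes
DataNode = Callable[[str], bytes]


class ClusterBFS:
    """Lower-layer scheduler inside one cluster (e.g. second BFS module 203)."""

    def __init__(self, nodes: Dict[str, DataNode]):
        self.nodes = nodes  # data node identifier -> data node

    def read(self, storage_path: str) -> bytes:
        # S27: the first path segment identifies the target data node
        node_id = [s for s in urlparse(storage_path).path.split("/") if s][0]
        # S28/S29: forward to the data node and return the data it finds
        return self.nodes[node_id](storage_path)


class UnifiedBFS:
    """Upper-layer scheduler in the unified file system (first BFS module 206)."""

    def __init__(self, path_table: Dict[str, str], clusters: Dict[str, ClusterBFS]):
        self.path_table = path_table  # preset access path -> storage path
        self.clusters = clusters      # cluster identifier -> lower-layer scheduler

    def get(self, access_path: str) -> bytes:
        storage_path = self.path_table[access_path]                   # S24
        cluster_id = urlparse(storage_path).netloc.split("-", 1)[0]  # S25
        return self.clusters[cluster_id].read(storage_path)          # S26-S29
```

For example, registering a `ClusterBFS` under "hadoop1" with a data node "user", and mapping a hypothetical access path bfs://t0/user/cluster to hdfs://hadoop1-name/user/cluster, routes a `get` call through cluster hadoop1 to data node user.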

In this embodiment of the application, the preset path correspondence is organized as a dictionary tree (trie), which may be as shown in fig. 11. On the basis of fig. 11, as shown in fig. 12, the foregoing step S102 may include the following steps:

S1201, read the preset path correspondence.

S1202, in the dictionary tree corresponding to the path correspondence, traverse from parent nodes to child nodes, visiting the dictionary tree node that corresponds to each access path node in the access path.

In one embodiment of the present application, the current mapping between access paths and storage paths may be as shown in table 1, and it is built into a tree structure according to the dictionary tree algorithm, as shown in fig. 11.

S1203, find the target dictionary tree node corresponding to the lowest-level (deepest matched) access path node in the access path.

For example, given the access path bfs://t0/user/closed/tmp, a dictionary tree lookup follows the nodes bfs:// → t0 → user → closed → tmp in fig. 11 (the left branch in the figure), and the node for the lowest-level access path node, tmp, is taken as the target dictionary tree node.

S1204, obtain the target storage path node corresponding to the target dictionary tree node, and take the complete path in the dictionary tree that contains the target storage path node as the target storage path.

Because each node in the dictionary tree corresponds to both an access path node and a storage path node, when the target dictionary tree node matching bfs://t0/user/closed/tmp/a.txt is found to be tmp on the left branch in fig. 11, the complete path containing the storage path node corresponding to tmp is hdfs://hadoop2-namenode/tmp/closed/.
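Steps S1201 to S1204 describe a longest-prefix match over a dictionary tree keyed by access path nodes. A minimal sketch follows. The class names, the choice to return unmatched trailing nodes (such as a.txt) separately instead of appending them to the storage path, and any mapping other than bfs://t0/user/closed/tmp → hdfs://hadoop2-namenode/tmp/closed/ are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.storage_path: Optional[str] = None  # set only where a mapping ends


class PathTrie:
    """Dictionary tree over access path nodes: scheme, then '/'-separated parts."""

    def __init__(self) -> None:
        self.root = TrieNode()

    @staticmethod
    def _split(access_path: str) -> List[str]:
        scheme, _, rest = access_path.partition("://")
        return [scheme + "://"] + [s for s in rest.split("/") if s]

    def insert(self, access_path: str, storage_path: str) -> None:
        node = self.root
        for part in self._split(access_path):
            node = node.children.setdefault(part, TrieNode())
        node.storage_path = storage_path

    def lookup(self, access_path: str) -> Tuple[Optional[str], List[str]]:
        """Walk parent -> child; return the storage path of the deepest mapped
        node and the access path nodes left over below it."""
        parts = self._split(access_path)
        node: Optional[TrieNode] = self.root
        best, matched = None, 0
        for i, part in enumerate(parts):
            node = node.children.get(part)
            if node is None:
                break
            if node.storage_path is not None:
                best, matched = node.storage_path, i + 1
        return best, parts[matched:]
```

With the tmp mapping inserted, looking up bfs://t0/user/closed/tmp/a.txt yields hdfs://hadoop2-namenode/tmp/closed/ with a.txt left over, matching the example above. The lookup cost grows with the number of path nodes, not with the number of stored mappings.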

An embodiment of the present application further provides a data relationship setting method, which may be deployed in the path mapping system 500. As shown in fig. 13, the method may include:

S1301, acquire the data identifier and storage path of data stored in the data nodes of the data cluster.

The storage path is the storage address or link address of data stored in a data node in the cluster. For example, if a piece of data in cluster C has the identifier app1, its storage path may be hdfs://C/user/app1.

S1302, establish a first correspondence between the data identifier and the storage path.

Because each piece of data has a unique data identifier and a unique storage location, the first correspondence between data identifier and storage path can be established straightforwardly. Referring to fig. 1, in the embodiment of the present application, the path generation module 204 may obtain the storage path and establish the first correspondence.

S1303, generate, in a preset format, an access path for the data stored in the data nodes of the data cluster.

The embodiment of the application designs a unified access path proxy for data across multiple clusters, so that the access paths of all clusters follow one uniform format. For example, the access path of data app1 may be bfs://t0/user and that of data app2 may be bfs://t1/user, giving a unified access path for inter-cluster data access. In this application, this unified access scheme may be implemented and managed by a unified naming module for the data of the multiple clusters.

S1304, establish a second correspondence between the data identifier and the access path.

Since the access path is generated according to the data identifier, the second corresponding relationship between the data identifier and the access path can be easily established.

S1305, based on the first correspondence and the second correspondence, establish the path correspondence between the access path and the storage path that share the same data identifier.

Referring to fig. 1, in the embodiment of the present application, the path mapping module 503 establishes the path correspondence between each storage path and its access path and stores the path correspondence in a persistent storage medium. In this embodiment, the path correspondence, the first correspondence, and the second correspondence may each be a data table.
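Step S1305 is essentially a join of the first and second correspondences on the data identifier. The sketch below uses the hypothetical function name `build_path_correspondence` and models each correspondence as a dictionary standing in for a data table. It rejects results that would break the one-to-one relationship; silently skipping identifiers that have no known storage path is an assumed policy, not one stated in the text.

```python
from typing import Dict


def build_path_correspondence(first: Dict[str, str],
                              second: Dict[str, str]) -> Dict[str, str]:
    """Join data id -> storage path (first) with data id -> access path (second)
    into access path -> storage path, enforcing a one-to-one mapping."""
    result: Dict[str, str] = {}
    for data_id, access_path in second.items():
        if data_id not in first:
            continue  # assumed policy: no known storage path for this identifier yet
        if access_path in result:
            raise ValueError(f"access path {access_path} assigned twice")
        result[access_path] = first[data_id]
    # Each storage path must also be reachable through exactly one access path
    if len(set(result.values())) != len(result):
        raise ValueError("a storage path is mapped from more than one access path")
    return result
```

For data app1 stored at hdfs://C/user/app1 with access path bfs://t0/user/app1, the result is the single entry bfs://t0/user/app1 → hdfs://C/user/app1.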

In one embodiment, access paths and storage paths are in one-to-one correspondence: each storage path corresponds to exactly one access path, so a user can access the corresponding data through only that one access path. For example, the correspondence may be as shown in table 1.

The foregoing embodiments assume that the data is already stored in a cluster, that its access path has been created, and that the path correspondence between its access path and storage path has been established in advance, so that the data can be accessed directly through the access path. However, clusters may be brought online, taken offline, or migrated. Therefore, when a cluster changes, as shown in fig. 4, the method may further include the following steps:

For specific updating, refer to the detailed description of the examples shown in fig. 5 to fig. 7, which is not described herein again.

When a cluster is reconfigured, the data stored in it changes. By monitoring cluster configuration instructions, the latest data changes in the cluster can be obtained, the storage path of each changed piece of data can be determined, and the correspondence between access path and storage path can then be re-established. For example, if data app1 is migrated from cluster C to cluster B, its storage path changes from hdfs://C/user/app1 to hdfs://B/user/app1 while its access path stays the same; accordingly, the path correspondence for data app1 becomes bfs://t0/user/app1 → hdfs://B/user/app1.
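The migration update in this example changes only the storage side of the correspondence. A minimal sketch, assuming the path correspondence is held in memory as a dictionary from access path to storage path; the function name `apply_migration` is hypothetical.

```python
from typing import Dict


def apply_migration(path_table: Dict[str, str],
                    old_storage: str, new_storage: str) -> str:
    """Re-point the access path that maps to old_storage to new_storage.

    The access path itself is left unchanged; returns that access path.
    """
    for access_path, storage_path in path_table.items():
        if storage_path == old_storage:
            # Overwriting an existing key does not change the dict's size,
            # and we return immediately, so iteration is not disturbed
            path_table[access_path] = new_storage
            return access_path
    raise KeyError(f"no access path maps to {old_storage}")
```

A linear scan keeps the sketch short; a real table would more likely keep a reverse index from storage path to access path.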

Furthermore, during data synchronization the access path seen by the user stays the same and only the backend storage path changes, so the synchronization is transparent to the user. In addition, users do not need to take part in the synchronization or migrate data and tasks manually, which removes a large amount of synchronization work, lowers cost and risk, and allows the synchronization to complete quickly.

An embodiment of the present invention further provides a data access apparatus, which may be applied to a unified distributed file system in the data system shown in fig. 1, and as shown in fig. 14, the apparatus 140 may include:

a request acquisition module 141, configured to acquire a data acquisition request of target data; the data acquisition request carries a target access path of target data;

a path determining module 142, configured to determine, according to a preset path correspondence between an access path and a storage path, a target storage path corresponding to the target access path;

a request forwarding module 143, configured to forward the data acquisition request to a target data node located in the data cluster and corresponding to the target storage path;

a data receiving module 144, configured to receive target data returned by the target data node in response to the data acquisition request.

In an embodiment of the present application, the request acquisition module 141 may further include:

the identifier receiving submodule is used for receiving the data identifier of the target data;

the path searching submodule is used for searching for the target access path corresponding to the data identifier according to a preset second correspondence between data identifiers and access paths;

the request generation submodule is used for generating a data acquisition request that is used to acquire the target data and carries the target access path.
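The three submodules of the request acquisition module can be sketched as a single lookup followed by request construction. `DataAcquisitionRequest` and `acquire_request` are hypothetical names, and the second correspondence is assumed to be an in-memory dictionary from data identifier to access path.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataAcquisitionRequest:
    """Hypothetical request type carrying the target access path."""
    data_id: str
    access_path: str


def acquire_request(data_id: str,
                    second_correspondence: Dict[str, str]) -> DataAcquisitionRequest:
    # Path searching submodule: data identifier -> target access path
    try:
        access_path = second_correspondence[data_id]
    except KeyError:
        raise KeyError(f"unknown data identifier: {data_id}") from None
    # Request generation submodule: build the request carrying the access path
    return DataAcquisitionRequest(data_id=data_id, access_path=access_path)
```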

In another embodiment of the present application, the apparatus further comprises:

the interface determining module is used for determining the API interface corresponding to the preset path correspondence;

the calling module is used for calling the pre-stored preset path correspondence through the API.

The path determining module 142 may use the preset path correspondence retrieved by the calling module to determine the target storage path.

the update detection module is used for detecting, after a cluster configuration instruction is received and cluster configuration is performed, the data to be updated, i.e., data whose location in the data cluster has changed; the cluster configuration instruction includes at least one of a cluster offline instruction, a cluster migration instruction, and a cluster addition instruction;

the relationship updating module is used for updating the path correspondence of the data to be updated according to the cluster configuration instruction.

In a particular embodiment, the relationship update module includes:

the first path acquisition submodule is used for acquiring the current storage path of first data after the storage location of the first data has moved;

the first access path determining submodule is used for determining the access path corresponding to the first data in the preset path correspondence.

Because migration does not change the access path of the first data, the access path already recorded in the path correspondence continues to be used.

the first updating submodule is used for updating, in the preset path correspondence, the storage path mapped to the access path of the first data to the current storage path.

In another embodiment, the relationship update module includes:

the second path acquisition submodule is used for acquiring the storage path of the second data;

the second access path determining submodule is used for determining the access path corresponding to the second data in the preset path correspondence;

the deleting submodule is used for deleting, from the preset path correspondence, the correspondence between the access path and the storage path of the second data.

In another embodiment, the relationship update module includes:

the third path obtaining submodule is used for obtaining the storage path of the data corresponding to a third data identifier;

the access path generation submodule is used for generating an access path corresponding to the third data identifier;

the relationship establishing submodule is used for establishing the path correspondence between the access path and the storage path of the third data identifier;

the correspondence storage submodule is used for storing the established path correspondence of the third data identifier into the preset path correspondence.
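The offline (second data) and addition (third data identifier) cases can be sketched as two functions over a dictionary-based path correspondence. The function names and the `make_access_path` callable, which stands in for generating an access path in the preset format, are hypothetical; raising an error on a duplicate access path reflects the one-to-one requirement.

```python
from typing import Callable, Dict


def handle_offline(path_table: Dict[str, str], storage_path: str) -> str:
    """Delete the access path -> storage path entry of data taken offline."""
    # Iterate over a snapshot so the deletion does not disturb iteration
    for access_path, mapped in list(path_table.items()):
        if mapped == storage_path:
            del path_table[access_path]
            return access_path
    raise KeyError(f"no access path maps to {storage_path}")


def handle_addition(path_table: Dict[str, str], data_id: str, storage_path: str,
                    make_access_path: Callable[[str], str]) -> str:
    """Generate an access path for a new data identifier and store the mapping."""
    access_path = make_access_path(data_id)
    if access_path in path_table:
        raise ValueError(f"access path {access_path} already in use")
    path_table[access_path] = storage_path
    return access_path
```

For instance, with a hypothetical generator that prefixes the identifier with bfs://t1/user/, adding app2 stored at hdfs://C/user/app2 creates the entry bfs://t1/user/app2 → hdfs://C/user/app2.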

An embodiment of the present invention further provides a data relationship setting apparatus, which may be applied to the path mapping system 500 in the data system shown in fig. 1, and as shown in fig. 15, the apparatus may include:

a first obtaining module 151, configured to obtain a data identifier and a storage path of data stored in a data node in a data cluster;

a first establishing module 152, configured to establish a first corresponding relationship between the data identifier and the storage path;

the generating module 153 is configured to generate an access path of data stored in a data node in the data cluster according to a preset format;

a second establishing module 154, configured to establish a second corresponding relationship between the data identifier and the access path;

a third establishing module 155, configured to establish, according to the first correspondence and the second correspondence, the path correspondence between the access path and the storage path that share the same data identifier, where access paths and storage paths in the path correspondence are in one-to-one correspondence.

In one embodiment, the apparatus further comprises:

In this embodiment of the present application, for the relationship updating module, refer to the description of the relationship updating module in the embodiment of fig. 14; details are not repeated here.

An embodiment of the present invention further provides an electronic device which, as shown in fig. 16, includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114, where the processor 111, the communication interface 112, and the memory 113 communicate with one another through the communication bus 114;

a memory 113 for storing a computer program;

in one embodiment of the present application, the processor 111, when executing the program stored in the memory 113, implements the following steps of data access:

In another embodiment of the present application, the processor 111, when executing the program stored in the memory 113, implements the following data relationship setting steps:

The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the terminal and other equipment.

The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to perform the data access method described in any one of the embodiments of fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, and fig. 7.

In another embodiment provided by the present invention, a computer-readable storage medium is further provided, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the computer is caused to execute the data relationship setting method according to any one of the embodiments in fig. 12 and 13 in the foregoing embodiments.

In yet another embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the data access method of any one of the embodiments described above with reference to fig. 2, 3, 4, 5, 6 and 7.

In another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer, causes the computer to execute the data relationship setting method described in any one of the embodiments in fig. 12 and 13 in the foregoing embodiments.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of data access, the method comprising:

2. The method of claim 1, wherein acquiring the data acquisition request for the target data comprises:

receiving a data identifier of target data;

3. The method of claim 1, further comprising:

4. The method according to claim 1, characterized in that it comprises:

5. A data relationship setting method, characterized in that the method comprises:

6. The method of claim 5, wherein the method comprises:

7. A data access apparatus, characterized in that the apparatus comprises:

8. A data relationship setting apparatus, characterized in that the apparatus comprises:

9. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the steps of the data access method of any of claims 1 to 4 when executing a program stored in the memory.

10. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the steps of the data relationship setting method of claim 5 or 6 when executing a program stored in the memory.

11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data access method according to any one of claims 1 to 4.

12. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the data relationship setting method according to claim 5 or 6.