WO2021036989A1 - Method, apparatus and system for data access - Google Patents


Publication number
WO2021036989A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
accessed
sub-file
processing node
identifier
Prior art date
Application number
PCT/CN2020/110819
Other languages
French (fr)
Chinese (zh)
Inventor
李铮
王明月
刘玉
张巍
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021036989A1 publication Critical patent/WO2021036989A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/1727 Details of free space management performed by the file system
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • This application relates to the computer field, and in particular to a method, device and system for data access.
  • the distributed system includes a coordination server, multiple processing nodes, and a storage server that stores data.
  • the coordination server decomposes the access request into multiple tasks and sends them to the processing nodes; each processing node accesses the data in the storage server and returns the data it reads to the coordination server, which integrates the data returned by the processing nodes and returns it to the client.
  • after a processing node receives a task from the coordination server, it first determines whether the data to be accessed is in its own cache. If so, it reads the data directly from the cache; if not, it must first read the data from the storage server into its cache and then read it from the cache. Thus, for each processing node, a cache miss means the data must travel from the storage server into the node's cache and only then from the cache to the coordination server, which lengthens the data read path and degrades the performance of data access.
  • the present application provides a data access method, device and system to shorten the data read path and improve the performance of data access.
  • the technical solution is as follows:
  • this application provides a data access method executed by a management server. The management server is connected to a plurality of processing nodes, the plurality of processing nodes are connected to a storage server, and the management server stores the identifiers of the files cached in the caches of the plurality of processing nodes.
  • a file access request is received, carrying the identifier of the file to be accessed. According to this identifier, it is determined whether the file to be accessed is cached in the caches of the multiple processing nodes; when it is not, at least one of the processing nodes is instructed to obtain the file to be accessed from the storage server.
  • the at least one processing node reads the file to be accessed directly from the storage server according to the instruction of the management server and returns it directly to the management server, without first caching it in its own cache. In this way the file to be accessed does not need to pass through the cache of the at least one processing node, which shortens its transmission path, shortens the data read path, and improves the performance of data access.
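The decision described above can be sketched as follows. Note this is a hypothetical illustration: `file_list`, `storage_catalog`, and the task fields are assumed names, not the patent's actual interfaces.

```python
# Sketch of the management-server logic: consult the file list (the cache
# metadata the server stores) and, on a miss, build tasks that read sub-files
# directly from their storage servers, bypassing the processing-node caches.

def handle_access_request(file_id, file_list, storage_catalog):
    """Return read tasks for file_id based on whether it is cached."""
    if file_id in file_list:
        # cache hit: read each sub-file from the node caching it
        return [{"subfile": sub, "node": node, "source": "cache"}
                for sub, node in file_list[file_id].items()]
    # cache miss: read each sub-file straight from its storage server
    return [{"subfile": sub, "server": server, "source": "storage"}
            for sub, server in storage_catalog[file_id].items()]

file_list = {}  # nothing cached yet
storage_catalog = {"ID1": {"file1": "S1", "file2": "S2", "file3": "S1"}}
tasks = handle_access_request("ID1", file_list, storage_catalog)
```

With an empty file list, all three tasks target storage servers directly, matching the miss path described above.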
  • the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server, and a read task is generated for each sub-file included in the file to be accessed.
  • each read task includes the identifier of a sub-file and the identifier of the storage server where that sub-file is located. Each read task is sent to a processing node, instructing the node that receives it to read the sub-file from the storage server storing it; the sub-files read by these processing nodes are received and merged into the file to be accessed.
  • the processing node that receives a read task can read the sub-file directly from the storage server according to the storage-server identifier in the task, and send the sub-file directly to the management server once it has been read. The sub-file is thus not first cached in the processing node's cache and then read out of that cache and sent to the management server, which shortens the sub-file's transmission path and improves the performance of reading it.
  • when the access frequency of the file to be accessed exceeds a preset frequency, a cache task is sent to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed; the identifier of the file to be accessed, the identifiers of the sub-files it includes, and the identifier of the processing node that caches each sub-file are recorded.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, at least one read task is generated; each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located. The at least one read task is sent to the multiple processing nodes, instructing them to read each sub-file from the cache of the processing node storing it; the read sub-files are merged into the file to be accessed.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed. Because the cache space in each processing node is limited, saving the frequently accessed file in the cache of the at least one processing node not only improves the cache utilization of the processing nodes but also increases the hit rate of the file to be accessed.
  • the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the read task does not need to determine where the sub-file resides; it reads the sub-file directly from that processing node according to the identifier, which improves the efficiency of reading the sub-file.
  • when the access frequency of the file to be accessed is lower than the preset frequency, a deletion task is sent to the processing node where each sub-file included in the file to be accessed is located. The deletion task includes the identifier of the sub-file and instructs the processing node to delete it; the identifiers of the sub-file and of the processing node recorded in the management server are also deleted.
  • files to be accessed with a low access frequency can thus be deleted from the caches of the multiple processing nodes, freeing more cache space for files with a high access frequency, which not only improves the cache utilization of the multiple processing nodes but also improves the file hit rate.
  • the present application provides a data access method executed by a processing node, which is one of a plurality of processing nodes connected to a management server; the plurality of processing nodes are connected to a storage server.
  • a read task is received. The read task is sent by the management server when it determines that the file to be accessed is not cached in the caches of the multiple processing nodes, and it includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where that sub-file is located. The sub-file is read from the storage server corresponding to the storage-server identifier according to the sub-file identifier, and the read sub-file is sent to the management server.
  • the processing node can read the sub-file directly from the storage server according to the storage-server identifier and return it directly to the management server, without caching the sub-file in its own cache before returning it. The sub-file returned to the management server is therefore not cached by the processing node, which shortens the sub-file's transmission path, shortens the data read path, and improves the performance of data access.
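The node-side miss path can be illustrated with a minimal sketch; `fetch_from_storage` stands in for the real storage-server protocol, and all names here are assumptions.

```python
# Illustrative handler for the cache-miss read task: the node fetches the
# sub-file from the named storage server and returns it without writing it
# into its own cache.

def fetch_from_storage(server_id, subfile_id):
    # placeholder for a network read from storage server server_id
    return f"<data of {subfile_id} from {server_id}>"

def handle_read_task(task, node_cache):
    data = fetch_from_storage(task["server"], task["subfile"])
    # note: node_cache is deliberately left untouched on this path
    return data

node_cache = {}
result = handle_read_task({"subfile": "file1", "server": "S1"}, node_cache)
```

The key point mirrored here is that the node cache stays empty after the read, so the sub-file never takes the extra hop through it.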
  • a cache task is received.
  • the cache task is a task sent by the management server when the access frequency of the file to be accessed exceeds a preset frequency.
  • the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the sub-file identifier, and is stored in the cache of the processing node.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed.
  • saving the frequently accessed sub-files of the file to be accessed in the processing node's cache not only improves the cache utilization of the processing node but also improves the hit rate of the file to be accessed.
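The node-side caching path above can be sketched as follows; the task format and function names are illustrative assumptions.

```python
# Sketch of the cache-task handler: the task names a sub-file and its storage
# server; the node reads the sub-file and stores it in its own cache so later
# reads of this hot file hit locally.

def handle_cache_task(task, node_cache, read_from_storage):
    data = read_from_storage(task["server"], task["subfile"])
    node_cache[task["subfile"]] = data  # cache for future hits
    return data

node_cache = {}
handle_cache_task({"server": "S1", "subfile": "file1"},
                  node_cache,
                  lambda server, sub: f"{sub}@{server}")
```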
  • the present application provides a data access method, the method is executed by a processing node, the processing node is one of a plurality of processing nodes connected to a management server, and the plurality of processing nodes are connected to a storage server.
  • a read task is received. The read task is sent by the management server when it determines that the file to be accessed is cached in the caches of the multiple processing nodes, and it includes the identifier of a sub-file of the file to be accessed and the identifier of the processing node where the sub-file is located. The sub-file is read from the processing node corresponding to that identifier according to the sub-file identifier, and the read sub-file is sent to the management server.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, the read task includes the identifier of the processing node where the sub-file is located, so the processing node does not need to determine where the sub-file resides; it reads the sub-file directly from that node according to the identifier, which improves the efficiency of reading the sub-file.
  • a deletion task is received. The deletion task is sent by the management server when the access frequency of the file to be accessed is lower than a preset frequency, and it includes the identifier of a sub-file of the file to be accessed; the sub-file corresponding to that identifier is deleted.
  • the processing node can delete the sub-files belonging to the file to be accessed from its own cache, freeing more cache space for files with a high access frequency; this increases the cache utilization of the processing node and also improves the file hit rate.
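The deletion path reduces to a simple eviction; the task format and function name below are illustrative assumptions.

```python
# Minimal sketch of the delete-task handler: the task names the sub-file,
# and the node evicts it from its cache, freeing space for hotter files.

def handle_delete_task(task, node_cache):
    node_cache.pop(task["subfile"], None)  # idempotent eviction

node_cache = {"file2": b"cold data", "file9": b"hot data"}
handle_delete_task({"subfile": "file2"}, node_cache)
```

Using `pop` with a default makes the eviction idempotent, so a repeated or late-arriving delete task is harmless.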
  • the present application provides a data access device, which is used to execute the first aspect or the method in any one of the optional implementation manners of the first aspect.
  • the device includes a unit for executing the method in the first aspect or any one of the possible implementation manners of the first aspect.
  • this application provides a data access device, which is used to execute the second aspect or a method in an optional implementation of the second aspect.
  • the device includes a unit for executing the method in the second aspect or a possible implementation of the second aspect.
  • the device is configured to execute the method in the third aspect or an optional implementation manner of the third aspect.
  • the device includes a unit for executing the method in the third aspect or a possible implementation manner of the third aspect.
  • the present application provides a data access device.
  • the device includes a processor, a memory, and a communication interface.
  • the processor is connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions.
  • the computer-executable instructions are executed by the processor to implement the operation steps of the first aspect or any one of the possible implementation manners of the first aspect.
  • the present application provides a data access device.
  • the device includes a processor, a memory, and a communication interface.
  • the processor is connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions.
  • the computer-executable instructions are executed by the processor to perform the operation steps of the second aspect or a possible implementation of the second aspect, or to perform the operation steps of the third aspect or a possible implementation of the third aspect.
  • the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the methods described in the above aspects.
  • this application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the methods described in the above aspects.
  • this application provides a data access system, which includes: a management server, a storage server, and multiple processing nodes.
  • the management server receives the file access request, which carries the identifier of the file to be accessed; according to this identifier, it determines whether the file to be accessed is cached in the caches of the multiple processing nodes, the identifiers of the files cached in those caches being stored in the management server.
  • when the file to be accessed is not cached in the caches of the multiple processing nodes, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server.
  • the processing node that receives a read task reads the corresponding sub-file from the storage server corresponding to the storage-server identifier in the task, according to the sub-file identifier, and sends the read sub-file to the management server.
  • the management server receives the sub-files read by the processing nodes that received the read tasks.
  • each read task generated by the management server includes the identifier of a sub-file and the identifier of the storage server where the sub-file is located.
  • the processing node that receives a read task can read the sub-file directly from the storage server according to the storage-server identifier in the task, and send the sub-file directly to the management server once it has been read.
  • the sub-file is thus not first cached in the processing node's cache and then read out of that cache and sent to the management server, which shortens the sub-file's transmission path and improves the performance of reading it.
  • when the access frequency of the file to be accessed exceeds the preset frequency, the management server sends the cache task to at least one of the plurality of processing nodes.
  • the processing node receiving the cache task caches the sub-files included in the file to be accessed.
  • the management server also records the identifier of the file to be accessed, the identifiers of its sub-files, and the identifier of the processing node that caches each sub-file.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, the management server generates at least one read task.
  • each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located; the at least one read task is sent to the multiple processing nodes.
  • the processing node that receives a read task reads the sub-file according to the sub-file identifier and the identifier of the processing node where the sub-file is located in the received task, and sends the read sub-file to the management server.
  • the management server receives the sub-files read by the processing nodes that received the read tasks.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed. Because the cache space in each processing node is limited, saving the frequently accessed file in the cache of the at least one processing node not only improves the cache utilization of the processing nodes but also improves the hit rate of the file to be accessed.
  • the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the read task does not need to determine where the sub-file resides; it reads the sub-file directly from that node according to the identifier, which improves the efficiency of reading the sub-file.
  • Figure 1 is a schematic structural diagram of a data access system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a client accessing the data access system provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a data access method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for caching files provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for deleting files provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a data access device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • an embodiment of the present application provides a data access system.
  • the system includes a management server 1, a plurality of processing nodes 2, and at least one storage server 3. The management server 1 is connected to each processing node 2, and the processing nodes 2 are also connected to one another.
  • the management server 1 and each processing node 2 are connected to each storage server 3 through a network.
  • the management server 1 is used to decompose a received file access request into multiple tasks, which are sent to the processing nodes 2 respectively; each processing node 2 accesses files in the storage server 3 and returns the files it reads to the management server 1.
  • the management server 1 integrates the files returned by the processing nodes 2 and returns them to the client (not shown in the figure).
  • Each storage server 3 stores files for user access.
  • a file can be divided into multiple sub-files for storage.
  • the file may be a form in a database. Assuming the form includes 100 records, three sub-files are used to save the form in the storage server 3: the first sub-file, the second sub-file, and the third sub-file.
  • the first sub-file saves records 1 to 33 of the form,
  • the second sub-file saves records 34 to 66 of the form, and
  • the third sub-file saves records 67 to 100 of the form.
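The storage layout in this example can be modeled with a toy split; the function name and boundary list are illustrative, not the patent's mechanism.

```python
# A 100-record form split into three sub-files holding records 1-33, 34-66,
# and 67-100, following the boundaries given in the example above.

def split_into_subfiles(records, boundaries):
    subfiles, start = [], 0
    for end in boundaries:
        subfiles.append(records[start:end])
        start = end
    return subfiles

records = list(range(1, 101))  # records 1..100
first, second, third = split_into_subfiles(records, [33, 66, 100])
```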
  • the management server 1 may cache each sub-file included in the file in one or more processing nodes 2 of the data access system. For each sub-file, when the management server 1 caches the sub-file to a processing node 2, the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2 is stored in a file list.
  • the processing node 2 includes a cache 21, and the management server 1 caches the sub-file in the cache 21 of the processing node 2.
  • the identifier of the form is ID1, and the identifiers of the first, second, and third sub-files included in the form are file1, file2, and file3, respectively.
  • assume the management server 1 caches the first, second, and third sub-files in the first, second, and third processing nodes, whose identifiers are TE1, TE2, and TE3, respectively.
  • the management server 1 then correspondingly saves the form identifier ID1, each sub-file identifier, and the identifier of the processing node caching that sub-file in the file list shown in Table 1 below.
  • Table 1:
    File identifier | Sub-file identifier | Processing node identifier
    ID1             | file1               | TE1
    ID1             | file2               | TE2
    ID1             | file3               | TE3
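The file list above can be modeled as a nested mapping; the dict structure and helper function are assumptions that mirror the recorded correspondences.

```python
# The file list of Table 1: file identifier -> {sub-file id: node id}.

file_list = {
    "ID1": {"file1": "TE1", "file2": "TE2", "file3": "TE3"},
}

def nodes_caching(file_id, file_list):
    """Processing nodes holding any sub-file of file_id (empty on a miss)."""
    return sorted(set(file_list.get(file_id, {}).values()))
```

A lookup for an unknown identifier returning an empty set corresponds to a cache miss, which triggers the direct-from-storage path.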
  • when the user needs to access the file to be accessed, the user can input the identifier of the file to be accessed to the client 4.
  • the client 4 obtains the input identifier of the file to be accessed, and sends a file access request including the identifier of the file to be accessed to the management server 1.
  • the management server 1 receives the file access request, and determines whether the file to be accessed is cached in the processing node 2 of the data access system according to the file list and the identification of the file to be accessed included in the file access request. If the file to be accessed is cached in the processing node 2 of the data access system, the file to be accessed is obtained from the processing node 2 of the data access system. If the file to be accessed is not cached in the processing node 2 of the data access system, the processing node 2 is controlled to obtain the file to be accessed from the storage server 3 where the file to be accessed is located. Then send the file to be accessed to the client.
  • the identifier of the file may be the file name of the file, etc.
  • the identifier of the sub-file may be the storage path or file name of the sub-file in the storage server, etc.
  • the file list saves the correspondence between the identifier of a file cached in a processing node 2 of the data access system, the identifiers of the sub-files in that file, and the identifiers of the processing nodes 2 caching those sub-files.
  • when the management server 1 receives the identifier of the file to be accessed sent by the client 4, it can determine, according to that identifier and the file list, whether the file to be accessed is cached in a processing node 2 of the data access system.
  • if not, the processing node 2 is controlled to obtain the file to be accessed from the storage server 3 where it is located. Under the control of the management server 1, the processing node 2 does not first read the file from its cache 21, but obtains the file to be accessed directly from the storage server 3 and then sends it directly to the management server 1.
  • the file is not cached in the node's cache 21 along the way, so the file to be accessed does not need to be written into the cache 21 of the processing node 2 and then read
  • out of that cache 21 and sent to the management server 1, which improves the efficiency of accessing files.
  • the data access system is mainly used for accessing database data.
  • the following takes accessing a file in the storage server 3 as an example to introduce the file access method provided by the embodiments of the present application.
  • an embodiment of the present application provides a file access method, which can be applied to the system described in FIG. 1, and includes:
  • Step 201 The management server 1 receives a file access request, and the file access request includes the identification of the file to be accessed.
  • the user logs in to the management server 1 through the client 4, and then the client 4 displays the interface provided by the management server 1.
  • the user can input a file access request through the interface provided by the management server 1.
  • the file access request may be a database access statement, and the database access statement may include the identification of the file to be accessed.
  • the database access statement may be a structured query language (SQL) access statement, and the file to be accessed may be a form in the database.
  • when the management server 1 receives the SQL access statement, it extracts the identifiers of the two forms in the SQL access statement.
  • for example, two form identifiers are extracted from the SQL access statement: one is the identifier of the form "teacher",
  • and the other is the identifier of the form "people", which appears in the statement as "people.id".
  • the management server 1 also checks whether the statement format of the SQL access statement is correct. If it is correct, step 202 is executed; if it is incorrect, an alarm indicating that the statement is incorrect is fed back to the client 4, which receives the alarm and displays it to the user.
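The identifier extraction in step 201 can be illustrated roughly as below. A real system would use a SQL parser; this regex sketch only covers the simple FROM/JOIN shape of the example, and the function name is an assumption.

```python
# Pull form (table) identifiers out of a SQL access statement.
import re

def extract_form_ids(sql):
    # capture the identifier following FROM or JOIN
    return re.findall(r"(?:from|join)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE)

stmt = "SELECT people.id FROM people JOIN teacher ON people.id = teacher.id"
forms = extract_form_ids(stmt)
```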
  • Step 202 The management server 1 determines whether a file to be accessed is cached in the processing node 2 in the data access system according to the identifier and the file list.
  • the file list is used to store the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2.
  • the management server 1 can cache files in the processing node 2 of the data access system.
  • the file includes multiple sub-files.
  • a sub-file included in the file is cached to a certain processing node 2, the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2 is stored in the file List.
  • the management server 1 may cache, in the processing nodes 2 of the data access system, files whose access frequency exceeds a preset frequency, and delete from the processing nodes 2 files whose access frequency is lower than the preset frequency.
  • for example, the management server 1 caches, in the processing nodes 2 of the data access system, the files whose access frequency in the most recent preset time period exceeds the first preset frequency threshold, and deletes from the processing nodes 2 the files whose access frequency in the most recent preset time period does not exceed the second preset frequency threshold.
  • the first preset frequency threshold is greater than or equal to the second preset frequency threshold.
  • the management server 1 saves a historical access record; each entry stores the identifier of a file the user has accessed and the access time.
  • the management server 1 periodically or irregularly collects, from the historical access record, the access frequency of each file accessed in the most recent preset time period; when a file's access frequency exceeds the first preset frequency threshold, it controls a processing node 2 to obtain the file from the storage server 3 where it is located and caches the file in the processing nodes 2 of the data access system.
  • the operations of steps 2021 to 2026 are as follows:
  • the management server 1 selects from the historical access record the identifier of a file that does not exist in the file list, and calculates the access frequency of the file in the most recent preset time period according to the historical access record and the identifier of the file.
  • the selected identifier not being in the file list means that the file corresponding to the identifier is not cached in the processing nodes 2 of the data access system.
  • the management server 1 may obtain, from the historical access record, the access times corresponding to the identifier of the file within the most recent preset time period.
  • the management server 1 counts the access times obtained; the count equals the number of accesses to the file, and the access frequency of the file in the most recent preset time period is obtained from the number of accesses and the preset duration.
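The frequency computation above reduces to counting accesses inside a window and dividing by its length. The record format `(file_id, access_time)` and the time units below are assumptions for illustration.

```python
# Count a file's accesses within the most recent window and divide by the
# window length to get its access frequency.

def access_frequency(history, file_id, now, window):
    hits = sum(1 for fid, t in history
               if fid == file_id and now - window <= t <= now)
    return hits / window

history = [("ID1", 95), ("ID1", 97), ("ID2", 96), ("ID1", 50)]
freq = access_frequency(history, "ID1", now=100, window=10)
```

Here ID1 has two accesses inside the window [90, 100] (the access at time 50 falls outside), giving a frequency of 0.2.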
  • the management server 1 obtains the identification of the storage server 3 where the file is located and the identification of at least one sub-file included in the file according to the identification of the file.
  • the identifier may be the address of the storage server 3, for example, an internet protocol (IP) address of the storage server 3.
  • the technician can input the identification of the storage server 3 in the data access system to the management server 1 in advance.
  • the management server 1 can obtain the identifier of each file saved by the storage server 3 according to the identifier of the storage server 3, and save each obtained file identifier together with the identifier of the storage server 3 into the correspondence between file identifiers and storage server identifiers.
  • the management server 1 may also obtain the identifier of each sub-file included in the file from the storage server 3, and save the identifier of the file together with each obtained sub-file identifier into the correspondence between file identifiers and sub-file identifiers.
  • when the management server 1 finds that the access frequency of a file exceeds the first preset frequency threshold, it can obtain the identifier of the storage server 3 where the file is located from the correspondence between file identifiers and storage server identifiers according to the file identifier.
  • if the management server 1 saves the correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifier of each sub-file included in the file from that correspondence according to the identifier of the file.
  • otherwise, the management server 1 obtains the identifier of each sub-file included in the file from the storage server 3 according to the identifier of the storage server 3.
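  • the two correspondences described above behave like simple lookup tables keyed by file identifier. A minimal sketch, with hypothetical identifiers and an assumed in-memory dictionary form:

```python
# Hypothetical in-memory form of the two correspondences the management
# server maintains: file id -> storage server id, file id -> sub-file ids.
file_to_server = {"file-7": "10.0.0.3"}  # storage server identified by IP
file_to_subfiles = {"file-7": ["file-7.part0", "file-7.part1"]}

def locate_file(file_id):
    """Return (storage server id, sub-file ids) for a file, or None if
    either correspondence has no entry for the file."""
    server = file_to_server.get(file_id)
    subs = file_to_subfiles.get(file_id)
    if server is None or subs is None:
        return None
    return server, subs
```

When `locate_file` returns None the management server would fall back to querying the storage server 3 directly, as the text above describes.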
  • the management server 1 generates at least one cache task, and each cache task includes an identifier of the storage server 3 and an identifier of a subfile in the file.
  • for each cache task, the management server 1 selects a processing node 2 and sends the cache task to that processing node 2.
  • the management server 1 may start traversal from the first cache task among the at least one cache task; each time it traverses to a cache task, it selects a processing node 2 and sends that cache task to the processing node 2, then traverses the next cache task, repeating the above process until the last cache task is sent.
  • the management server 1 randomly selects a processing node 2 from the processing nodes 2 of the data access system.
  • the management server 1 may store a correspondence relationship between the identifier of the processing node 2 and the size of the free cache space, and the correspondence relationship stores the identifier of each processing node 2 in the data access system and the size of the free cache space.
  • the management server 1 can first select, based on the correspondence, the processing nodes 2 with the largest free cache space, the number of selected nodes being equal to the number of sub-files included in the file; then, each time it traverses to a cache task, it selects one processing node 2 from the selected nodes and sends the cache task to that processing node 2.
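  • the free-space-based selection just described can be sketched as follows. The dictionary form of the correspondence and the round-robin assignment of tasks to the selected nodes are assumptions made for the illustration:

```python
def pick_nodes_for_cache_tasks(free_space, num_subfiles):
    """Select the num_subfiles processing nodes with the most free cache
    space, then assign one node per cache task in round-robin order.

    free_space: dict of node id -> free cache space (bytes)
    Returns a list of node ids, one per sub-file / cache task.
    """
    # Rank nodes by free cache space, largest first.
    ranked = sorted(free_space, key=free_space.get, reverse=True)
    candidates = ranked[:num_subfiles]
    # One node per cache task; wraps around if fewer nodes than sub-files.
    return [candidates[i % len(candidates)] for i in range(num_subfiles)]
```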
  • the processing node 2 receives a cache task, obtains, from the storage server 3 corresponding to the storage server identifier in the cache task, the sub-file corresponding to the sub-file identifier in the cache task, and caches the obtained sub-file in its own cache 21.
  • the processing node 2 may also send a cache success message corresponding to the cache task to the management server 1.
  • the processing node 2 may also obtain its own remaining free cache space size and send it to the management server 1.
  • the management server 1 may correspondingly save the identification of the file, the identification of the subfile in the cache task, and the identification of the selected processing node 2 into the data list.
  • the management server 1 executes this step after selecting a processing node 2 for the caching task, or executes this step after receiving a caching success message corresponding to the caching task sent by the processing node 2.
  • the management server 1 may also receive the size of the remaining free cache space of the processing node 2, and update the free cache space size of the processing node 2 in the correspondence between processing node identifiers and free cache space sizes to the received remaining free cache space size.
  • the management server 1 also obtains the access frequency, in the most recent preset time period, of each file cached in the processing nodes 2 of the data access system, and deletes from the processing nodes 2 of the data access system the files whose access frequency is lower than the second preset frequency threshold.
  • for the identifier of any file in the file list, the management server 1 counts the access frequency of the file in the latest preset time period according to the file identifier and the historical access record.
  • the access times corresponding to the file are obtained from the historical access record according to the identifier of the file, the access times falling in the latest preset time period are counted to obtain the number of accesses to the file, and the access frequency of the file is obtained from the number of accesses.
  • the management server 1 obtains, from the file list, the identifier of each sub-file included in the file and the identifier of the processing node 2 where each sub-file is located.
  • for each sub-file, the management server 1 sends a deletion task to the processing node 2 where the sub-file is located; each deletion task includes the identifier of the sub-file. The management server 1 then deletes the record including the identifier of the file from the file list.
  • the processing node 2 receives the deletion task, and deletes the subfile corresponding to the identifier of the subfile in the deletion task from its own cache 21.
  • the processing node 2 may also obtain the remaining free cache space size in its cache 21, and send the remaining free cache space size to the management server 1.
  • the management server 1 also receives the size of the remaining free cache space of the processing node 2, and updates the free cache space size of the processing node 2 in the correspondence between processing node identifiers and free cache space sizes to the received remaining free cache space size.
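  • the eviction flow above, one deletion task per sub-file followed by removal of the file's records from the file list, can be sketched as follows; the file-list record layout is an assumption:

```python
def build_deletion_tasks(file_list, file_id):
    """Build one deletion task per sub-file of `file_id`, addressed to the
    processing node caching that sub-file, and drop the file's records.

    file_list: list of records {"file": ..., "subfile": ..., "node": ...}
    Returns (tasks, remaining file list); each task is (node_id, subfile_id).
    """
    tasks = [(r["node"], r["subfile"]) for r in file_list if r["file"] == file_id]
    remaining = [r for r in file_list if r["file"] != file_id]
    return tasks, remaining
```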
  • the management server 1 can query the file list according to the identifier of the file to be accessed. If the identifier of each sub-file included in the file to be accessed and the identifier of the processing node 2 where each sub-file is located are not found, it is determined that the processing nodes 2 of the data access system do not cache the file to be accessed. If they are found, it is determined that the file to be accessed is cached in the processing nodes 2 of the data access system.
  • the management server 1 may also use the current time as the access time of the file to be accessed, and may save the correspondence between the identifier of the file to be accessed and the access time in the historical access record.
  • Step 203 When the file to be accessed is not cached in the processing nodes 2 of the data access system, the management server 1 generates at least one first reading task, and each first reading task includes the address of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed.
  • the identifiers of the subfiles included in each first reading task are different.
  • the management server 1 may obtain the identification of the storage server 3 where the file to be accessed is located from the correspondence between the identification of the file and the identification of the storage server 3 according to the identification of the file to be accessed.
  • if the management server 1 saves the correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifier of at least one sub-file included in the file to be accessed from that correspondence according to the identifier of the file to be accessed, and generates at least one first reading task, each first reading task including the identifier of the storage server 3 and the identifier of one sub-file in the file to be accessed.
  • otherwise, the management server 1 obtains the identifier of at least one sub-file included in the file to be accessed from the storage server 3 according to the identifier of the storage server 3, and generates at least one first reading task, each first reading task including the identifier of the storage server 3 and the identifier of one sub-file in the file to be accessed.
  • the management server 1 may also count the access frequency of the file to be accessed in the latest preset time period.
  • each generated first reading task can also include a cache indication.
  • the cache indication is used to instruct the processing node 2 that has received the first reading task to cache the subfile of the file to be accessed when it obtains the subfile of the file to be accessed from the storage server 3 where the file to be accessed is located.
  • the management server 1 may obtain each access time corresponding to the file to be accessed from the historical access record according to the identifier of the file to be accessed, count the access times falling in the latest preset time period to obtain the number of accesses to the file to be accessed, and use this number as the access frequency of the file to be accessed.
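  • generating the first reading tasks of step 203 amounts to pairing the storage server identifier with each sub-file identifier and optionally attaching the cache indication. A minimal sketch with assumed field names:

```python
def build_first_read_tasks(server_id, subfile_ids, cache_hint=False):
    """Build one first reading task per sub-file: the storage server's
    identifier plus one sub-file identifier, optionally with a cache
    indication instructing the receiving node to cache the sub-file."""
    tasks = []
    for sid in subfile_ids:
        task = {"server": server_id, "subfile": sid}
        if cache_hint:
            task["cache"] = True  # cache indication (see the text above)
        tasks.append(task)
    return tasks
```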
  • Step 204 For each first reading task in the at least one first reading task, the management server 1 selects a processing node 2 and sends the first reading task to the processing node 2.
  • the management server 1 may start traversal from the first of the at least one first reading task; each time it traverses to a first reading task, it selects a processing node 2 from the processing nodes 2 included in the data access system and sends the first reading task to that processing node 2. After sending the first reading task, the management server 1 traverses the next first reading task and repeats the above process until the last first reading task is sent.
  • a processing node can be selected from the processing nodes 2 of the data access system in the following two ways.
  • the two methods are:
  • the management server 1 can randomly select a processing node 2 from the processing nodes 2 of the data access system.
  • the management server 1 can select the processing node 2 with the least number of tasks currently processed from the processing nodes 2 of the data access system.
  • the management server 1 saves the correspondence between the identification of the processing node 2 and the number of tasks, and each record in the correspondence includes an identification of the processing node 2 and the number of tasks currently being processed by the processing node 2.
  • when selecting a processing node 2, the management server 1 reads the number of tasks of each processing node 2 in the data access system from the correspondence and selects the processing node 2 with the smallest number of tasks.
  • after sending a task to the selected processing node 2, the management server 1 increases the number of tasks of that processing node 2 in the correspondence.
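  • manner 2, selecting the processing node 2 with the fewest in-flight tasks and then increasing its count, can be sketched as follows; the dictionary form of the correspondence is an assumption:

```python
def pick_least_loaded(task_counts):
    """Manner 2: choose the processing node with the fewest in-flight
    tasks, and bump its count so the next pick sees the updated load."""
    node = min(task_counts, key=task_counts.get)
    task_counts[node] += 1
    return node
```

Updating the count immediately after each selection is what spreads a burst of tasks across nodes instead of sending them all to the initially least-loaded one.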
  • a processing node 2 may be used as a file summary node; in this case, before sending each first reading task, the management server 1 adds the identifier of the file summary node to the first reading task.
  • the management server 1 may select one processing node 2 as the file summary node through the above method 1 or method 2, so that each generated first reading task includes the file The ID of the summary node.
  • the management server 1 also sends a summary task to the file summary node, where the summary task includes the number of sub-files in the file to be accessed.
  • the file summary node selected by the above manner 1 or manner 2 may be different from the processing node 2 selected by the management server 1 for each first reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain first reading task.
  • the management server 1 selects a processing node 2 every time it traverses a first reading task.
  • the processing node 2 may be selected by the management server 1 multiple times, that is, multiple first reading tasks are sent to the processing node 2 at different times.
  • the management server 1 records the number of first read tasks allocated by the selected processing node 2, that is, saves the correspondence between the identifier of the selected processing node 2 and the number of first read tasks.
  • Step 205 The processing node 2 receives the first reading task, obtains, from the storage server 3 where the file to be accessed is located, the sub-file corresponding to the identifier included in the first reading task, sends the obtained sub-file to the management server 1, and goes to step 209.
  • the processing node 2 receives the first reading task, establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first reading task, obtains the sub-file from the storage server 3 through the network connection according to the identifier of the sub-file included in the first reading task, and sends the sub-file to the management server 1.
  • the processing node 2 can determine that the processing nodes 2 included in the data access system do not cache the file to be accessed, so the processing node 2 directly establishes a network connection with the storage server 3 using the identifier of the storage server 3, obtains the sub-file from the storage server 3, and directly sends the sub-file to the management server 1.
  • the processing node 2 does not cache the sub-file in its cache 21 before sending the sub-file to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2, which shortens the transmission path of the sub-file and improves its transmission efficiency.
  • when the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node.
  • when the processing node 2 is the file summary node, it also receives the summary task and the sub-files sent by other processing nodes 2; when the sum of the number of sub-files it obtained itself and the number of received sub-files reaches the number in the summary task, it assembles the sub-files obtained by itself and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
  • alternatively, the file summary node receives the summary task and the sub-files sent by other processing nodes 2, assembles the received sub-files into the file to be accessed, and sends the file to be accessed to the management server 1.
  • the processing node 2 may cache the obtained sub-file in the cache 21 included in the processing node 2 when it obtains the sub-file, may cache the obtained sub-file after sending it to the management server 1, or may cache the obtained sub-file while sending it to the management server 1.
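  • the file summary node's behaviour, collecting sub-files (its own and those sent by other processing nodes 2) until the count in the summary task is reached and then assembling the file, can be sketched as follows; indexing sub-files by position is an assumption made for the illustration:

```python
class FileSummaryNode:
    """Minimal sketch of the file summary node's role: collect sub-files
    until the count given in the summary task is reached, then assemble
    the file to be accessed in order."""

    def __init__(self, expected_count):
        self.expected = expected_count  # number of sub-files in summary task
        self.parts = {}                 # sub-file index -> bytes

    def add_subfile(self, index, data):
        """Record one sub-file; return True once all have arrived."""
        self.parts[index] = data
        return len(self.parts) == self.expected

    def assemble(self):
        """Concatenate the sub-files in index order."""
        assert len(self.parts) == self.expected, "not all sub-files received"
        return b"".join(self.parts[i] for i in sorted(self.parts))
```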
  • Step 206 When it is found that the file to be accessed is cached in the processing node 2 in the data access system, the management server 1 generates at least one second reading task.
  • the management server 1 can query the identification of each subfile in the file to be accessed and the identification of the processing node 2 where each subfile is located from the file list.
  • Each second reading task includes the identifier of a sub-file in the file to be accessed and the identifier of the processing node 2 where the sub-file is located.
  • Step 207 For each second reading task in the at least one second reading task, the management server 1 selects a processing node 2 and sends the second reading task to the processing node 2.
  • the management server 1 may start traversal from the first of the at least one second reading task; each time it traverses to a second reading task, it selects a processing node 2 from the processing nodes 2 included in the data access system and sends the second reading task to that processing node 2. After sending the second reading task, the management server 1 traverses the next second reading task and repeats the above process until the last second reading task is sent.
  • one processing node 2 can be selected from the processing nodes 2 of the data access system through the above-mentioned method one or two.
  • the management server 1 records the number of second reading tasks allocated by the selected processing node 2, that is, saves the correspondence between the identification of the selected processing node 2 and the number of second reading tasks.
  • the following manner 3 can also be used to select a processing node 2:
  • Manner 3: The management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task.
  • a processing node 2 may be used as the file summary node; in this case, before sending each second reading task, the management server 1 adds the identifier of the file summary node to the second reading task.
  • the management server 1 may also select one processing node 2 as the file summary node through the above method 1 or method 2 before generating at least one second reading task, so that each second reading task generated includes the file summary The ID of the node.
  • the management server 1 also sends a summary task to the file summary node, and the summary task includes the number of sub-files in the file to be accessed.
  • the file summary node selected by the above manner 1 or manner 2 may be different from the processing node 2 selected by the management server 1 for each second reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain second reading task.
  • the management server 1 selects a processing node 2 after each second reading task is traversed.
  • the processing node 2 may be selected by the management server 1 multiple times, that is, multiple second reading tasks are sent to the processing node 2 at different times.
  • Step 208 The processing node 2 receives the second reading task, obtains the sub file according to the identifier of the sub file included in the second reading task and the identifier of the processing node 2, and sends the obtained sub file to the management server 1.
  • the processing node 2 receives the second reading task, which includes the identifier of a sub-file and the identifier of a processing node 2. If the processing node 2 is the processing node 2 corresponding to the identifier in the second reading task, the processing node 2 obtains the corresponding sub-file according to the identifier of the sub-file in the second reading task. Otherwise, the processing node 2 obtains the corresponding sub-file, according to the identifier of the sub-file in the second reading task, from the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task.
  • when the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node.
  • when the processing node 2 is the file summary node, it also receives the summary task and the sub-files sent by other processing nodes 2; when the sum of the number of sub-files it obtained itself and the number of received sub-files reaches the number of sub-files in the summary task, it assembles the sub-files obtained by itself and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
  • alternatively, the file summary node receives the summary task and the sub-files sent by other processing nodes 2; when the number of received sub-files reaches the number of sub-files in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
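  • the handling of a second reading task in step 208 reduces to a local-or-remote fetch. A sketch, where `fetch_remote` stands in for a hypothetical node-to-node transfer and `local_cache` for the node's cache 21:

```python
def handle_second_read_task(self_id, task, local_cache, fetch_remote):
    """Process a second reading task: serve the sub-file locally when this
    node is the one named in the task, otherwise fetch it from the named
    processing node.

    task:         {"subfile": subfile_id, "node": node_id}
    fetch_remote: callable(node_id, subfile_id) -> bytes (hypothetical RPC)
    """
    if task["node"] == self_id:
        # This node caches the sub-file; read it from the local cache.
        return local_cache[task["subfile"]]
    # Another node caches it; fetch across the node-to-node connection.
    return fetch_remote(task["node"], task["subfile"])
```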
  • Step 209 The management server 1 receives the sub-files sent by each processing node 2, obtains the file to be accessed, and sends the file to be accessed to the client 4.
  • the management server 1 integrates the received sub-files into a file to be accessed, and sends the file to be accessed to the client 4.
  • the first reading task or the second reading task also includes the identification of the file summary node.
  • the management server 1 receives the file to be accessed sent by the file summary node, and sends the file to be accessed to the client 4.
  • the management server 1 saves the corresponding relationship between the identifier of the processing node 2 and the number of tasks.
  • the management server 1 subtracts the recorded number of first reading tasks or second reading tasks of the processing node 2 from the number of tasks of the processing node 2 stored in the correspondence between processing node identifiers and task numbers.
  • when the management server 1 determines that the file to be accessed is not cached in the processing nodes 2 of the data access system, it generates at least one first reading task, and each first reading task includes the identifier of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed.
  • on receiving the first reading task, the processing node 2 does not first access its cache 21, but can directly obtain the sub-file from the storage server 3 according to the identifier of the storage server 3 included in the first reading task, and then send the sub-file to the management server 1.
  • the processing node 2 does not cache the sub-file in its cache 21 before sending the sub-file to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2, which reduces the transmission delay of the file to be accessed.
  • the generated second reading task includes the identifier of the processing node 2 where the sub-file of the file to be accessed is located, so that the processing node 2 that receives the second reading task can conveniently obtain the sub-file according to the identifier of the processing node 2 in the second reading task, which improves the efficiency of accessing the file.
  • when the access frequency of the file to be accessed exceeds the first preset frequency threshold, the management server 1 controls processing nodes 2 to cache the file to be accessed. An access frequency exceeding the first preset frequency threshold indicates that the file to be accessed has been accessed frequently recently; saving the file to be accessed in the caches 21 of the processing nodes 2 of the data access system not only improves the utilization of the caches 21 of the processing nodes 2 but also improves the file hit rate.
  • an embodiment of the present application provides an apparatus 300 for data access.
  • the apparatus 300 is deployed in the above-mentioned management server 1.
  • the apparatus 300 is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to the storage server 3.
  • the device 300 includes:
  • the receiving unit 301 is configured to receive a file access request, and the file access request carries an identifier of the file to be accessed.
  • the processing unit 302 is configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2; the apparatus 300 stores the identifiers of the files cached in the caches 21 of the multiple processing nodes 2.
  • the processing unit 302 is further configured to, when the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2, instruct at least one processing node 2 of the multiple processing nodes 2 to obtain the file to be accessed from the storage server 3.
  • for the process in which the processing unit 302 determines whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2, refer to the relevant content in step 202 of the embodiment shown in FIG. 3, which will not be described in detail here.
  • the apparatus 300 further includes: a first sending unit 303,
  • the processing unit 302 is configured to obtain the identification of at least one sub-file included in the file to be accessed and the identification of the storage server 3 where each sub-file is located from the storage server 3; generate a read file for each sub-file included in the file to be accessed For fetching tasks, each reading task includes an identification of a sub-file and an identification of the storage server 3 where the sub-file is located.
  • the first sending unit 303 is configured to send each reading task to a processing node 2 respectively, and instruct the processing node 2 that has received the reading task to read the sub-file from the storage server 3 that stores the sub-file.
  • the receiving unit 301 is configured to receive the subfile read by the processing node 2 that has received the read task.
  • the processing unit 302 is configured to merge the received sub-files into a file to be accessed.
  • for the detailed implementation process in which the processing unit 302 generates the reading task, refer to the related content in step 203 of the embodiment shown in FIG. 3.
  • the detailed implementation process in which the first sending unit 303 sends the reading task will not be described in detail here.
  • the apparatus 300 further includes: a second sending unit 304,
  • the second sending unit 304 is configured to send a cache task to at least one processing node 2 of the plurality of processing nodes 2 when the access frequency of the file to be accessed exceeds the preset frequency, so as to instruct the at least one processing node 2 to cache the sub-files included in the file to be accessed into the at least one processing node 2.
  • the processing unit 302 is also used to record the identifiers of the sub-files included in the file to be accessed and the identifier of the processing node 2 caching each sub-file; when the file to be accessed is cached in the caches 21 of the multiple processing nodes 2, it generates at least one reading task, each reading task including the identifier of a sub-file and the identifier of the processing node 2 where the sub-file is located.
  • the second sending unit 304 is further configured to send at least one reading task to multiple processing nodes 2 to instruct the multiple processing nodes 2 to read the sub-file from the cache 21 of the processing node 2 storing the sub-file.
  • the processing unit 302 is also used to synthesize the fetched sub-files into the file to be accessed.
  • the processing unit 302 generates a detailed implementation process of the reading task, which can be referred to related content in step 206 in the embodiment shown in FIG. 3.
  • the detailed implementation process of the second sending unit 304 sending the reading task please refer to the related content in step 207 in the embodiment shown in FIG. 3, which will not be described in detail here.
  • the second sending unit 304 is further configured to send a deletion task to the processing node 2 where a sub-file included in the file to be accessed is located when the access frequency of the file to be accessed is lower than the preset frequency; the deletion task includes the identifier of the sub-file, to instruct the processing node 2 to delete the sub-file.
  • the processing unit 302 is further configured to delete the identifier of the subfile and the identifier of the processing node 2 recorded in the device 300.
  • the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2; when the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2, at least one processing node 2 of the multiple processing nodes 2 is instructed to obtain the file to be accessed from the storage server 3. In this way, the at least one processing node 2 directly reads the file to be accessed from the storage server 3 according to the instruction of the processing unit 302 and directly returns the file to be accessed to the apparatus 300 after reading it; before returning the file to be accessed, the file to be accessed is not cached in the caches 21 of the at least one processing node 2. Thus the file to be accessed does not need to pass through the caches 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, reduces the path for data reading, and improves the performance of data access.
  • an embodiment of the present application provides an apparatus 400 for data access.
  • the apparatus 400 is deployed in the above-mentioned processing node 2.
  • the apparatus 400 is one of multiple processing nodes 2 connected to the management server 1, and the multiple processing nodes 2 are connected to the storage server 3.
  • the device 400 includes:
  • the receiving unit 401 is configured to receive a reading task, which is a task sent by the management server 1 when it determines that the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2; the reading task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located.
  • the processing unit 402 is configured to read the sub file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub file.
  • the sending unit 403 is configured to send the read sub-file to the management server 1.
  • the receiving unit 401 is further configured to receive a cache task, which is a task sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency; the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located.
  • the processing unit 402 is further configured to read the sub file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub file; store the sub file in the cache 21 of the device 400.
  • since the reading task received by the receiving unit 401 includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located, the processing unit 402 can directly read the sub-file from the storage server 3 corresponding to that identifier, and the sending unit 403 then returns the read sub-file directly to the management server 1.
  • before the sending unit 403 returns the sub-file to the management server 1, the processing unit 402 does not cache the sub-file in the cache 21 of the device 400. Therefore, the sub-file returned to the management server 1 does not pass through the cache 21 of the device 400, which shortens the transmission path of the sub-file, reduces the data reading path, and improves the performance of data access.
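The node-side read path above can be sketched as follows. This is a hypothetical illustration (all names such as `handle_read_task` and the task-dictionary keys are assumptions): the node reads the sub-file from the storage server named in the task and returns it immediately, leaving its own cache 21 untouched.

```python
def handle_read_task(task, storage_servers, node_cache):
    """Read the sub-file named in the task directly from the storage server."""
    sub_id = task["subfile_id"]
    server_id = task["storage_server_id"]
    data = storage_servers[server_id][sub_id]  # direct read from the storage server
    # node_cache is deliberately left untouched: the sub-file is returned to
    # the management server without passing through the cache 21.
    return data


storage = {"server-3": {"f.part0": b"hello"}}
cache = {}
result = handle_read_task(
    {"subfile_id": "f.part0", "storage_server_id": "server-3"}, storage, cache)
assert result == b"hello"
assert cache == {}  # nothing was cached on the read path
```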
  • FIG. 8 is a schematic diagram of a data access apparatus 500 provided by an embodiment of the application.
  • the device 500 includes at least one processor 501, a bus system 502, a memory 503, and a transceiver 504.
  • the device 500 is a device with a hardware structure, and can be used to implement the functional units in the device described in FIG. 6.
  • the processing unit 302 in the device 300 shown in FIG. 6 can be implemented by the at least one processor 501 calling the application code in the memory 503, and the receiving unit 301, the first sending unit 303, and the second sending unit 304 in the device 300 shown in FIG. 6 may be implemented by the transceiver 504.
  • the device 500 may also be used to implement the functions of the management server 1 in the embodiment described in FIG. 1 or FIG. 3.
  • the above-mentioned processor 501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • the above-mentioned bus system 502 may include a path for transferring information between the above-mentioned components.
  • the above-mentioned memory 503 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 503 is used to store application program codes for executing the solutions of the present application, and the processor 501 controls the execution.
  • the processor 501 is configured to execute the application program code stored in the memory 503, so as to implement the functions in the methods of the present application.
  • the processor 501 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.
  • the apparatus 500 may include multiple processors, such as the processor 501 and the processor 508 in FIG. 8.
  • each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • FIG. 9 is a schematic diagram of a data access apparatus 600 provided by an embodiment of the application.
  • the device 600 includes at least one processor 601, a bus system 602, a memory 603, and a transceiver 604.
  • the memory 603 also includes a cache 21, and the cache 21 is used to store subfiles included in files whose access frequency exceeds a preset frequency.
  • the device 600 is a device with a hardware structure, and can be used to implement the functional units in the device described in FIG. 7.
  • the processing unit 402 in the device 400 shown in FIG. 7 can be implemented by calling the code in the memory 603 by the at least one processor 601.
  • the receiving unit 401 can be implemented by the transceiver 604.
  • the device 600 may also be used to implement the function of the processing node 2 in the embodiment described in FIG. 1 or FIG. 3.
  • the aforementioned processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • the above-mentioned bus system 602 may include a path for transferring information between the above-mentioned components.
  • the aforementioned memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 603 is used to store application program codes for executing the solutions of the present application, and the processor 601 controls the execution.
  • the processor 601 is configured to execute the application program code stored in the memory 603, so as to implement the functions in the methods of the present application.
  • the processor 601 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 9.
  • the apparatus 600 may include multiple processors, such as the processor 601 and the processor 608 in FIG. 9. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, apparatus and system for data access, which relate to the field of communications. The method is executed by a management server (1), the management server (1) is connected to a plurality of processing nodes (2), and the plurality of processing nodes (2) are connected to a storage server (3). The method comprises: receiving a file access request, the file access request carrying the identification of a file to be accessed; according to the identification of the file, determining whether the file is cached in caches of the plurality of processing nodes (2), the management server (1) storing the identification of files cached in the caches of the plurality of processing nodes (2); and when the file is not cached in the caches of the plurality of processing nodes (2), then instructing at least one processing node (2) among the plurality of processing nodes (2) to acquire the file from the storage server (3). The described method may reduce the path of data reading and improve the performance of data access.

Description

Method, device and system for data access
Technical field
This application relates to the computer field, and in particular to a method, device and system for data access.
Background
With the advent of the era of big data, the scale of enterprise data continues to expand, and how to quickly access massive amounts of data is a core issue facing enterprises.
At present, in order to improve the efficiency of data access, enterprises generally adopt distributed systems. Such a distributed system includes a coordination server, multiple processing nodes, and a storage server that stores data. When receiving an access request sent by a client, the coordination server decomposes the access request into multiple tasks and sends them to the processing nodes; each processing node accesses the data in the storage server, the data read by each processing node is returned to the coordination server, and the coordination server integrates the data returned by the processing nodes and then returns it to the client.
After receiving a task sent by the coordination server, each processing node first determines whether the data to be accessed in the task is in the cache of that processing node. If it is in the cache, the node reads the data directly from the cache; if not, the node needs to read the data to be accessed from the storage server into the cache, and then read the data from the cache. It can be seen that, for each processing node, if the accessed data misses the cache, the data in the storage server must first be read into the processing node's cache and then read from the cache to the coordination server, which lengthens the data reading path and affects the performance of data access.
Summary of the invention
The present application provides a data access method, device and system, to reduce the data reading path and improve the performance of data access. The technical solutions are as follows:
In a first aspect, this application provides a data access method executed by a management server. The management server is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the management server stores the identifiers of the files cached in the caches of the multiple processing nodes. In the method, a file access request is received, the file access request carrying the identifier of the file to be accessed; whether the file to be accessed is cached in the caches of the multiple processing nodes is determined according to the identifier of the file to be accessed; and when the file to be accessed is not cached in the caches of the multiple processing nodes, at least one processing node of the multiple processing nodes is instructed to obtain the file to be accessed from the storage server. Because the at least one processing node is instructed to obtain the file from the storage server when the file is not cached, the at least one processing node directly reads the file to be accessed from the storage server according to the instruction of the management server, directly returns it to the management server after reading, and does not cache the file in its own cache before returning it. In this way, the file to be accessed does not need to pass through the cache of the at least one processing node, which shortens the transmission path of the file, reduces the data reading path, and improves the performance of data access.
In a possible implementation, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server; a read task is generated for each sub-file included in the file to be accessed, each read task including the identifier of one sub-file and the identifier of the storage server where that sub-file is located; each read task is sent to one processing node, instructing the processing node that receives the read task to read the sub-file from the storage server that stores it; the sub-files read by the processing nodes that received the read tasks are received; and the sub-files are merged into the file to be accessed. Because a generated read task includes the identifier of the storage server where the sub-file is located, the processing node that receives the read task can read the sub-file directly from that storage server according to the identifier in the task, and send the sub-file directly to the management server after reading it. In this way, the sub-file is not first cached in the processing node's cache and then read from that cache and sent to the management server, which shortens the transmission path of the sub-file and improves the performance of reading it.
In another possible implementation, when the access frequency of the file to be accessed exceeds a preset frequency, a cache task is sent to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed; the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file are recorded. When the file to be accessed is cached in the caches of the multiple processing nodes, at least one read task is generated, each read task including the identifier of a sub-file and the identifier of the processing node where that sub-file is located; the at least one read task is sent to the multiple processing nodes, instructing them to read each sub-file from the cache of the processing node that stores it; the read sub-files are merged into the file to be accessed.
When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed file. Because the cache space in each processing node is limited, saving frequently accessed files in the caches of the at least one processing node not only improves the cache utilization of the processing nodes but also increases the hit rate of the files to be accessed. When the file to be accessed is cached in the caches of the multiple processing nodes, the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the task does not need to determine where the sub-file is; it reads the sub-file directly from the processing node identified in the task, which improves the efficiency of reading the sub-file.
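The frequency-triggered caching step above can be sketched as follows. This is a hypothetical sketch: the function name, the task-dictionary keys, and the round-robin placement of sub-files onto nodes are all illustrative assumptions (the patent does not prescribe a placement policy).

```python
def plan_cache_tasks(access_count, preset_frequency, file_id, subfiles, nodes, cache_index):
    """When a file's access count exceeds the preset frequency, assign each of
    its sub-files to a processing node (round-robin here, as one possible
    policy) and record the sub-file -> node mapping on the management server."""
    if access_count <= preset_frequency or file_id in cache_index:
        return []
    tasks, placement = [], []
    for i, (sub_id, storage_id) in enumerate(subfiles):
        node = nodes[i % len(nodes)]
        tasks.append({"node": node, "subfile_id": sub_id,
                      "storage_server_id": storage_id})
        placement.append((sub_id, node))
    cache_index[file_id] = placement  # recorded so later reads hit node caches
    return tasks


index = {}
tasks = plan_cache_tasks(
    access_count=11, preset_frequency=10, file_id="hot.dat",
    subfiles=[("hot.dat.0", "s1"), ("hot.dat.1", "s2")],
    nodes=["node-1", "node-2"], cache_index=index)
assert len(tasks) == 2
assert index["hot.dat"] == [("hot.dat.0", "node-1"), ("hot.dat.1", "node-2")]
```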
In another possible implementation, when the access frequency of the file to be accessed falls below the preset frequency, a deletion task is sent to the processing node where each sub-file included in the file to be accessed is located, the deletion task including the identifier of the sub-file, to instruct the processing node to delete the sub-file; the identifier of the sub-file and the identifier of the processing node recorded in the management server are also deleted. In this way, files with a low access frequency can be deleted from the caches of the multiple processing nodes, freeing cache space for files with a high access frequency, which not only improves the cache utilization of the multiple processing nodes but also increases the file hit rate.
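The eviction step can be sketched in the same style (hypothetical names; the `cache_index` mapping is the same illustrative structure as in the caching sketch above would use):

```python
def plan_delete_tasks(access_count, preset_frequency, file_id, cache_index):
    """When a cached file's access count falls below the preset frequency,
    emit one delete task per cached sub-file and drop the management
    server's records for that file."""
    if access_count >= preset_frequency or file_id not in cache_index:
        return []
    return [{"node": node, "subfile_id": sub_id}
            for sub_id, node in cache_index.pop(file_id)]


index = {"cold.dat": [("cold.dat.0", "node-1")]}
tasks = plan_delete_tasks(2, 10, "cold.dat", index)
assert tasks == [{"node": "node-1", "subfile_id": "cold.dat.0"}]
assert "cold.dat" not in index  # records removed from the management server
```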
In a second aspect, this application provides a data access method executed by a processing node, the processing node being one of multiple processing nodes connected to a management server, and the multiple processing nodes being connected to a storage server. In the method, a read task is received, the read task being a task sent by the management server when it determines that the file to be accessed is not cached in the caches of the multiple processing nodes; the read task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the identifier of the sub-file; and the read sub-file is sent to the management server. Because the received read task includes the identifier of the sub-file and the identifier of the storage server where it is located, the processing node can read the sub-file directly from that storage server and return it directly to the management server, without caching the sub-file in its own cache before returning it. The sub-file returned to the management server therefore does not pass through the processing node's cache, which shortens its transmission path, reduces the data reading path, and improves the performance of data access.
In a possible implementation, a cache task is received, the cache task being a task sent by the management server when the access frequency of the file to be accessed exceeds a preset frequency; the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the identifier of the sub-file, and stored in the cache of the processing node. When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed file; because the cache space in the processing node is limited, saving the sub-files of frequently accessed files in the processing node's cache not only improves the cache utilization of the processing node but also increases the hit rate of the files to be accessed.
In a third aspect, this application provides a data access method executed by a processing node, the processing node being one of multiple processing nodes connected to a management server, and the multiple processing nodes being connected to a storage server. In the method, a read task is received, the read task being a task sent by the management server when it determines that the file to be accessed is cached in the caches of the multiple processing nodes; the read task includes the identifier of a sub-file of the file to be accessed and the identifier of the processing node where the sub-file is located; the sub-file is read from the processing node corresponding to that identifier according to the identifier of the sub-file; and the read sub-file is sent to the management server. When the file to be accessed is cached in the caches of the multiple processing nodes, because the read task includes the identifier of the processing node where the sub-file is located, the processing node does not need to determine where the sub-file is; it reads the sub-file directly from the identified processing node, which improves the efficiency of reading the sub-file.
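The cached read path of this third aspect can be sketched as a one-line lookup. This is an illustrative, hypothetical sketch (function and key names are assumptions): because the task already names the processing node holding the sub-file, no lookup or discovery step is needed on the receiving node.

```python
def handle_cached_read_task(task, node_caches):
    """The read task already names the processing node holding the sub-file,
    so the receiving node fetches it straight from that node's cache without
    having to locate the sub-file itself."""
    return node_caches[task["node_id"]][task["subfile_id"]]


caches = {"node-2": {"f.part1": b"cached-bytes"}}
task = {"node_id": "node-2", "subfile_id": "f.part1"}
assert handle_cached_read_task(task, caches) == b"cached-bytes"
```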
In a possible implementation, a deletion task is received, the deletion task being a task sent by the management server when the access frequency of the file to be accessed falls below a preset frequency; the deletion task includes the identifier of a sub-file of the file to be accessed, and the sub-file corresponding to that identifier is deleted. In this way, when the access frequency of the file to be accessed is low, the processing node can delete the sub-files belonging to that file from its own cache, freeing cache space for files with a high access frequency, which not only improves the cache utilization of the processing node but also increases the file hit rate.
In a fourth aspect, this application provides a data access device configured to execute the method in the first aspect or any optional implementation of the first aspect. Specifically, the device includes units for executing the method in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, this application provides a data access device configured to execute the method in the second aspect or an optional implementation of the second aspect; specifically, the device includes units for executing the method in the second aspect or a possible implementation of the second aspect. Alternatively, the device is configured to execute the method in the third aspect or an optional implementation of the third aspect; specifically, the device includes units for executing the method in the third aspect or a possible implementation of the third aspect.
In a sixth aspect, this application provides a data access device including a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions, and the computer-executable instructions are executed by the processor to implement the operation steps of the method in the first aspect or any possible implementation of the first aspect.
In a seventh aspect, this application provides a data access device including a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions, and the computer-executable instructions are executed by the processor to perform the operation steps of the method in the second aspect or a possible implementation of the second aspect, or of the method in the third aspect or a possible implementation of the third aspect.
In an eighth aspect, this application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
In a ninth aspect, this application provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the methods described in the above aspects.
In a tenth aspect, this application provides a data access system including a management server, a storage server, and multiple processing nodes.
管理服务器接收文件访问请求,该文件访问请求中携带待访问文件的标识;根据待访问文件的标识确定待访问文件是否缓存在该多个处理节点的缓存中,管理服务器中存储有该多个处理节点的缓存中缓存的文件的标识;当待访问文件没有缓存在该多个处理节点的缓存中,从存储服务器中获取待访问文件所包括的至少一个子文件的标识及每个子文件所在的存储服务器的标识,针对待访问文件所包括的每个子文件生成一个读取任务,每个读取任务中包括一个子文件的标识及该子文件所在的存储服务器的标识;将每个读取任务分别发送至一个处理节点。接收到读取任务的处理节点根据接收的读取任务中的子文件的标识从该存储服务器的标识对应的存储服务器中读取对应的子文件,向管理服务器发送读取的子文件。管理服务器接收接收到读取任务的处理节点读取的子文件。The management server receives the file access request, and the file access request carries the identification of the file to be accessed; according to the identification of the file to be accessed, it is determined whether the file to be accessed is cached in the caches of the multiple processing nodes, and the multiple processes are stored in the management server. The identifier of the file cached in the cache of the node; when the file to be accessed is not cached in the caches of the multiple processing nodes, the identifier of at least one subfile included in the file to be accessed and the storage where each subfile is located are obtained from the storage server The identification of the server, a read task is generated for each sub-file included in the file to be accessed, and each read task includes the identification of a sub-file and the identification of the storage server where the sub-file is located; separate each read task Send to a processing node. The processing node that receives the reading task reads the corresponding subfile from the storage server corresponding to the identifier of the storage server according to the identifier of the subfile in the received reading task, and sends the read subfile to the management server. The management server receives the subfile read by the processing node that has received the read task.
由于当待访问文件没有缓存在该多个处理节点的缓存中,管理服务器生成的每个读取任务中包括一个子文件的标识及该子文件所在的存储服务器的标识。这样接收到读取任务的处理节点可以直接根据该读取任务中的存储服务器的标识从该存储服务器中读取该子文件,以及在读取到该子文件后直接向管理服务器发送该子文件。这样该子文件不会被先缓存到该处理节点的缓存中,再由该处理节点从自身的缓存中读取该子文件并发送给管理服务器,从而减小了该子文件的传输路径,提高读取该子文件的性能。Because when the file to be accessed is not cached in the caches of the multiple processing nodes, each read task generated by the management server includes the identifier of a subfile and the identifier of the storage server where the subfile is located. In this way, the processing node that receives the read task can directly read the subfile from the storage server according to the identifier of the storage server in the read task, and directly send the subfile to the management server after reading the subfile . In this way, the sub-file will not be cached in the cache of the processing node first, and then the processing node will read the sub-file from its own cache and send it to the management server, thereby reducing the transmission path of the sub-file and improving Read the performance of this subfile.
在一种可能的实现方式中,管理服务器当待访问文件的访问频率超过预设频率时,发送缓存任务至该多个处理节点的至少一个处理节点。接收缓存任务的处理节点缓存待访问文件包括的子文件。以及,管理服务器还记录待访问文件的标识所包括的子文件标识及缓存每个子文件的处理节点的标识。管理服务器当待访问文件缓存在该多个处理节点的缓存中时,生成至少一个读取任务,每个读取任务包括子文件的标识及该子文件所在的处理节点的标识;发送至少一个读取任务至该多个处理节点。接收到读取任务的处理节点根据接收的读取任务中的子文件的标识和该子文件所在的处理节点的标识读取子文件,向管理服务器发送读取的子文件。管理服务器接收接收到读取任务的处理节点读取的子文件。In a possible implementation manner, when the access frequency of the file to be accessed exceeds the preset frequency, the management server sends the cache task to at least one processing node of the plurality of processing nodes. The processing node receiving the cache task caches the sub-files included in the file to be accessed. And, the management server also records the identification of the sub-file included in the identification of the file to be accessed and the identification of the processing node that caches each sub-file. When the file to be accessed is cached in the caches of the multiple processing nodes, the management server generates at least one reading task. Each reading task includes the identifier of the subfile and the identifier of the processing node where the subfile is located; sending at least one read Get tasks to the multiple processing nodes. The processing node that receives the reading task reads the subfile according to the identifier of the subfile in the received reading task and the identifier of the processing node where the subfile is located, and sends the read subfile to the management server. The management server receives the subfile read by the processing node that has received the read task.
When the access frequency of the file to be accessed exceeds the preset frequency, the file to be accessed is a frequently accessed file. Because the cache space in each processing node is limited, saving frequently accessed files to the caches of the at least one processing node not only improves the cache utilization of the processing nodes but also improves the hit rate of files to be accessed. When the file to be accessed is cached in the caches of the multiple processing nodes, each generated read task includes the identifier of the processing node where the subfile is located, so the processing node that receives the read task does not need to determine which node holds the subfile; it reads the subfile directly from that node according to the identifier in the task, which improves the efficiency of reading the subfile.
Description of the Drawings
Figure 1 is a schematic structural diagram of a data access system provided by an embodiment of this application;
Figure 2 is a schematic diagram of a client accessing the data access system provided by an embodiment of this application;
Figure 3 is a flowchart of a data access method provided by an embodiment of this application;
Figure 4 is a flowchart of a method for caching files provided by an embodiment of this application;
Figure 5 is a flowchart of a method for deleting files provided by an embodiment of this application;
Figure 6 is a schematic structural diagram of a data access apparatus provided by an embodiment of this application;
Figure 7 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application;
Figure 9 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application.
Detailed Description
The implementations of this application are described in further detail below with reference to the accompanying drawings.
Referring to Figure 1, an embodiment of this application provides a data access system. The system includes a management server 1, multiple processing nodes 2, and at least one storage server 3. The management server 1 is connected to each processing node 2, and the processing nodes 2 are also connected to one another. The management server 1 and each processing node 2 are connected to each storage server 3 through a network.
The management server 1 is configured to decompose a received file access request into multiple tasks and deliver them to the respective processing nodes 2. Each processing node 2 accesses files in the storage servers 3 and returns the files it reads to the management server 1. The management server 1 integrates the files returned by the processing nodes 2 and returns the result to a client (not shown in the figure).
Each storage server 3 stores files for user access. In a storage server 3, one file may be divided into multiple subfiles for storage. For example, the file may be a table in a database. Assume the table includes 100 records and is stored in the storage server 3 as three subfiles: a first subfile, a second subfile, and a third subfile. The first subfile holds records 1 to 33 of the table, the second subfile holds records 34 to 66, and the third subfile holds records 67 to 100.
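The record-range split described above can be sketched as follows. This is purely illustrative; the boundaries come from the example in the text, and nothing here is prescribed by the embodiment:

```python
# Reproduce the example split: a 100-record table stored as three subfiles
# holding records 1-33, 34-66, and 67-100 respectively.
table = list(range(1, 101))                  # records numbered 1..100
boundaries = [(0, 33), (33, 66), (66, 100)]  # half-open slice boundaries
subfiles = [table[start:end] for start, end in boundaries]
```

Any contiguous partition of the record range works the same way; the embodiment does not require equal subfile sizes.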
For any file in a storage server 3, the management server 1 may cache the subfiles included in the file in one or more processing nodes 2 of the data access system. For each subfile of the file, when the management server 1 caches the subfile in a processing node 2, it saves the correspondence among the identifier of the file, the identifier of the subfile, and the identifier of that processing node 2 in a file list.
Optionally, the processing node 2 includes a cache 21, and the management server 1 caches the subfile in the cache 21 of the processing node 2.
For the detailed process by which the management server 1 caches the subfiles of a file in the processing nodes 2 of the data access system, refer to the related content of the embodiment shown in Figure 3; it is not described in detail here.
The file list is illustrated with an example. For the table above, assume the identifier of the table is ID1, and the identifiers of its first, second, and third subfiles are file1, file2, and file3 respectively. Assume the management server 1 caches the first, second, and third subfiles in a first, a second, and a third processing node respectively, and that the identifiers of these processing nodes are TE1, TE2, and TE3. The management server 1 saves the table identifier ID1, the first subfile identifier file1, and the first processing node identifier TE1 as one entry in the file list shown in Table 1 below; saves ID1, the second subfile identifier file2, and the second processing node identifier TE2 as another entry; and saves ID1, the third subfile identifier file3, and the third processing node identifier TE3 as a third entry.
Table 1

File identifier    Subfile identifier    Processing node identifier
ID1                file1                 TE1
ID1                file2                 TE2
ID1                file3                 TE3
...                ...                   ...
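Table 1 can be modeled in memory as a list of (file identifier, subfile identifier, processing node identifier) entries, with a lookup that returns the cached placements of a file. This is a minimal sketch under assumed names, not the patented implementation:

```python
# File list entries mirroring Table 1: (file ID, subfile ID, processing node ID).
file_list = [
    ("ID1", "file1", "TE1"),
    ("ID1", "file2", "TE2"),
    ("ID1", "file3", "TE3"),
]

def lookup(file_list, file_id):
    """Return the (subfile ID, node ID) pairs recorded for a file; [] if none."""
    return [(sub, node) for fid, sub, node in file_list if fid == file_id]
```

An empty lookup result corresponds to the file not being cached in any processing node.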
Optionally, referring to Figure 2, when a user needs to access a file, the user may input the identifier of the file to be accessed into the client 4. The client 4 obtains the input identifier and sends a file access request including that identifier to the management server 1.
The management server 1 receives the file access request and determines, according to the file list and the identifier of the file to be accessed included in the request, whether the file to be accessed is cached in the processing nodes 2 of the data access system. If the file is cached in the processing nodes 2, the management server obtains it from the processing nodes 2. If it is not, the management server controls the processing nodes 2 to obtain it from the storage server 3 where it is located. The management server then sends the file to the client.
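The hit-or-miss routing decision made on each request could be sketched as below. `handle_access_request` is a hypothetical helper, and the returned task tuples are purely illustrative shapes, not the embodiment's message format:

```python
def handle_access_request(file_id, file_list):
    """Route a request: read cached subfiles from nodes, else fall back to storage."""
    placements = [(sub, node) for fid, sub, node in file_list if fid == file_id]
    if placements:
        # Cache hit: one task per subfile, naming the node that caches it.
        return [("read_from_node", sub, node) for sub, node in placements]
    # Cache miss: processing nodes must fetch the file from its storage server.
    return [("read_from_storage", file_id)]
```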
For the detailed process by which the management server 1 obtains the file to be accessed, refer to the related content of the embodiment shown in Figure 3; it is not described in detail here.
Optionally, the identifier of a file may be its file name or the like, and the identifier of a subfile may be its storage path or file name in the storage server, or the like.
In this embodiment of the application, the management server 1 maintains a file list that records the identifier of each file cached in the processing nodes 2 of the data access system, the identifiers of the subfiles of that file, and the identifiers of the processing nodes 2 where those subfiles are cached. When the management server 1 receives the identifier of the file to be accessed sent by the client 4, it can determine, from that identifier and the file list, whether the file is cached in the processing nodes 2 of the data access system. If the file is not cached there, the management server controls the processing nodes 2 to obtain it from the storage server 3 where it is located. Under the control of the management server 1, a processing node 2 does not first read the file from the cache 21 of a processing node 2 of the data access system; instead, it obtains the file to be accessed directly from the storage server 3 and sends it directly to the management server 1, without caching it in its own cache 21 beforehand. The file to be accessed therefore does not need to be cached in the cache 21 of the processing node 2 and then read out of that cache and sent to the management server 1, which improves the efficiency of accessing files.
In this embodiment of the present invention, the data access system is mainly used for accessing database data. The file access method provided by this embodiment is introduced below by taking access to a file in a storage server 3 as an example.
Referring to Figure 3, an embodiment of this application provides a file access method. The method may be applied to the system described in Figure 1 and includes:
Step 201: The management server 1 receives a file access request, where the file access request includes the identifier of the file to be accessed.
The user logs in to the management server 1 through the client 4, and the client 4 then displays an interface provided by the management server 1. Through this interface, the user can input a file access request. The file access request may be a database access statement, which may include the identifier of the file to be accessed. The database access statement may be a structured query language (SQL) statement, and the file to be accessed may be a table in a database.
For example, assume the user inputs through the client 4 the SQL statement "select name from teacher join people on teacher.id=people.id". This statement requests access to two tables: one named "teacher", with the identifier "teacher.id", and another named "people", with the identifier "people.id". On receiving the SQL statement, the management server 1 extracts the identifiers of the two tables from it, namely the identifier "teacher.id" of the table "teacher" and the identifier "people.id" of the table "people".
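Extracting the table identifiers from such a statement can be sketched with a regular expression. A real implementation would use a proper SQL parser; the regex and the ".id" suffix convention below simply mirror the example identifiers and are assumptions, not the embodiment's method:

```python
import re

def extract_table_ids(sql):
    """Rough sketch: collect the table names that follow FROM/JOIN keywords."""
    names = re.findall(r"\b(?:from|join)\s+(\w+)", sql, flags=re.IGNORECASE)
    return [name + ".id" for name in names]

sql = "select name from teacher join people on teacher.id=people.id"
```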
Optionally, the management server 1 also analyzes whether the format of the SQL statement is correct. If it is correct, step 202 is executed. If it is not, an alarm indicating that the statement is incorrect is fed back to the client 4, which receives the alarm and displays it to the user.
Step 202: The management server 1 determines, according to the identifier and the file list, whether the file to be accessed is cached in the processing nodes 2 of the data access system.
As shown in Table 1 above, the file list stores the correspondence among file identifiers, subfile identifiers, and processing node 2 identifiers.
The management server 1 can cache files in the processing nodes 2 of the data access system. A file includes multiple subfiles; when a subfile of the file is cached in a processing node 2, the correspondence among the file identifier, the subfile identifier, and that processing node's identifier is saved in the file list.
Optionally, the management server 1 may cache files whose access frequency exceeds a preset frequency in the processing nodes 2 of the data access system, and delete files whose access frequency is below the preset frequency from the processing nodes 2.
Optionally, the management server 1 caches, in the processing nodes 2, files whose access frequency within the most recent period of a preset duration exceeds a first preset frequency threshold, and deletes from the processing nodes 2 files whose access frequency within that period does not exceed a second preset frequency threshold. The first preset frequency threshold is greater than or equal to the second preset frequency threshold.
The management server 1 stores a historical access record, each entry of which stores the identifier of a file a user has accessed and the corresponding access time.
Optionally, the management server 1 periodically or aperiodically computes, from the historical access record, the access frequency of each accessed file within the most recent period of the preset duration. When a file's access frequency exceeds the first preset frequency threshold, the management server controls the processing nodes 2 to obtain the file from the storage server 3 where it is located and cache it in the processing nodes 2 of the data access system.
Referring to Figure 4, this can be implemented through the following operations 2021 to 2026:
2021: The management server 1 selects, from the historical access record, the identifier of a file that does not appear in the file list, and computes the file's access frequency within the most recent period of the preset duration according to the historical access record and the file identifier.
A selected identifier that is not in the file list indicates that the corresponding file is not cached in the processing nodes 2 of the data access system.
In this operation, the management server 1 may obtain, from the historical access record, the access times corresponding to the file identifier within the most recent period of the preset duration. The management server 1 counts the number of these access times, which equals the number of accesses to the file, and derives the file's access frequency over the period from that count and the preset duration.
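Operation 2021's frequency computation, counting the accesses inside the most recent window and dividing by the preset duration, can be sketched as below. The (file ID, access time) record layout is an assumption for illustration; the embodiment does not specify one:

```python
def access_frequency(history, file_id, window, now):
    """history: iterable of (file ID, access time) pairs.
    Returns accesses per unit time for file_id over [now - window, now]."""
    count = sum(1 for fid, t in history if fid == file_id and now - t <= window)
    return count / window
```

Comparing the returned value against the first preset frequency threshold then decides whether to cache the file.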
2022: When the access frequency exceeds the first preset frequency threshold, the management server 1 obtains, according to the file identifier, the identifier of the storage server 3 where the file is located and the identifiers of at least one subfile included in the file. The identifier of the storage server 3 may be its address, for example its internet protocol (IP) address.
A technician may input the identifiers of the storage servers 3 in the data access system to the management server 1 in advance. According to a storage server 3's identifier, the management server 1 can obtain the identifiers of the files saved on that storage server and save these file identifiers together with the storage server identifier in a correspondence between file identifiers and storage server identifiers.
Optionally, for each file saved on the storage server 3, the management server 1 may also obtain from the storage server 3 the identifiers of the subfiles included in the file, and save the file identifier together with the obtained subfile identifiers in a correspondence between file identifiers and subfile identifiers.
In this operation, having determined that a file's access frequency exceeds the first preset frequency threshold, the management server 1 can obtain, according to the file identifier, the identifier of the storage server 3 where the file is located from the correspondence between file identifiers and storage server identifiers. If the management server 1 maintains the correspondence between file identifiers and subfile identifiers, it obtains the identifier of each subfile of the file from that correspondence according to the file identifier. If it does not, it obtains the identifier of each subfile of the file from the storage server 3 according to the storage server's identifier.
2023: The management server 1 generates at least one cache task, where each cache task includes the identifier of the storage server 3 and the identifier of one subfile of the file.
2024: For each cache task, the management server 1 selects a processing node 2 and sends the cache task to that processing node 2.
The management server 1 may traverse the at least one cache task starting from the first one. For each cache task traversed, it selects a processing node 2 and sends the task to that node, then moves on to the next cache task, repeating this process until the last cache task has been sent.
Optionally, the management server 1 randomly selects a processing node 2 from the processing nodes 2 of the data access system. Alternatively,
optionally, the management server 1 may store a correspondence between processing node 2 identifiers and free cache space sizes, which records the identifier and free cache space size of each processing node 2 in the data access system. Based on this correspondence, the management server 1 can first select the at least one processing node 2 with the largest free cache space, where the number of selected nodes equals the number of subfiles included in the file, and then, for each cache task traversed, select one processing node 2 from the selected nodes and send the cache task to it.
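The second selection strategy, picking the nodes with the most free cache space, one per subfile, can be sketched as below. The dictionary shape of the node-to-free-space correspondence is an assumption for illustration:

```python
def select_nodes(free_space_by_node, num_subfiles):
    """Return the identifiers of the num_subfiles nodes with the most free cache."""
    ranked = sorted(free_space_by_node.items(), key=lambda kv: kv[1], reverse=True)
    return [node for node, _ in ranked[:num_subfiles]]
```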
2025: The processing node 2 receives a cache task, obtains, from the storage server 3 identified in the cache task, the subfile corresponding to the subfile identifier in the task, and stores the obtained subfile in its own cache 21.
Optionally, the processing node 2 may also send a cache success message corresponding to the cache task to the management server 1.
Optionally, after caching the subfile, the processing node 2 may also obtain its remaining free cache space size and send it to the management server 1.
2026: The management server 1 may save the file identifier, the subfile identifier in the cache task, and the identifier of the selected processing node 2 as a corresponding entry in the file list.
Optionally, the management server 1 executes this operation after selecting a processing node 2 for the cache task, or after receiving the cache success message corresponding to the cache task from that processing node 2.
Optionally, the management server 1 may also receive the remaining free cache space size from the processing node 2, and update that node's free cache space size in the correspondence between processing node identifiers and free cache space sizes to the received value.
Optionally, the management server 1 also obtains the access frequency, within the most recent period of the preset duration, of each file cached in the processing nodes 2 of the data access system, and deletes from the processing nodes 2 files whose access frequency is below the second preset frequency threshold.
Referring to Figure 5, this can be implemented through the following operations 2121 to 2123:
2121: For the identifier of any file in the file list, the management server 1 computes the file's access frequency within the most recent period of the preset duration according to the file identifier and the historical access record.
In implementation, the access times corresponding to the file are obtained from the historical access record according to the file identifier, the number of access times within the most recent period of the preset duration is counted to obtain the number of accesses to the file, and the file's access frequency is derived from that count.
2122: When the file's access frequency is below the second preset frequency threshold, the management server 1 obtains, from the file list, the identifiers of the subfiles included in the file and the identifiers of the processing nodes 2 where those subfiles are located.
2123: For each subfile, the management server 1 sends a deletion task, including the subfile's identifier, to the processing node 2 where the subfile is located, and then deletes the entries that include the file's identifier from the file list.
The processing node 2 receives the deletion task and deletes, from its own cache 21, the subfile corresponding to the subfile identifier in the task.
Optionally, after deleting the subfile, the processing node 2 may also obtain the remaining free cache space size in its cache 21 and send it to the management server 1.
Optionally, the management server 1 also receives the remaining free cache space size from the processing node 2 and updates that node's free cache space size in the correspondence between processing node identifiers and free cache space sizes to the received value.
Because the processing nodes 2 of the data access system hold the files whose access frequency within the most recent period of the preset duration exceeds the first preset frequency threshold, the hit rate of each file cached in the processing nodes 2 is improved when files are accessed.
The foregoing is merely one implementation example, listed in this application, of caching files in the processing nodes 2 of the data access system and of evicting files from the processing nodes 2. Other implementations of caching files in, or evicting files from, the processing nodes 2 of the data access system may also be applied to this application; they are not enumerated here one by one.
In this step, the management server 1 may query the file list according to the identifier of the file to be accessed. If the query returns no identifiers of the subfiles included in the file to be accessed and no identifiers of the processing nodes 2 where those subfiles are located, the management server determines that the file to be accessed is not cached in the processing nodes 2 of the data access system. If the query returns the subfile identifiers and the corresponding processing node 2 identifiers, the management server determines that the file to be accessed is cached in the processing nodes 2.
Optionally, after receiving the file access request, the management server 1 may also take the current time as the access time of the file to be accessed, and save the correspondence between the identifier of the file to be accessed and this access time in the historical access record.
步骤203:在数据访问系统的处理节点2中没有缓存待访问文件时,管理服务器1生成至少一个第一读取任务,每个第一读取任务包括待访问文件所在的存储服务器2的地址和待访问文件中的一个子文件的标识。Step 203: When the file to be accessed is not cached in the processing node 2 of the data access system, the management server 1 generates at least one first reading task, and each first reading task includes the address of the storage server 2 where the file to be accessed is located and The identifier of a sub-file in the file to be accessed.
每个第一读取任务包括的子文件的标识不同。The identifiers of the subfiles included in each first reading task are different.
在本步骤中,管理服务器1可以根据待访问文件的标识从文件的标识与存储服务器3的标识的对应关系中获取待访问文件所在的存储服务器3的标识。In this step, the management server 1 may obtain the identification of the storage server 3 where the file to be accessed is located from the correspondence between the identification of the file and the identification of the storage server 3 according to the identification of the file to be accessed.
在管理服务器1保存有文件的标识与子文件的标识的对应关系的情况,管理服务器1根据待访问文件的标识,从文件的标识与子文件的标识的对应关系中获取待访问文件包括的至少一个子文件的标识,生成至少一个第一读取任务,每个第一读取任务包括该存储服务器3的标识和待访问文件中的一个子文件的标识。In the case that the management server 1 saves the corresponding relationship between the identification of the file and the identification of the sub-file, the management server 1 obtains at least the file to be accessed from the corresponding relationship between the identification of the file and the identification of the sub-file according to the identification of the file to be accessed. An identification of a subfile generates at least one first reading task, and each first reading task includes an identification of the storage server 3 and an identification of a subfile in the file to be accessed.
If the management server 1 does not store a correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifiers of at least one sub-file included in the file to be accessed from the storage server 3 according to the identifier of the storage server 3, and generates at least one first read task, each first read task including the identifier of the storage server 3 and the identifier of one sub-file of the file to be accessed.
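The task-generation logic of step 203 can be sketched as follows. This is a non-normative illustration; all names such as `FirstReadTask` and `fetch_subfile_ids` are hypothetical and not part of the embodiment. It covers both cases above: the file-to-sub-file correspondence is kept locally on the management server 1, or the sub-file identifiers must be fetched from the storage server 3.

```python
from dataclasses import dataclass

@dataclass
class FirstReadTask:
    """One first read task: which storage server 3 holds the sub-file, and which sub-file to fetch."""
    storage_server_id: str
    subfile_id: str

def generate_first_read_tasks(file_id, server_map, subfile_map=None, fetch_subfile_ids=None):
    """Generate one first read task per sub-file of the file to be accessed.

    server_map: file identifier -> identifier of the storage server 3 holding the file.
    subfile_map: optional locally stored correspondence, file identifier -> sub-file identifiers.
    fetch_subfile_ids: fallback callable(server_id, file_id) that queries the storage
    server 3 when no local correspondence is kept (the second case described above).
    """
    server_id = server_map[file_id]
    if subfile_map is not None and file_id in subfile_map:
        subfile_ids = subfile_map[file_id]                    # local correspondence kept
    else:
        subfile_ids = fetch_subfile_ids(server_id, file_id)   # ask the storage server 3
    return [FirstReadTask(server_id, sid) for sid in subfile_ids]
```

Each returned task carries exactly one sub-file identifier, matching the statement above that every first read task names a different sub-file.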
Optionally, the management server 1 may further count the access frequency of the file to be accessed within the most recent time period of a preset duration. When the access frequency exceeds a first preset frequency threshold, each generated first read task may further include a cache indication. The cache indication instructs the processing node 2 that receives the first read task to cache each sub-file of the file to be accessed when that sub-file is obtained from the storage server 3 where the file to be accessed is located.
Optionally, the management server 1 may obtain, according to the identifier of the file to be accessed, each access time corresponding to the file to be accessed from the historical access record, count the access times that fall within the most recent time period of the preset duration to obtain the number of times the file to be accessed has been accessed, and use that number as the access frequency of the file to be accessed.
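A minimal sketch of this frequency count follows; the names and the layout of the historical access record (a list of identifier/timestamp pairs) are assumptions for illustration only.

```python
import time

def access_frequency(history, file_id, window_seconds, now=None):
    """Count how many recorded access times for file_id fall within the most
    recent window of the preset duration; that count is the access frequency.

    history: iterable of (file_id, access_time) pairs, i.e. the historical access record.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return sum(1 for fid, t in history if fid == file_id and t >= cutoff)
```

Comparing the returned count against the first preset frequency threshold then decides whether the generated first read tasks carry the cache indication.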
Step 204: For each first read task of the at least one first read task, the management server 1 selects a processing node 2 and sends the first read task to the selected processing node 2.
In this step, the management server 1 may traverse the at least one first read task starting from the first one. Each time a first read task is traversed, the management server 1 selects a processing node 2 from the processing nodes 2 included in the data access system and sends the first read task to that processing node 2. After the first read task has been sent, the management server 1 traverses the next first read task and repeats the above process until the last first read task has been sent.
Optionally, a processing node 2 may be selected from the processing nodes 2 of the data access system in either of the following two ways:
Way one: the management server 1 may randomly select a processing node 2 from the processing nodes 2 of the data access system.
Way two: the management server 1 may select, from the processing nodes 2 of the data access system, the processing node 2 that is currently processing the fewest tasks.
In way two, the management server 1 stores a correspondence between processing node identifiers and task counts, and each record in the correspondence includes the identifier of one processing node 2 and the number of tasks that processing node 2 is currently processing.
In this way, when selecting a processing node 2, the management server 1 reads the task count of each processing node 2 in the data access system from the correspondence and selects the processing node 2 with the fewest tasks.
In way two, after the processing node 2 with the fewest tasks has been selected, its task count in the correspondence is incremented.
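Way one and way two, together with the task-count increment just described, can be sketched as follows (hypothetical names; a sketch under the assumption that the correspondence is held as a dictionary, not the embodiment's actual implementation):

```python
import random

def select_processing_node(task_counts, mode="least_loaded"):
    """Select a processing node 2 by way one (random) or way two (fewest
    in-flight tasks). Way two also increments the chosen node's task count
    in the correspondence, reflecting the newly assigned task."""
    if mode == "random":                               # way one
        return random.choice(list(task_counts))
    node = min(task_counts, key=task_counts.get)       # way two: fewest tasks
    task_counts[node] += 1                             # record the new assignment
    return node
```

The increment keeps the correspondence accurate while several read tasks are dispatched in a row; the matching decrement happens later, in step 209, once the results have been received.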
Optionally, when selecting a processing node 2 for the first of the first read tasks, the management server 1 may designate that processing node 2 as the file summary node and then, before sending each first read task, add the identifier of the file summary node to the first read task. Alternatively, before generating the at least one first read task, the management server 1 may select a processing node 2 as the file summary node through way one or way two above, so that each generated first read task includes the identifier of the file summary node.
Optionally, the management server 1 further sends a summary task to the file summary node, the summary task including the number of sub-files of the file to be accessed.
The file summary node selected through way one or way two above may differ from every processing node 2 selected by the management server 1 for the first read tasks, or may be the same as the processing node 2 selected by the management server 1 for a certain first read task.
Optionally, the management server 1 selects a processing node 2 each time it traverses a first read task. A given processing node 2 may therefore be selected by the management server 1 multiple times, that is, multiple first read tasks are sent to that processing node 2 at different times.
When way two is used, the management server 1 records the number of first read tasks assigned to each selected processing node 2, that is, it saves the correspondence between the identifier of the selected processing node 2 and the number of first read tasks.
Step 205: The processing node 2 receives the first read task, obtains, according to the first read task, the sub-file corresponding to the identifier included in the first read task from the storage server 3 where the file to be accessed is located, and sends the obtained sub-file to the management server 1; the procedure continues with step 209.
In this step, the processing node 2 receives the first read task, establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first read task, obtains the sub-file from the storage server 3 over that network connection according to the identifier of the sub-file included in the first read task, and sends the sub-file to the management server 1.
Optionally, because the first read task includes the identifier of the storage server 3, the processing node 2 can determine that no processing node 2 of the data access system caches the file to be accessed. The processing node 2 therefore directly establishes a network connection with the storage server 3 according to the identifier of the storage server 3, obtains the sub-file from the storage server 3, and sends the sub-file directly to the management server 1. The processing node 2 does not cache the sub-file in its cache 21 before sending it to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2. This shortens the transmission path of the sub-file and improves its transmission efficiency.
Optionally, if the first read task further includes the identifier of the file summary node and the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node. If the processing node 2 is the file summary node, it further receives the summary task as well as the sub-files sent by the other processing nodes 2; when the sum of the number of sub-files it has obtained itself and the number of sub-files it has received reaches the number of sub-files in the summary task, it assembles its own sub-files and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Optionally, if the file summary node differs from every processing node 2 selected by the management server 1 for the first read tasks, the file summary node receives the summary task as well as the sub-files sent by the other processing nodes 2; when the number of received sub-files reaches the number of sub-files included in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
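The aggregation behavior of the file summary node described above can be sketched as follows. The class and field names are hypothetical, and it is assumed for illustration that each sub-file carries an index giving its position in the file, which the embodiment does not specify.

```python
class FileSummaryNode:
    """Collects sub-files until the count given in the summary task is reached,
    then assembles them into the file to be accessed, ordered by sub-file index."""

    def __init__(self, expected_subfiles):
        self.expected = expected_subfiles   # number of sub-files, from the summary task
        self.parts = {}                     # sub-file index -> bytes

    def receive(self, index, data):
        """Accept one sub-file (obtained locally or sent by another processing node 2).
        Returns the assembled file once all sub-files have arrived, else None."""
        self.parts[index] = data
        if len(self.parts) == self.expected:
            return b"".join(self.parts[i] for i in sorted(self.parts))
        return None
```

Sub-files may arrive in any order; assembly is triggered only when the count matches the summary task, after which the whole file would be sent to the management server 1.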
Optionally, if the first read task further includes a cache indication, the processing node 2 caches the obtained sub-file in its cache 21 when the sub-file is obtained. The processing node 2 may cache the obtained sub-file in its cache 21 after sending it to the management server 1, or may cache the obtained sub-file in its cache 21 while sending it to the management server 1.
Step 206: When it is found that the file to be accessed is cached in the processing nodes 2 of the data access system, the management server 1 generates at least one second read task.
When the file to be accessed is cached in the processing nodes 2 of the data access system, the management server 1 may look up, in the file list, the identifier of each sub-file of the file to be accessed and the identifier of the processing node 2 where each sub-file is located.
Each second read task includes the identifier of one sub-file of the file to be accessed and the identifier of the processing node 2 where that sub-file is located.
Step 207: For each second read task of the at least one second read task, the management server 1 selects a processing node 2 and sends the second read task to the selected processing node 2.
In this step, the management server 1 may traverse the at least one second read task starting from the first one. Each time a second read task is traversed, the management server 1 selects a processing node 2 from the processing nodes 2 included in the data access system and sends the second read task to that processing node 2. After the second read task has been sent, the management server 1 traverses the next second read task and repeats the above process until the last second read task has been sent.
Optionally, a processing node 2 may be selected from the processing nodes 2 of the data access system through way one or way two above.
When way two is used, the management server 1 records the number of second read tasks assigned to each selected processing node 2, that is, it saves the correspondence between the identifier of the selected processing node 2 and the number of second read tasks.
In addition to way one and way two above, the following way three may also be used to select a processing node 2:
Way three: the management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 carried in the second read task.
Optionally, when selecting a processing node 2 for the first of the second read tasks, the management server 1 may designate that processing node 2 as the file summary node and then, before sending each second read task, add the identifier of the file summary node to the second read task. Alternatively, before generating the at least one second read task, the management server 1 may select a processing node 2 as the file summary node through way one or way two above, so that each generated second read task includes the identifier of the file summary node.
Optionally, the management server 1 further sends a summary task to the file summary node, the summary task including the number of sub-files of the file to be accessed.
The file summary node selected through way one or way two above may differ from every processing node 2 selected by the management server 1 for the second read tasks, or may be the same as the processing node 2 selected by the management server 1 for a certain second read task.
Optionally, the management server 1 selects a processing node 2 each time it traverses a second read task. A given processing node 2 may therefore be selected by the management server 1 multiple times, that is, multiple second read tasks are sent to that processing node 2 at different times.
Step 208: The processing node 2 receives the second read task, obtains the sub-file according to the identifier of the sub-file and the identifier of the processing node 2 included in the second read task, and sends the obtained sub-file to the management server 1.
In this step, the processing node 2 receives the second read task, which includes the identifier of one sub-file and the identifier of one processing node 2. If this processing node 2 is the processing node 2 corresponding to the processing node identifier in the second read task, it obtains the corresponding sub-file according to the sub-file identifier in the second read task. Otherwise, it obtains the corresponding sub-file, according to the sub-file identifier in the second read task, from the processing node 2 corresponding to the processing node identifier in the second read task.
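The local-versus-remote branch of step 208 can be sketched minimally as follows (hypothetical names; the cache lookup and the cross-node fetch are stand-ins for the embodiment's actual mechanisms):

```python
def handle_second_read_task(self_node_id, task, local_cache, fetch_remote):
    """Handle a second read task on a processing node 2.

    task: (subfile_id, owner_node_id) -- the sub-file identifier and the
    identifier of the processing node 2 whose cache 21 holds the sub-file.
    fetch_remote: callable(owner_node_id, subfile_id) that reads the sub-file
    from another processing node 2's cache over the network.
    """
    subfile_id, owner_node_id = task
    if self_node_id == owner_node_id:
        return local_cache[subfile_id]                 # sub-file is in this node's own cache 21
    return fetch_remote(owner_node_id, subfile_id)     # read from the owning node's cache 21
```

The obtained sub-file would then be sent to the management server 1 (or to the file summary node, if the task names one).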
Optionally, if the second read task further includes the identifier of the file summary node and the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node. If the processing node 2 is the file summary node, it further receives the summary task as well as the sub-files sent by the other processing nodes 2; when the sum of the number of sub-files it has obtained itself and the number of sub-files it has received reaches the number of sub-files in the summary task, it assembles its own sub-files and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Optionally, if the file summary node differs from every processing node 2 selected by the management server 1 for the second read tasks, the file summary node receives the summary task as well as the sub-files sent by the other processing nodes 2; when the number of received sub-files reaches the number of sub-files in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Step 209: The management server 1 receives the sub-files sent by the processing nodes 2, obtains the file to be accessed, and sends the file to be accessed to the client 4.
Optionally, the management server 1 assembles the received sub-files into the file to be accessed and sends the file to be accessed to the client 4.
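The assembly performed by the management server 1 can be sketched minimally (hypothetical names; it is assumed for illustration that each received sub-file is tagged with its position in the file, which the embodiment does not specify):

```python
def merge_subfiles(received):
    """Assemble sub-files received from the processing nodes 2 into the file
    to be accessed, ordering them by sub-file index.

    received: iterable of (index, data) pairs, one per sub-file.
    """
    return b"".join(data for _, data in sorted(received))
```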
When the first read task or the second read task further includes the identifier of the file summary node, the management server 1 receives the file to be accessed sent by the file summary node and sends the file to be accessed to the client 4.
Optionally, when way two is used to select the processing nodes 2, the management server 1 stores the correspondence between processing node identifiers and task counts. For each selected processing node 2, the task count of that processing node 2 stored in the correspondence is decreased by the recorded number of first read tasks or second read tasks assigned to that processing node 2.
In the embodiment of the present application, when the management server 1 determines that the file to be accessed is not cached in the processing nodes 2 of the data access system, it generates at least one first read task, each first read task including the identifier of the storage server 3 where the file to be accessed is located and the identifier of one sub-file of the file to be accessed. As a result, a processing node 2 that receives a first read task does not first access its cache 21; instead, it can obtain the sub-file directly from the storage server 3 according to the identifier of the storage server 3 included in the first read task, and then send the sub-file to the management server 1. The processing node 2 does not cache the sub-file in its cache 21 before sending it to the management server 1, so the sub-file need not pass through the cache 21 of the processing node 2, which reduces the transmission delay of the file to be accessed.
When the processing nodes 2 of the data access system cache the file to be accessed, each generated second read task includes the identifier of the processing node 2 where a sub-file of the file to be accessed is located, which allows the processing node 2 that receives the second read task to obtain the sub-file based on that processing node identifier and improves the efficiency of accessing the file. In addition, when the file to be accessed is not stored in the processing nodes 2 of the data access system, the access frequency of the file to be accessed within the most recent time period of the preset duration is obtained, and when that access frequency exceeds the first preset frequency threshold, the processing nodes 2 are instructed to cache the file to be accessed. An access frequency above the first preset frequency threshold indicates that the file to be accessed has been accessed frequently in the recent past; saving the file to be accessed in the caches 21 of the processing nodes 2 of the data access system therefore improves both the utilization of the caches 21 and the file hit rate.
Referring to FIG. 6, an embodiment of the present application provides a data access apparatus 300. The apparatus 300 is deployed in the above-described management server 1 and is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to a storage server 3. The apparatus 300 includes:
a receiving unit 301, configured to receive a file access request, the file access request carrying the identifier of a file to be accessed.
The processing unit 302 is configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; the apparatus 300 stores the identifiers of the files cached in the caches 21 of the plurality of processing nodes 2.
The processing unit 302 is further configured to, when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, instruct at least one processing node 2 of the plurality of processing nodes 2 to obtain the file to be accessed from the storage server 3.
Optionally, for the detailed implementation process in which the processing unit 302 determines whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, reference may be made to the relevant content of step 202 of the embodiment shown in FIG. 3, which is not described in detail here.
Referring to FIG. 6, optionally, the apparatus 300 further includes a first sending unit 303.
The processing unit 302 is configured to obtain, from the storage server 3, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server 3 where each sub-file is located, and to generate one read task for each sub-file included in the file to be accessed, each read task including the identifier of one sub-file and the identifier of the storage server 3 where that sub-file is located.
The first sending unit 303 is configured to send each read task to a respective processing node 2, instructing the processing node 2 that receives the read task to read the corresponding sub-file from the storage server 3 that stores that sub-file.
The receiving unit 301 is configured to receive the sub-files read by the processing nodes 2 that received the read tasks.
The processing unit 302 is configured to merge the received sub-files into the file to be accessed.
Optionally, for the detailed implementation process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content of step 203 of the embodiment shown in FIG. 3. For the detailed implementation process in which the first sending unit 303 sends the read tasks, reference may be made to the relevant content of step 204 of the embodiment shown in FIG. 3, which is not described in detail here.
Referring to FIG. 6, optionally, the apparatus 300 further includes a second sending unit 304.
The second sending unit 304 is configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node 2 of the plurality of processing nodes 2 to instruct the at least one processing node 2 to cache the sub-files included in the file to be accessed.
The processing unit 302 is further configured to record the identifiers of the sub-files included in the file to be accessed and the identifier of the processing node 2 that caches each sub-file, and, when the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, to generate at least one read task, each read task including the identifier of one sub-file and the identifier of the processing node 2 where that sub-file is located.
The second sending unit 304 is further configured to send the at least one read task to the plurality of processing nodes 2, instructing the plurality of processing nodes 2 to read each sub-file from the cache 21 of the processing node 2 that stores that sub-file.
The processing unit 302 is further configured to assemble the read sub-files into the file to be accessed.
Optionally, for the detailed implementation process in which the second sending unit 304 sends the cache task, reference may be made to the relevant content of steps 2023 and 2024 of the embodiment shown in FIG. 4. For the detailed implementation process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content of step 206 of the embodiment shown in FIG. 3. For the detailed implementation process in which the second sending unit 304 sends the read tasks, reference may be made to the relevant content of step 207 of the embodiment shown in FIG. 3, which is not described in detail here.
Optionally, the second sending unit 304 is further configured to, when the access frequency of the file to be accessed falls below the preset frequency, send a deletion task to the processing node 2 where a sub-file included in the file to be accessed is located, the deletion task including the identifier of that sub-file, to instruct the processing node 2 to delete the sub-file.
The processing unit 302 is further configured to delete the identifier of that sub-file and the identifier of that processing node 2 recorded in the apparatus 300.
Optionally, for the detailed implementation process in which the second sending unit 304 sends the deletion task, reference may be made to the relevant content of steps 2122 and 2123 of the embodiment shown in FIG. 5, which is not described in detail here.
In the embodiment of the present application, the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, at least one processing node 2 of the plurality of processing nodes 2 is instructed to obtain the file to be accessed from the storage server 3. The at least one processing node 2 thus reads the file to be accessed directly from the storage server 3 as instructed by the processing unit 302 and, after reading it, returns it directly to the apparatus 300 without first caching the file to be accessed in its cache 21. In this way, the file to be accessed need not pass through the cache 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, reduces the data read path, and improves data access performance.
Referring to FIG. 7, an embodiment of the present application provides a data access apparatus 400. The apparatus 400 is deployed in the above-described processing node 2 and is one of a plurality of processing nodes 2 connected to the management server 1, the plurality of processing nodes 2 being connected to a storage server 3. The apparatus 400 includes:
a receiving unit 401, configured to receive a read task, the read task being a task sent by the management server 1 when it determines that the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, the read task including the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where that sub-file is located;
a processing unit 402, configured to read the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to the identifier of the storage server 3; and
发送单元403,用于向管理服务器1发送读取的子文件。The sending unit 403 is configured to send the read sub-file to the management server 1.
Optionally, for the detailed implementation of the processing unit 402 reading the sub-file, refer to the related content of step 205 in the embodiment shown in FIG. 3, which is not described in detail here.
Optionally, the receiving unit 401 is further configured to receive a cache task, where the cache task is sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency, and the cache task includes the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where that sub-file is located.
The processing unit 402 is further configured to read the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to the identifier of the storage server 3, and to store the sub-file in the cache 21 of the apparatus 400.
Optionally, for the detailed implementation of the processing unit 402 caching the sub-file, refer to the related content of step 2025 in the embodiment shown in FIG. 4, which is not described in detail here.
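The frequency threshold that triggers the cache task above (and, per the later claims, the delete task when a file cools down) amounts to a simple policy decision on the management side. A minimal sketch, with invented names and assuming a single access counter per file:

```python
# Hypothetical helper for the frequency-driven policy: issue a cache task
# once a file's access frequency exceeds the preset frequency, and a delete
# task once it falls below it again. Purely illustrative.

def plan_cache_action(access_frequency, preset_frequency, is_cached):
    """Return the task the management server would issue for one file."""
    if access_frequency > preset_frequency and not is_cached:
        return "cache"    # tell nodes to pull the sub-files into cache 21
    if access_frequency < preset_frequency and is_cached:
        return "delete"   # tell nodes to drop the cached sub-files
    return "none"         # cache state already matches the policy
```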
In this embodiment of the present application, the read task received by the receiving unit 401 includes the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located; the processing unit 402 reads the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to that identifier; and the sending unit 403 sends the read sub-file to the management server 1. The processing unit 402 can thus read the sub-file directly from the storage server 3 according to the identifier of the storage server 3, and the sending unit 403 returns it directly to the management server 1; before the sub-file is returned to the management server 1, the processing unit 402 does not cache it in the cache 21 of the apparatus 400. The sub-file returned to the management server 1 therefore does not pass through the cache 21 of the apparatus 400, which shortens its transmission path, reduces the data-read path, and improves the performance of data access.
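The two task handlers a processing node exposes in this embodiment can be sketched as below. The function names and the task dictionary layout are assumptions for illustration; the patent does not specify a wire format.

```python
# Illustrative sketch of the node-side handlers: a read task returns the
# sub-file without touching the local cache, while a cache task (for hot
# files) also stores a copy in the node's cache 21.

def handle_read_task(task, storage_servers):
    """Read task: fetch the sub-file straight from the named storage server
    and return it, without writing it into the local cache."""
    return storage_servers[task["server_id"]][task["subfile_id"]]

def handle_cache_task(task, storage_servers, local_cache):
    """Cache task: fetch the sub-file and keep a copy in cache 21, so later
    reads of this hot file can be served from the node's cache."""
    data = storage_servers[task["server_id"]][task["subfile_id"]]
    local_cache[task["subfile_id"]] = data
    return data
```

The split mirrors the text above: the read path stays cache-free to keep the transmission path short, and only the management server's explicit cache task populates cache 21.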
Referring to FIG. 8, FIG. 8 is a schematic diagram of an apparatus 500 for data access provided by an embodiment of the present application. The apparatus 500 includes at least one processor 501, a bus system 502, a memory 503, and a transceiver 504.
The apparatus 500 is an apparatus with a hardware structure and can be used to implement the functional units of the apparatus described in FIG. 6. For example, those skilled in the art will appreciate that the processing unit 302 of the apparatus 300 shown in FIG. 6 can be implemented by the at least one processor 501 invoking application program code in the memory 503, and that the receiving unit 301, the first sending unit 303, and the second sending unit 304 of the apparatus 300 shown in FIG. 6 can be implemented by the transceiver 504.
Optionally, the apparatus 500 can also be used to implement the functions of the management server 1 in the embodiment described in FIG. 1 or FIG. 3.
Optionally, the processor 501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 502 may include a path for transferring information between the above components.
The memory 503 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or may be integrated with the processor.
The memory 503 is configured to store the application program code for executing the solutions of the present application, and the processor 501 controls the execution. The processor 501 is configured to execute the application program code stored in the memory 503, thereby implementing the functions of the method of this patent.
In a specific implementation, as an embodiment, the processor 501 may include one or more CPUs, for example, CPU 0 and CPU 1 in FIG. 8.
In a specific implementation, as an embodiment, the apparatus 500 may include multiple processors, for example, the processor 501 and the processor 508 in FIG. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
Referring to FIG. 9, FIG. 9 is a schematic diagram of an apparatus 600 for data access provided by an embodiment of the present application. The apparatus 600 includes at least one processor 601, a bus system 602, a memory 603, and a transceiver 604. The memory 603 further includes the cache 21, which is used to store the sub-files of files whose access frequency exceeds a preset frequency.
The apparatus 600 is an apparatus with a hardware structure and can be used to implement the functional units of the apparatus described in FIG. 7. For example, those skilled in the art will appreciate that the processing unit 402 of the apparatus 400 shown in FIG. 7 can be implemented by the at least one processor 601 invoking code in the memory 603, and that the sending unit 403 and the receiving unit 401 of the apparatus 400 shown in FIG. 7 can be implemented by the transceiver 604.
Optionally, the apparatus 600 can also be used to implement the functions of the processing node 2 in the embodiment described in FIG. 1 or FIG. 3.
Optionally, the processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 602 may include a path for transferring information between the above components.
The memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or may be integrated with the processor.
The memory 603 is configured to store the application program code for executing the solutions of the present application, and the processor 601 controls the execution. The processor 601 is configured to execute the application program code stored in the memory 603, thereby implementing the functions of the method of this patent.
In a specific implementation, as an embodiment, the processor 601 may include one or more CPUs, for example, CPU 0 and CPU 1 in FIG. 9.
In a specific implementation, as an embodiment, the apparatus 600 may include multiple processors, for example, the processor 601 and the processor 608 in FIG. 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
The above are merely optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

  1. A method for data access, wherein the method is executed by a management server, the management server is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the method comprises:
    receiving a file access request, where the file access request carries an identifier of a file to be accessed;
    determining, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the management server stores identifiers of cached files;
    when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, instructing at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server.
  2. The method according to claim 1, wherein the method further comprises:
    obtaining, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located;
    wherein the instructing at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server comprises:
    generating a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located;
    sending each read task to a respective processing node, instructing the processing node that receives the read task to read the sub-file from the storage server storing the sub-file;
    receiving the sub-files read by the processing nodes that received the read tasks;
    merging the sub-files into the file to be accessed.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    when the access frequency of the file to be accessed exceeds a preset frequency, sending a cache task to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed at the at least one processing node;
    recording the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file;
    when the file to be accessed is cached in the caches of the multiple processing nodes, generating at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located;
    sending the at least one read task to the multiple processing nodes, instructing the multiple processing nodes to read the sub-files from the caches of the processing nodes storing the sub-files;
    combining the read sub-files into the file to be accessed.
  4. The method according to claim 3, wherein the method further comprises:
    when the access frequency of the file to be accessed is lower than the preset frequency, sending a delete task to the processing node where a sub-file included in the file to be accessed is located, where the delete task includes the identifier of the sub-file, to instruct the processing node to delete the sub-file;
    deleting the identifier of the sub-file and the identifier of the processing node recorded in the management server.
  5. An apparatus for data access, wherein the apparatus is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the apparatus comprises:
    a receiving unit, configured to receive a file access request, where the file access request carries an identifier of a file to be accessed;
    a processing unit, configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the apparatus stores identifiers of cached files;
    wherein the processing unit is further configured to, when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, instruct at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server.
  6. The apparatus according to claim 5, wherein the apparatus further comprises a first sending unit, wherein:
    the processing unit is configured to obtain, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located, and to generate a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located;
    the first sending unit is configured to send each read task to a respective processing node, instructing the processing node that receives the read task to read the sub-file from the storage server storing the sub-file;
    the receiving unit is configured to receive the sub-files read by the processing nodes that received the read tasks;
    the processing unit is configured to merge the sub-files into the file to be accessed.
  7. The apparatus according to claim 5 or 6, wherein the apparatus further comprises a second sending unit, wherein:
    the second sending unit is configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed at the at least one processing node;
    the processing unit is further configured to record the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file, and, when the file to be accessed is cached in the caches of the multiple processing nodes, to generate at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located;
    the second sending unit is further configured to send the at least one read task to the multiple processing nodes, instructing the multiple processing nodes to read the sub-files from the caches of the processing nodes storing the sub-files;
    the processing unit is further configured to combine the read sub-files into the file to be accessed.
  8. The apparatus according to claim 7, wherein:
    the second sending unit is further configured to, when the access frequency of the file to be accessed is lower than the preset frequency, send a delete task to the processing node where a sub-file included in the file to be accessed is located, where the delete task includes the identifier of the sub-file, to instruct the processing node to delete the sub-file;
    the processing unit is further configured to delete the identifier of the sub-file and the identifier of the processing node recorded in the apparatus.
  9. A system for data access, wherein the system comprises a management server, a storage server, and multiple processing nodes;
    the management server is configured to: receive a file access request, where the file access request carries an identifier of a file to be accessed; determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the management server stores identifiers of cached files; when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, obtain, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located, generate a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located, and send each read task to a respective processing node;
    the processing node that receives a read task is configured to read the corresponding sub-file from the storage server corresponding to the identifier of the storage server according to the identifier of the sub-file in the received read task, and to send the read sub-file to the management server;
    the management server is further configured to receive the sub-files read by the processing nodes that received the read tasks.
  10. The system according to claim 9, wherein:
    the management server is further configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node of the multiple processing nodes;
    the processing node that receives the cache task is configured to cache the sub-files included in the file to be accessed;
    the management server is further configured to record the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file;
    the management server is further configured to, when the file to be accessed is cached in the caches of the multiple processing nodes, generate at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located, and send the at least one read task to the multiple processing nodes;
    the processing node that receives a read task is configured to read the sub-file according to the identifier of the sub-file in the received read task and the identifier of the processing node where the sub-file is located, and to send the read sub-file to the management server;
    the management server is further configured to receive the sub-files read by the processing nodes that received the read tasks.
PCT/CN2020/110819 2019-08-23 2020-08-24 Method, apparatus and system for data access WO2021036989A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910786485.4A CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system
CN201910786485.4 2019-08-23

Publications (1)

Publication Number Publication Date
WO2021036989A1 true WO2021036989A1 (en) 2021-03-04

Family

ID=74683263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110819 WO2021036989A1 (en) 2019-08-23 2020-08-24 Method, apparatus and system for data access

Country Status (2)

Country Link
CN (1) CN112416871B (en)
WO (1) WO2021036989A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150277802A1 (en) * 2014-03-31 2015-10-01 Amazon Technologies, Inc. File storage using variable stripe sizes
CN107026876A (en) * 2016-01-29 2017-08-08 杭州海康威视数字技术股份有限公司 A kind of file data accesses system and method
CN107562757A (en) * 2016-07-01 2018-01-09 阿里巴巴集团控股有限公司 Inquiry, access method based on distributed file system, apparatus and system
CN107920101A (en) * 2016-10-10 2018-04-17 阿里巴巴集团控股有限公司 A kind of file access method, device, system and electronic equipment
CN109002260A (en) * 2018-07-02 2018-12-14 深圳市茁壮网络股份有限公司 A kind of data cached processing method and processing system

Also Published As

Publication number Publication date
CN112416871A (en) 2021-02-26
CN112416871B (en) 2023-10-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20858975

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20858975

Country of ref document: EP

Kind code of ref document: A1