CN112416871B - Data access method, device and system - Google Patents

Data access method, device and system

Info

Publication number
CN112416871B
Authority
CN
China
Prior art keywords
file
accessed
processing node
sub
identifier
Prior art date
Legal status
Active
Application number
CN201910786485.4A
Other languages
Chinese (zh)
Other versions
CN112416871A (en)
Inventor
李铮
王明月
刘玉
张巍
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910786485.4A priority Critical patent/CN112416871B/en
Priority to PCT/CN2020/110819 priority patent/WO2021036989A1/en
Publication of CN112416871A publication Critical patent/CN112416871A/en
Application granted granted Critical
Publication of CN112416871B publication Critical patent/CN112416871B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/1727 Details of free space management performed by the file system
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems


Abstract

The application discloses a data access method, apparatus, and system, and belongs to the field of communications. The method is performed by a management server connected to a plurality of processing nodes, which are in turn connected to a storage server. The method comprises: receiving a file access request that carries an identifier of a file to be accessed; determining, according to the identifier, whether the file to be accessed is cached in the caches of the plurality of processing nodes, the management server storing the identifiers of the files cached in those caches; and, when the file to be accessed is not cached in the caches of the plurality of processing nodes, instructing at least one of the plurality of processing nodes to acquire the file to be accessed from the storage server. The application shortens the data read path and improves data access performance.

Description

Data access method, device and system
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, and a system for data access.
Background
With the advent of the big data age, the enterprise data scale is expanding continuously, and how to access massive data quickly is a core problem faced by enterprises.
Currently, to improve data access efficiency, enterprises generally use a distributed system that includes a coordination server, a plurality of processing nodes, and a storage server for storing data. When an access request sent by a client is received, the coordination server decomposes the access request into a plurality of tasks and issues them to the processing nodes. The processing nodes each access the data in the storage server and return the data they read to the coordination server, which integrates the data returned by each processing node and returns the result to the client.
After receiving a task from the coordination server, each processing node first determines whether the data to be accessed in the task is in its cache. If the data is in the cache, the node reads it directly from the cache; if not, the node first reads the data from the storage server into its cache and then reads it from the cache. Therefore, for each processing node, if the accessed data misses in the cache, the data must first be read from the storage server into the node's cache and then read from the cache to the coordination server. This lengthens the data read path and degrades data access performance.
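The conventional miss path described above can be sketched in a few lines. This is an illustrative model, not from the patent: on a cache miss the data makes two transfers (storage server to node cache, then node cache to coordination server), while a hit makes only one.

```python
def node_read(name, node_cache, storage):
    """Return (data, transfers) for one task, counting data transfers."""
    if name in node_cache:
        return node_cache[name], 1            # cache -> coordination server
    node_cache[name] = storage[name]          # storage -> node cache (transfer 1)
    return node_cache[name], 2                # cache -> coordination server (transfer 2)

storage = {"f1": "records 1-33"}
cache = {}
miss_data, miss_hops = node_read("f1", cache, storage)   # cold read: 2 transfers
hit_data, hit_hops = node_read("f1", cache, storage)     # warm read: 1 transfer
```

The extra transfer on every miss is exactly the cost the application's method removes.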
Disclosure of Invention
The application provides a data access method, apparatus, and system for shortening the data read path and improving data access performance. The technical solutions are as follows:
in a first aspect, the present application provides a data access method performed by a management server. The management server is connected to a plurality of processing nodes, the processing nodes are connected to a storage server, and the management server stores the identifiers of the files cached in the caches of the plurality of processing nodes. In the method, a file access request carrying an identifier of a file to be accessed is received; whether the file to be accessed is cached in the caches of the plurality of processing nodes is determined according to the identifier; and when the file to be accessed is not cached in those caches, at least one of the plurality of processing nodes is instructed to acquire the file to be accessed from the storage server. Because the at least one processing node is instructed to retrieve the file directly from the storage server when the file is not cached, the node reads the file to be accessed according to the management server's instruction and returns it to the management server immediately after reading it, without first caching it in its own cache. The file to be accessed therefore does not pass through the cache of the at least one processing node, which shortens its transmission path, shortens the data read path, and improves data access performance.
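The first-aspect dispatch logic can be sketched as follows. All names here are illustrative, not from the patent: the management server keeps the identifiers of cached files, and on a miss it instructs a node to fetch the file straight from storage rather than populating the node's cache first.

```python
def handle_access(file_id, cached_ids, read_from_node_cache, read_from_storage):
    """Route a file access request, mirroring the first-aspect decision."""
    if file_id in cached_ids:
        return read_from_node_cache(file_id)   # file is in some node's cache
    # miss: the node streams the file from the storage server straight back
    return read_from_storage(file_id)

cached_ids = {"hot_form"}
cold_result = handle_access(
    "cold_form", cached_ids,
    read_from_node_cache=lambda f: ("from-cache", f),
    read_from_storage=lambda f: ("from-storage", f),
)
hot_result = handle_access(
    "hot_form", cached_ids,
    read_from_node_cache=lambda f: ("from-cache", f),
    read_from_storage=lambda f: ("from-storage", f),
)
```

The two callbacks stand in for the real node interactions; only the routing decision is modeled.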
In one possible implementation, the identifier of at least one subfile included in the file to be accessed, and the identifier of the storage server where each subfile is located, are obtained from the storage server. A reading task is generated for each subfile of the file to be accessed, each reading task comprising the identifier of the subfile and the identifier of the storage server where the subfile is located. Each reading task is sent to a processing node, instructing the node receiving it to read the subfile from the storage server storing that subfile. The subfiles read by the processing nodes are received and merged into the file to be accessed. Because each generated reading task includes the identifier of the storage server where the subfile is located, the processing node receiving the task can read the subfile directly from that storage server and send it straight to the management server after reading it. The subfile is thus not first cached on the processing node and then read out of that cache and sent to the management server, which shortens the subfile's transmission path and improves read performance.
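The task generation and merge steps just described can be sketched as below. This is a minimal model with illustrative names: one task per subfile, each carrying the subfile identifier and the identifier of the storage server holding it, with the returned subfiles merged in order.

```python
def build_read_tasks(subfile_locations):
    """subfile_locations: list of (subfile_id, storage_server_id) pairs."""
    return [{"subfile": sf, "server": srv} for sf, srv in subfile_locations]

def merge_subfiles(parts):
    """Merge the subfiles returned by the nodes back into one file."""
    return "".join(parts)

tasks = build_read_tasks([("file1", "S1"), ("file2", "S1"), ("file3", "S2")])
# stand-in for the storage servers and the node-side reads:
servers = {"S1": {"file1": "AB", "file2": "CD"}, "S2": {"file3": "EF"}}
parts = [servers[t["server"]][t["subfile"]] for t in tasks]
merged = merge_subfiles(parts)
```

Each task is self-describing, so the node needs no lookup of its own before reading.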
In another possible implementation, when the access frequency of the file to be accessed exceeds a preset frequency, a cache task is sent to at least one of the plurality of processing nodes, instructing the at least one processing node to cache the subfiles included in the file to be accessed; the identifier of the file to be accessed, the identifiers of its subfiles, and the identifier of the processing node caching each subfile are recorded. When the file to be accessed is cached in the caches of the plurality of processing nodes, at least one reading task is generated, each reading task comprising the identifier of a subfile and the identifier of the processing node where that subfile is located; the at least one reading task is sent to the plurality of processing nodes, instructing them to read the subfiles from the caches of the processing nodes storing them; and the read subfiles are merged into the file to be accessed.
When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed one. Because the cache space in each processing node is limited, storing frequently accessed files in the cache of the at least one processing node improves the cache utilization of the processing nodes and also improves the hit rate of files to be accessed. When the file to be accessed is cached in the caches of the plurality of processing nodes, the generated reading task includes the identifier of the processing node where the subfile is located, so the processing node receiving the reading task no longer needs to determine where the subfile is; it reads the subfile directly from that processing node according to the identifier, which improves the efficiency of reading subfiles.
In another possible implementation, when the access frequency of the file to be accessed is lower than the preset frequency, a deletion task is sent to the processing node where a subfile of the file is located, the deletion task including the identifier of the subfile, to instruct the processing node to delete the subfile; the identifier of the subfile and the identifier of the processing node recorded in the management server are also deleted. In this way, files with low access frequency are removed from the caches of the plurality of processing nodes, freeing cache space for files with higher access frequency, which improves the cache utilization of the plurality of processing nodes and also improves the file hit rate.
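The two frequency rules above can be sketched as one policy function (the threshold and return values are illustrative names, not from the patent): cache a file's subfiles when its access frequency exceeds the preset frequency, and delete them when the frequency falls below it.

```python
def plan_cache_action(freq, preset_freq, is_cached):
    """Decide the management server's next action for one file."""
    if freq > preset_freq and not is_cached:
        return "cache"     # send cache tasks for the file's subfiles
    if freq < preset_freq and is_cached:
        return "delete"    # send delete tasks and drop the file-list entries
    return None            # no change needed

hot_action = plan_cache_action(freq=10, preset_freq=5, is_cached=False)
cold_action = plan_cache_action(freq=2, preset_freq=5, is_cached=True)
steady_action = plan_cache_action(freq=7, preset_freq=5, is_cached=True)
```

A hot file that is already cached, or a cold file that is not, requires no action, which keeps the policy idempotent.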
In a second aspect, the present application provides a data access method performed by a processing node, the processing node being one of a plurality of processing nodes connected to a management server, the plurality of processing nodes being connected to a storage server. In the method, a reading task is received, the reading task being sent by the management server when it determines that the file to be accessed is not cached in the caches of the plurality of processing nodes, and comprising the identifier of a subfile of the file to be accessed and the identifier of the storage server where the subfile is located; the subfile is read, according to its identifier, from the storage server corresponding to the storage server identifier; and the read subfile is sent to the management server. Because the received reading task includes both the identifier of the subfile and the identifier of the storage server where it is located, the processing node can read the subfile directly from the storage server and return it directly to the management server, without caching it locally before returning it. The subfile returned to the management server therefore does not pass through the processing node's cache, which shortens the subfile's transmission path, shortens the data read path, and improves data access performance.
In one possible implementation, a cache task is received, the cache task being sent by the management server when the access frequency of the file to be accessed exceeds the preset frequency, and comprising the identifier of a subfile of the file to be accessed and the identifier of the storage server where the subfile is located; the subfile is read, according to its identifier, from the storage server corresponding to the storage server identifier, and is stored in the cache of the processing node. When the access frequency of the file exceeds the preset frequency, the file is frequently accessed; because the cache space in the processing node is limited, caching the subfiles of frequently accessed files improves the cache utilization of the processing node and also improves the hit rate of files to be accessed.
In a third aspect, the present application provides a method of data access, the method performed by a processing node, the processing node being one of a plurality of processing nodes connected to a management server, the plurality of processing nodes being connected to a storage server. In the method, a reading task is received, wherein the reading task is a task sent by a management server when determining that a file to be accessed is cached in caches of a plurality of processing nodes, and comprises an identification of a sub-file in the file to be accessed and an identification of a processing node where the sub-file is located; reading the subfiles from the processing nodes corresponding to the identifiers of the processing nodes according to the identifiers of the subfiles; and sending the read subfiles to a management server. When the file to be accessed is cached in the caches of the plurality of processing nodes, the reading task comprises the identification of the processing node where the sub-file is located, so that the processing node does not need to determine the processing node where the sub-file is located any more, the sub-file is directly read from the processing node where the sub-file is located according to the identification of the processing node where the sub-file is located, and the efficiency of reading the sub-file is improved.
In one possible implementation manner, a deletion task is received, wherein the deletion task is a task sent by a management server when the access frequency of a file to be accessed is lower than a preset frequency, and the deletion task comprises an identifier of a sub-file of the file to be accessed; and deleting the subfiles corresponding to the identifiers of the subfiles. Therefore, when the access frequency of the files to be accessed is low, the processing node can delete the subfiles belonging to the files to be accessed from the own cache, and more cache space can be saved for storing the files with higher access frequency, so that the cache utilization rate of the processing node is improved, and the hit rate of the files is also improved.
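The four node-side tasks of the second and third aspects can be sketched in one dispatcher. The task shapes and names are illustrative assumptions, not from the patent: read a subfile from a storage server (miss path, bypassing the local cache), read from a peer node's cache (hit path), cache a subfile (hot file), and delete one (cold file).

```python
def handle_task(task, local_cache, storage_servers, peer_caches):
    """Execute one management-server task on a processing node."""
    kind = task["type"]
    if kind == "read_storage":                 # miss path: local cache untouched
        return storage_servers[task["server"]][task["subfile"]]
    if kind == "read_peer":                    # hit path: read a peer's cache
        return peer_caches[task["node"]][task["subfile"]]
    if kind == "cache":                        # hot file: populate local cache
        local_cache[task["subfile"]] = storage_servers[task["server"]][task["subfile"]]
        return None
    if kind == "delete":                       # cold file: evict from cache
        local_cache.pop(task["subfile"], None)
        return None
    raise ValueError(f"unknown task type: {kind}")

storage = {"S1": {"file1": "AB"}}
peers = {"TE2": {"file2": "CD"}}
cache = {}
direct = handle_task({"type": "read_storage", "server": "S1", "subfile": "file1"},
                     cache, storage, peers)
handle_task({"type": "cache", "server": "S1", "subfile": "file1"},
            cache, storage, peers)
```

Note that `read_storage` returns the data without writing to `local_cache`, which is the behavior the application claims shortens the read path.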
In a fourth aspect, the present application provides an apparatus for data access for performing the method of the first aspect or any of the alternative implementations of the first aspect. In particular, the apparatus comprises means for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides an apparatus for data access for performing the method of the second aspect or a possible implementation of the second aspect; in particular, the apparatus comprises means for performing the method of the second aspect or one possible implementation of the second aspect. Alternatively, the apparatus is for performing the method of the third aspect or a possible implementation of the third aspect; in particular, the apparatus comprises means for performing the method of the third aspect or one possible implementation of the third aspect.
In a sixth aspect, the present application provides an apparatus for data access, the apparatus comprising a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus. The memory stores computer-executable instructions that are executed by the processor to implement the operational steps of the method of the first aspect or any one of its possible implementations.
In a seventh aspect, the present application provides an apparatus for data access, the apparatus comprising a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus. The memory stores computer-executable instructions that are executed by the processor to perform the operational steps of the method of the second aspect or one of its possible implementations, or of the third aspect or one of its possible implementations.
In an eighth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the above aspects.
In a ninth aspect, the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
In a tenth aspect, the present application provides a system for data access, the system comprising: a management server, a storage server, and a plurality of processing nodes.
The management server receives a file access request, wherein the file access request carries an identification of a file to be accessed; determining whether the file to be accessed is cached in the caches of the plurality of processing nodes according to the identification of the file to be accessed, wherein the identification of the file cached in the caches of the plurality of processing nodes is stored in the management server; when the file to be accessed is not cached in the caches of the plurality of processing nodes, acquiring the identification of at least one sub-file included in the file to be accessed and the identification of the storage server where each sub-file is located from the storage server, and generating a reading task for each sub-file included in the file to be accessed, wherein each reading task comprises the identification of one sub-file and the identification of the storage server where the sub-file is located; each read task is sent to a processing node, respectively. And the processing node receiving the reading task reads the corresponding subfile from the storage server corresponding to the identifier of the storage server according to the identifier of the subfile in the received reading task, and sends the read subfile to the management server. The management server receives the subfiles read by the processing nodes receiving the reading task.
Because the files to be accessed are not cached in the caches of the plurality of processing nodes, each reading task generated by the management server comprises an identification of a sub-file and an identification of a storage server where the sub-file is located. The processing node receiving the reading task can directly read the subfile from the storage server according to the identification of the storage server in the reading task, and directly send the subfile to the management server after reading the subfile. Therefore, the subfiles are not cached in the cache of the processing node, and then the processing node reads the subfiles from the cache of the processing node and sends the subfiles to the management server, so that the transmission path of the subfiles is reduced, and the performance of reading the subfiles is improved.
In one possible implementation, the management server sends the cache task to at least one processing node of the plurality of processing nodes when the access frequency of the file to be accessed exceeds a preset frequency. The processing node receiving the caching task caches the subfiles included in the file to be accessed. And the management server also records the sub-file identification included in the identification of the file to be accessed and the identification of the processing node for caching each sub-file. When a file to be accessed is cached in caches of the plurality of processing nodes, the management server generates at least one reading task, wherein each reading task comprises an identifier of a sub-file and an identifier of a processing node where the sub-file is located; at least one read task is sent to the plurality of processing nodes. And the processing node receiving the reading task reads the subfiles according to the received identification of the subfiles in the reading task and the identification of the processing node where the subfiles are positioned, and sends the read subfiles to the management server. The management server receives the subfiles read by the processing nodes receiving the reading task.
When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed one. Because the cache space in each processing node is limited, storing frequently accessed files in the cache of the at least one processing node improves the cache utilization of the processing nodes and also improves the hit rate of files to be accessed. When the file to be accessed is cached in the caches of the plurality of processing nodes, the generated reading task includes the identifier of the processing node where the subfile is located, so the processing node receiving the reading task no longer needs to determine where the subfile is; it reads the subfile directly from that processing node according to the identifier, which improves the efficiency of reading subfiles.
Drawings
FIG. 1 is a schematic diagram of a data access system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a client access data access system according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for data access according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for caching a file according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for deleting a file according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a device for data access according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another apparatus for data access according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another apparatus for data access according to an embodiment of the present application;
fig. 9 is a schematic diagram of another apparatus for data access according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a data access system, including: a management server 1, a plurality of processing nodes 2, and at least one storage server 3. The management server 1 is connected to each processing node 2, and the processing nodes 2 are also connected to one another. The management server 1 and each processing node 2 are connected to each storage server 3 via a network.
The management server 1 is configured to decompose a received file access request into a plurality of tasks and send the tasks to the processing nodes 2; each processing node 2 accesses the files in the storage server 3 and returns the files it reads to the management server 1; the management server 1 then integrates the files returned by the processing nodes 2 and returns the result to a client (not shown).
Each storage server 3 stores files for access by users. In the storage server 3, one file may be divided into a plurality of subfiles for storage. For example, the file may be a form in a database. Assuming that the form includes 100 records, three subfiles, namely a first subfile, a second subfile, and a third subfile, are used in the storage server 3 to save the form. The first subfile holds records 1 to 33 of the form, the second subfile holds records 34 to 66, and the third subfile holds records 67 to 100.
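The 100-record split just described can be reproduced with a small sketch; the 33/33/34 sizes come from the text, everything else is illustrative.

```python
def split_into_subfiles(records, sizes=(33, 33, 34)):
    """Split a list of records into consecutive subfiles of the given sizes."""
    parts, start = [], 0
    for n in sizes:
        parts.append(records[start:start + n])
        start += n
    return parts

records = list(range(1, 101))            # records 1..100 of the form
first, second, third = split_into_subfiles(records)
```

Concatenating the three parts in order recovers the original form, which is what the management server's merge step relies on.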
For any one of the files in the storage server 3, the management server 1 may cache the subfiles that the file includes into one or more processing nodes 2 of the data access system. For each sub-file included in the file, when the management server 1 caches the sub-file in a processing node 2, the corresponding relationship among the identification of the file, the identification of the sub-file, and the identification of the processing node 2 is stored in a file list.
Optionally, the processing node 2 includes a cache 21, and the management server 1 caches the subfiles in the cache 21 of the processing node 2.
The detailed implementation in which the management server 1 caches the subfiles of a file in the processing nodes 2 of the data access system may be found in the embodiment of fig. 3 below, and is not described in detail here.
The above file list is illustrated next with an example. Assume that the identifier of the form is ID1, and that the identifiers of the first, second, and third subfiles included in the form are file1, file2, and file3, respectively. Assume further that the management server 1 caches the first, second, and third subfiles in a first, second, and third processing node, whose identifiers are TE1, TE2, and TE3, respectively. The management server 1 then stores the correspondences (ID1, file1, TE1), (ID1, file2, TE2), and (ID1, file3, TE3) in the file list shown in Table 1 below.
TABLE 1
Identification of file | Identification of subfile | Identification of processing node
ID1 | file1 | TE1
ID1 | file2 | TE2
ID1 | file3 | TE3
…… | …… | ……
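Table 1 can be modeled as rows of (file identifier, subfile identifier, node identifier), and the two lookups below mirror how the management server uses the file list: a membership check for the cache-hit decision, and a resolution of a cached file to its subfile locations. Names are illustrative.

```python
# The file list of Table 1 as (file id, subfile id, processing node id) rows.
FILE_LIST = [
    ("ID1", "file1", "TE1"),
    ("ID1", "file2", "TE2"),
    ("ID1", "file3", "TE3"),
]

def is_cached(file_id):
    """Is any subfile of this file cached on a processing node?"""
    return any(f == file_id for f, _, _ in FILE_LIST)

def subfile_locations(file_id):
    """Return (subfile id, node id) pairs for a cached file."""
    return [(sf, node) for f, sf, node in FILE_LIST if f == file_id]

locations = subfile_locations("ID1")
```

A real system would index this list by file identifier; the linear scan keeps the sketch short.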
Alternatively, referring to fig. 2, when the user needs to access the file to be accessed, the identifier of the file to be accessed may be input to the client 4. The client 4 acquires the input identification of the file to be accessed, and transmits a file access request including the identification of the file to be accessed to the management server 1.
The management server 1 receives the file access request, and determines whether the file to be accessed is cached in the processing node 2 of the data access system according to the file list and the identification of the file to be accessed included in the file access request. If the file to be accessed is cached in the processing node 2 of the data access system, the file to be accessed is acquired from the processing node 2 of the data access system. If the file to be accessed is not cached in the processing node 2 of the data access system, the processing node 2 is controlled to acquire the file to be accessed from the storage server 3 where the file to be accessed is located. And then sending the file to be accessed to the client.
The detailed implementation process of the management server 1 to obtain the file to be accessed is referred to in the following embodiment shown in fig. 3, and will not be described in detail here.
Alternatively, the identification of the file may be a file name of the file, etc., and the identification of the subfile may be a storage path or a file name of the subfile in the storage server, etc.
In the embodiment of the present application, the management server 1 stores a file list that records the correspondence between the identifier of each file cached in the processing nodes 2 of the data access system, the identifiers of the subfiles of that file, and the identifiers of the processing nodes 2 where those subfiles are located. Thus, when receiving the identifier of a file to be accessed from the client 4, the management server 1 can determine from the identifier and the file list whether the file is cached in the processing nodes 2 of the data access system. When the file to be accessed is not cached there, the management server 1 controls a processing node 2 to acquire the file from the storage server 3 where it is located. Under this control, the processing node 2 does not first read the file from a cache 21 of the data access system; it obtains the file directly from the storage server 3 and sends it directly to the management server 1, without caching it in its own cache 21 before sending it. The file to be accessed therefore does not have to be written into the cache 21 of the processing node 2 and then read back out of the cache 21 and sent to the management server 1, which improves file access efficiency.
In the embodiment of the present application, the data access system is mainly used for accessing database data, and the file access method provided in the embodiment of the present application is described below by taking accessing the file in the storage server 3 as an example.
Referring to fig. 3, an embodiment of the present application provides a method for accessing a file, which may be applied to the system described in fig. 1, including:
step 201: the management server 1 receives a file access request comprising an identification of the file to be accessed.
The user logs in to the management server 1 through the client 4, and the client 4 then displays an interface provided by the management server 1. The user can input a file access request through the interface provided by the management server 1. The file access request may be a database access statement that includes the identifier of the file to be accessed. The database access statement may be a structured query language (SQL) access statement, and the file to be accessed may be a form in the database.
For example, assume that a user inputs, through the client 4, the SQL access statement "select name from teacher join people on teacher.id=scope.id", which requests access to two forms, one form being named "teacher" and the other form being named "people". When receiving the SQL access statement, the management server 1 extracts the identifiers of the two forms from the SQL access statement, that is, the identifier "teacher" of one form and the identifier "people" of the other form.
Optionally, the management server 1 also analyzes whether the statement format of the SQL access statement is correct, and if so, executes step 202. If not, an alarm indicating that the statement is incorrect is fed back to the client 4, and the client 4 receives the alarm and displays the alarm to the user.
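The extraction of form identifiers from the SQL access statement could be sketched as follows. This is a minimal illustration, not the patent's implementation; the helper `extract_form_ids` and the simple regex over FROM/JOIN clauses are assumptions.

```python
import re

def extract_form_ids(sql: str) -> list[str]:
    # Hypothetical helper: collect the form (table) names that follow
    # FROM or JOIN keywords in the SQL access statement.
    return re.findall(r'\b(?:from|join)\s+(\w+)', sql, flags=re.IGNORECASE)

forms = extract_form_ids(
    "select name from teacher join people on teacher.id=scope.id")
```

A real SQL parser would of course also validate the statement format before step 202.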
Step 202: the management server 1 determines from the identification and the file list whether files to be accessed are cached in the processing node 2 in the data access system.
As shown in table 1, the file list is used to store the correspondence between the identification of the file, the identification of the sub-file and the identification of the processing node 2.
The management server 1 may cache files in the processing nodes 2 of the data access system. A file includes a plurality of subfiles; when a subfile included in the file is cached to a certain processing node 2, the correspondence among the identifier of the file, the identifier of the subfile, and the identifier of that processing node 2 is stored in the file list.
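The file list can be sketched as a plain mapping. The structure below is an assumption for illustration, with `file_list` keyed by file identifier and each entry holding (subfile identifier, processing-node identifier) pairs; the identifiers themselves are made up.

```python
# Hypothetical in-memory file list of the management server 1:
# file id -> [(subfile id, id of the processing node 2 caching it), ...]
file_list = {
    "teacher": [("teacher.part0", "node-1"), ("teacher.part1", "node-2")],
}

def is_cached(file_id: str) -> bool:
    # A file counts as cached when its subfile/node records are present.
    return file_id in file_list

def locate_subfiles(file_id: str):
    # Returns the (subfile id, node id) pairs, or None when not cached.
    return file_list.get(file_id)
```

This single lookup is what step 202 reduces to: a hit yields the subfile locations for second reading tasks, a miss triggers first reading tasks against the storage server.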
Alternatively, the management server 1 may cache the files with the access frequency exceeding the preset frequency into the processing node 2 of the data access system. And deleting files with access frequency lower than the preset frequency from the processing node 2 of the data access system.
Optionally, the management server 1 caches in the processing node 2 of the data access system files whose access frequency exceeds the first preset frequency threshold during the period of the last preset duration. And deleting, from the processing node 2 of the data access system, files for which the access frequency in the period of the last preset duration does not exceed the second preset frequency threshold. The first preset frequency threshold is greater than or equal to the second preset frequency threshold.
The management server 1 holds therein history access records each of which holds an identification of a file that the user has accessed and access time.
Optionally, the management server 1 counts, periodically or aperiodically, the access frequency of each file accessed in the latest time period of the preset duration in the history access record, and when the access frequency of a file exceeds the first preset frequency threshold, controls a processing node 2 to acquire the file from the storage server 3 where the file is located and cache the file in the processing node 2 of the data access system.
Referring to fig. 4, in implementation, this can be achieved by the following operations 2021 to 2026:
2021: the management server 1 selects an identification of a file which does not exist in the file list from the history access record, and counts the access frequency of the file in a time period of a latest preset duration according to the history access record and the identification of the file.
The selected identifier is not in the file list, indicating that the file corresponding to the identifier is not cached in the processing node 2 of the data access system.
In this step, the management server 1 may acquire, from the history access record, the access time in the period of the latest preset duration corresponding to the identification of the file. The management server 1 counts the number of access times obtained, the counted number is equal to the access times of the file, and the access frequency of the file in the time period of the latest preset duration is obtained according to the access times and the preset duration.
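Under the assumption that the history access record is a list of (file identifier, access time) pairs, the frequency computation described above might be sketched as follows; the record layout and time units are illustrative.

```python
def access_frequency(history, file_id, now, window):
    # Count the accesses of file_id whose access time falls within the
    # latest `window` time units, then divide by the window length.
    recent = [t for fid, t in history if fid == file_id and now - t <= window]
    return len(recent) / window

# Example: two of file "f"'s three accesses fall inside the last 10 units.
history = [("f", 95), ("f", 99), ("f", 80), ("g", 96)]
freq = access_frequency(history, "f", now=100, window=10)
```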
2022: when the access frequency exceeds a first preset frequency threshold, the management server 1 obtains the identifier of the storage server 3 where the file is located and the identifier of at least one subfile included in the file according to the identifier of the file, where the identifier of the storage server 3 may be an address of the storage server 3, for example, may be a protocol (internet protocol, IP) address of interconnection between networks of the storage server 3.
The technician may input the identification of the storage server 3 in the data access system to the management server 1 in advance. The management server 1 may obtain, according to the identifier of the storage server 3, the identifier of each file stored in the storage server 3, and store the obtained identifier of each file and the identifier of the storage server 3 in a correspondence relationship between the identifier of the file and the identifier of the storage server.
Optionally, for each file stored in the storage server 3, the management server 1 may further obtain, from the storage server 3, an identifier of each sub-file included in the file, and store the identifier of the file and the obtained identifier of each sub-file in a correspondence between the identifier of the file and the identifier of the sub-file.
In this step, the management server 1 counts that the access frequency of a certain file exceeds the first preset frequency threshold, and may obtain, according to the identifier of the file, the identifier of the storage server 3 where the file is located from the correspondence between the identifier of the file and the identifier of the storage server. In the case where the management server 1 stores the correspondence between the identifiers of the files and the identifiers of the subfiles, the management server 1 acquires the identifier of each subfile included in the file from the correspondence between the identifier of the file and the identifier of the subfile based on the identifier of the file. In the case where the management server 1 does not store the correspondence between the identification of the file and the identification of the subfiles, the management server 1 acquires the identification of each subfile included in the file from the storage server 3 based on the identification of the storage server 3.
2023: the management server 1 generates at least one cache task, each cache task comprising the identifier of the storage server 3 and the identifier of one sub-file in the file.
2024: for each cache task, the management server 1 selects a processing node 2 and sends the cache task to that processing node 2.
The management server 1 may start traversing from the first cache task of the at least one cache task; each time it traverses to a cache task, it selects a processing node 2 and sends that cache task to the processing node 2. It then traverses the next cache task, and this process is repeated until the last cache task has been sent.
Alternatively, the management server 1 randomly selects one processing node 2 among the processing nodes 2 of the data access system. Alternatively, the management server 1 may store a correspondence between the identifier of a processing node 2 and the size of its free cache space, where the identifier of each processing node 2 in the data access system and its free cache space size are stored. In this way, the management server 1 may first select, based on the correspondence, the at least one processing node 2 with the largest free cache space, the number of selected processing nodes 2 being equal to the number of subfiles included in the file; then, each time it traverses to a cache task, it selects one processing node 2 from the at least one processing node 2 and sends the cache task to that processing node 2.
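The free-space-based choice can be sketched as below, under the assumption that the correspondence is a dict mapping node identifier to free cache space size; the node names are made up.

```python
def pick_largest_free(free_space, k):
    # Select the k processing nodes 2 with the largest free cache space,
    # e.g. one candidate per subfile of the file to be cached.
    return sorted(free_space, key=free_space.get, reverse=True)[:k]

candidates = pick_largest_free({"node-1": 5, "node-2": 9, "node-3": 7}, k=2)
```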
2025: the processing node 2 receives a buffering task, acquires a sub-file corresponding to the identifier of the sub-file in the buffering task from the storage server 3 according to the identifier of the storage server 3 in the buffering task, and caches the acquired sub-file in the own buffer 21.
Optionally, the processing node 2 may also send a cache success message corresponding to the cache task to the management server 1.
Optionally, after caching the subfiles, the processing node 2 may also obtain the remaining free cache space size of itself, and send the remaining free cache space size to the management server 1.
2026: the management server 1 may store the identification of the file, the identification of the subfiles in the cache task and the identification of the selected processing node 2 in correspondence to a data list.
Optionally, the management server 1 performs this step after selecting a processing node 2 for the cache task, or performs this step after receiving the cache success message corresponding to the cache task sent by the processing node 2.
Optionally, the management server 1 may further receive a remaining free cache space size of the processing node 2, and update the free cache space size of the processing node 2 to the received remaining free cache space size in a corresponding relationship between the identifier of the processing node 2 and the free cache space size.
Optionally, the management server 1 further obtains the access frequency, in the latest time period of the preset duration, of each file cached in the processing nodes 2 of the data access system, and deletes files whose access frequency is lower than the second preset frequency threshold from the processing nodes 2 of the data access system.
Referring to fig. 5, in implementation, this can be achieved by the following operations 2121 to 2123:
2121: for the identification of any one of the files in the file list, the management server 1 counts the access frequency of the file in the latest time period of the preset duration according to the identification of the file and the history access record.
In implementation, each access time corresponding to the file is obtained from the history access record according to the identifier of the file; the number of access times falling in the latest time period of the preset duration is counted to obtain the number of times the file has been accessed, and the access frequency of the file is obtained according to that number and the preset duration.
2122: when the access frequency of the file is lower than a second preset frequency threshold, the management server 1 acquires the identification of each sub-file included in the file and the identification of the processing node 2 where each sub-file is located from the file list.
2123: for each sub-file, the management server 1 sends a deletion task to the processing node 2 where the sub-file is located, the deletion task comprising the identity of the sub-file, and then deletes the record comprising the identity of the file from the file list.
The processing node 2 receives the deletion task and deletes the subfile corresponding to the identification of the subfile in the deletion task from its own cache 21.
Optionally, after deleting the subfile, the processing node 2 may also obtain the remaining free cache space size in the cache 21, and send the remaining free cache space size to the management server 1.
Optionally, the management server 1 further receives a remaining free cache space size of the processing node 2, and updates the free cache space size of the processing node 2 to the received remaining free cache space size in a corresponding relationship between the identifier of the processing node 2 and the free cache space size.
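Operations 2122 and 2123 can be sketched as follows. `send_delete` stands in for the (unspecified) transport that delivers a deletion task to a processing node 2, and the file-list layout matches the earlier illustrative sketch.

```python
def evict_file(file_list, file_id, send_delete):
    # For each subfile of the low-frequency file, send a deletion task to
    # the processing node 2 caching it, then drop the file's record from
    # the file list.
    for subfile_id, node_id in file_list.get(file_id, []):
        send_delete(node_id, {"subfile": subfile_id})
    file_list.pop(file_id, None)

sent = []
fl = {"old": [("old.part0", "node-1"), ("old.part1", "node-2")]}
evict_file(fl, "old", lambda node, task: sent.append((node, task)))
```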
Since each file cached in the processing nodes 2 of the data access system had an access frequency exceeding the first preset frequency threshold in the latest time period of the preset duration, the hit rate of the files cached in the processing nodes 2 is improved when files are accessed.
The above is merely one example, listed in the present application, of caching files in the processing nodes 2 of the data access system and of eliminating files from the processing nodes 2 of the data access system. The present application is also applicable to other implementations of caching files in the processing nodes 2 of the data access system and other implementations of eliminating files from them, which are not listed here one by one.
In this step, the management server 1 may query the file list according to the identifier of the file to be accessed, and if the identifier of each sub-file included in the file to be accessed and the identifier of the processing node 2 where each sub-file is located are not queried, determine that the file to be accessed is not cached in the processing node 2 of the data access system. If the identification of each sub-file included in the file to be accessed and the identification of the processing node 2 where each sub-file is located are queried, determining that the file to be accessed is cached in the processing node 2 of the data access system.
Optionally, after receiving the file access request, the management server 1 may further use the current time as an access time of the file to be accessed, and may store a correspondence between the identifier of the file to be accessed and the access time in the history access record.
Step 203: when the file to be accessed is not cached in a processing node 2 of the data access system, the management server 1 generates at least one first reading task, each first reading task comprising the address of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed.
Each first reading task includes a different identification of the subfiles.
In this step, the management server 1 may obtain, from the correspondence between the identification of the file and the identification of the storage server 3, the identification of the storage server 3 where the file to be accessed is located, according to the identification of the file to be accessed.
In the case that the management server 1 stores the correspondence between the identifier of the file and the identifier of the sub-file, the management server 1 obtains the identifier of at least one sub-file included in the file to be accessed from the correspondence between the identifier of the file and the identifier of the sub-file according to the identifier of the file to be accessed, and generates at least one first reading task, where each first reading task includes the identifier of the storage server 3 and the identifier of one sub-file in the file to be accessed.
In the case that the management server 1 does not store the correspondence between the identification of the file and the identification of the subfiles, the management server 1 obtains the identification of at least one subfile included in the file to be accessed from the storage server 3 according to the identification of the storage server 3, and generates at least one first reading task, where each first reading task includes the identification of the storage server 3 and the identification of one subfile in the file to be accessed.
Optionally, the management server 1 may further count the access frequency of the file to be accessed in the latest time period of the preset duration, and when the access frequency exceeds the first preset frequency threshold, each generated first reading task may further include a cache indication. The cache indication is used to instruct the processing node 2 receiving the first reading task to cache a sub-file of the file to be accessed when the sub-file is obtained from the storage server 3 where the file to be accessed is located.
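The generation of first reading tasks might be sketched as follows; the task field names are assumptions made for illustration.

```python
def build_first_read_tasks(server_id, subfile_ids, cache_hint=False):
    # One task per subfile; each carries the storage server identifier and
    # the identifier of exactly one subfile, plus an optional cache
    # indication when the file's recent access frequency is high.
    tasks = []
    for subfile_id in subfile_ids:
        task = {"server": server_id, "subfile": subfile_id}
        if cache_hint:
            task["cache"] = True  # also cache the subfile after fetching it
        tasks.append(task)
    return tasks

tasks = build_first_read_tasks("10.0.0.3", ["t.part0", "t.part1"],
                               cache_hint=True)
```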
Optionally, the management server 1 may obtain, from the access history, each access time corresponding to the file to be accessed according to the identifier of the file to be accessed, count the number of access times in the period of the latest preset duration to obtain the number of times that the file to be accessed is accessed, and use the number of times as the access frequency of the file to be accessed.
Step 204: for each of the at least one first reading task, the management server 1 selects one processing node 2 and sends the first reading task to the processing node 2.
In this step, the management server 1 may traverse from a first one of the at least one first reading task, select one processing node 2 from the processing nodes 2 comprised by the data access system each time a first reading task is traversed, and send the first reading task to the processing node 2. When the first reading task is transmitted, the management server 1 again traverses the next first reading task, and repeats the above-described process until the last first reading task is transmitted.
Alternatively, one processing node 2 may be selected from the processing nodes 2 of the data access system in either of the following two modes:
In the first mode, the management server 1 randomly selects one processing node 2 from the processing nodes 2 of the data access system.
In the second mode, the management server 1 selects, from the processing nodes 2 of the data access system, the processing node 2 with the smallest number of tasks currently being processed.
In the second mode, the management server 1 stores a correspondence between the identifier of the processing node 2 and the number of tasks, and each record in the correspondence includes the identifier of one processing node 2 and the number of tasks currently being processed by the processing node 2.
In this way, the management server 1 reads the number of tasks of each processing node 2 in the data access system from the correspondence when selecting a processing node 2, and selects one processing node 2 having the smallest number of tasks.
In the second mode, when one processing node 2 having the smallest number of tasks is selected, the number of tasks of that processing node 2 is increased in the correspondence relation.
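The second mode, including the count update, can be sketched as below, assuming the correspondence is kept as a dict from node identifier to in-flight task count.

```python
def select_least_loaded(task_counts):
    # Pick the processing node 2 with the fewest tasks currently being
    # processed, then increase its count in the correspondence kept by
    # the management server 1.
    node = min(task_counts, key=task_counts.get)
    task_counts[node] += 1
    return node

counts = {"node-1": 3, "node-2": 1, "node-3": 2}
chosen = select_least_loaded(counts)
```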
Alternatively, when the processing node 2 is selected for the first reading task, the management server 1 may use the processing node 2 as a file summary node, and then add the identifier of the file summary node to the first reading task before sending the first reading task. Alternatively, before generating the at least one first reading task, the management server 1 may select one processing node 2 as the file aggregation node in the first or second manner, where each first reading task thus generated includes an identifier of the file aggregation node.
Optionally, the management server 1 further sends a summary task to the file summary node, where the summary task includes the number of subfiles in the file to be accessed.
The file summary node selected in the first or second mode may be different from the processing node 2 selected by the management server 1 for each first reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain first reading task.
Alternatively, the management server 1 selects one processing node 2 each time it traverses to a first reading task. For a certain processing node 2, the processing node 2 may be selected by the management server 1 a plurality of times, i.e. a plurality of first reading tasks are sent to the processing node 2 at different times.
When the second mode is used, the management server 1 records the number of first reading tasks allocated to each selected processing node 2, that is, stores the correspondence between the identifier of the selected processing node 2 and the number of first reading tasks.
Step 205: the processing node 2 receives the first reading task, obtains the subfiles corresponding to the identifiers included in the first reading task from the storage server 3 where the files to be accessed are located according to the first reading task, sends the obtained subfiles to the management server 1, and executes step 209.
In this step, the processing node 2 receives the first reading task, the processing node 2 establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first reading task, and obtains the subfile from the storage server 3 through the network connection according to the identifier of the subfile included in the first reading task, and sends the subfile to the management server 1.
Optionally, because the first reading task includes the identifier of the storage server 3, the processing node 2 may determine that the processing node 2 included in the data access system does not cache the file to be accessed, so that the processing node 2 directly establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3, obtains the subfiles from the storage server 3, and directly sends the subfiles to the management server 1. The processing node 2 will not cache the subfile in the cache 21 of the processing node 2 before sending the subfile to the management server 1, so that the subfile will not pass through the cache 21 of the processing node 2, reducing the transmission path of the subfile, and improving the transmission efficiency of the subfile.
Optionally, if the first reading task further includes the identifier of the file summarizing node, then when the processing node 2 is not the file summarizing node, the processing node 2 sends the obtained subfiles to the file summarizing node according to the identifier of the file summarizing node. When the processing node 2 is the file summarizing node, the processing node 2 also receives the summarizing task and receives the subfiles sent by other processing nodes 2, and when the sum of the number of subfiles acquired by itself and the number of subfiles received reaches the number of subfiles in the summarizing task, the processing node 2 composes the subfiles acquired by itself and the subfiles received into the file to be accessed, and sends the file to be accessed to the management server 1.
Optionally, in the case that the file summarizing node is different from the processing node 2 selected by the management server 1 for each first reading task, the file summarizing node receives the summarizing task, and receives the subfiles sent by other processing nodes 2, and when the number of the subfiles received reaches the number of subfiles included in the summarizing task, the received subfiles form a file to be accessed, and sends the file to be accessed to the management server 1.
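The summarizing behaviour can be sketched as a small class; the byte concatenation and the ordering by subfile identifier are illustrative assumptions, since the patent does not specify the assembly format.

```python
class SummaryNode:
    """Sketch of a file summarizing node collecting subfiles."""

    def __init__(self, expected_subfiles: int):
        self.expected = expected_subfiles  # count carried by the summary task
        self.parts = {}

    def receive(self, subfile_id: str, data: bytes):
        # Store each subfile; once all expected subfiles are present,
        # assemble and return the file to be accessed (else None).
        self.parts[subfile_id] = data
        if len(self.parts) == self.expected:
            return b"".join(self.parts[k] for k in sorted(self.parts))
        return None

node = SummaryNode(expected_subfiles=2)
first = node.receive("part0", b"hello ")
whole = node.receive("part1", b"world")
```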
Optionally, if the first reading task further includes a buffer instruction, the processing node 2 buffers the obtained subfile in a buffer 21 included in the processing node 2 when the subfile is obtained. The processing node 2 may cache the acquired subfiles in a cache 21 comprised by the processing node 2 after sending the acquired subfiles to the management server 1. Alternatively, the processing node 2 may cache the acquired subfiles in the cache 21 included in the processing node 2 while sending the acquired subfiles to the management server 1.
Step 206: when it is found by query that the file to be accessed is cached in processing nodes 2 of the data access system, the management server 1 generates at least one second reading task.
When a file to be accessed is cached in a processing node 2 in the data access system, the management server 1 may query the file list for an identifier of each sub-file in the file to be accessed and an identifier of the processing node 2 where each sub-file is located.
Each second reading task comprises an identification of a sub-file in the file to be accessed and an identification of the processing node 2 where the sub-file is located.
Step 207: for each of the at least one second read task, the management server 1 selects one of the processing nodes 2 and sends the second read task to the processing node 2.
In this step, the management server 1 may traverse from a first second read task of the at least one second read task, select one processing node 2 from the processing nodes 2 included in the data access system every time traversing to the second read task, and send the second read task to the processing node 2. When the second reading task is transmitted, the management server 1 again traverses the next second reading task, and repeats the above-described process until the last second reading task is transmitted.
Alternatively, one processing node 2 may be selected from the processing nodes 2 of the data access system in one or both of the above-described ways.
When the second mode is used, the management server 1 records the number of second reading tasks allocated to each selected processing node 2, that is, stores the correspondence between the identifier of the selected processing node 2 and the number of second reading tasks.
In addition to the first and second modes described above, one processing node 2 may be selected in the following third mode:
in the third mode, the management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 carried in the second reading task.
Alternatively, when the processing node 2 is selected for the first second reading task, the management server 1 may use the processing node 2 as a file aggregation node, and then add the identifier of the file aggregation node to the second reading task before sending the second reading task. Alternatively, before generating at least one second reading task, the management server 1 may select one processing node 2 as the file aggregation node in the first or second manner, where each second reading task thus generated includes an identifier of the file aggregation node.
Optionally, the management server 1 further sends a summary task to the file summary node, where the summary task includes the number of subfiles in the file to be accessed.
The file aggregation node selected in the first or second mode may be different from the processing node 2 selected by the management server 1 for each second reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain second reading task.
Alternatively, the management server 1 selects one processing node 2 each time it traverses to a second read task. For a certain processing node 2, the processing node 2 may be selected by the management server 1 a plurality of times, i.e. a plurality of second reading tasks are sent to the processing node 2 at different times.
Step 208: the processing node 2 receives the second reading task, obtains the subfile according to the identification of the subfile included in the second reading task and the identification of the processing node 2, and sends the obtained subfile to the management server 1.
In this step, the processing node 2 receives the second read task, which includes an identification of a subfile and an identification of the processing node 2. If the processing node 2 is the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task, the processing node 2 obtains the corresponding subfile according to the identifier of the subfile in the second reading task. If the processing node 2 is not the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task, the processing node 2 obtains the corresponding subfile from the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task according to the identifier of the subfile in the second reading task.
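The routing decision of step 208 can be sketched as follows; `local_cache` and `fetch_remote` are hypothetical helpers standing in for the node's cache 21 and the node-to-node transfer.

```python
def handle_second_read(task, self_id, local_cache, fetch_remote):
    # The task carries the subfile identifier and the identifier of the
    # processing node 2 that caches the subfile.
    if task["node"] == self_id:
        return local_cache[task["subfile"]]             # read own cache 21
    return fetch_remote(task["node"], task["subfile"])  # ask the other node

cache = {"t.part0": b"abc"}
local = handle_second_read({"node": "node-1", "subfile": "t.part0"},
                           "node-1", cache, None)
remote = handle_second_read({"node": "node-2", "subfile": "t.part1"},
                            "node-1", cache,
                            lambda node, sid: b"from-" + node.encode())
```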
Optionally, if the second reading task further includes an identifier of a file summarizing node, when the processing node 2 is not the file summarizing node, the processing node 2 sends the obtained subfiles to the file summarizing node according to the identifier of the file summarizing node. When the processing node 2 is a file summarizing node, the processing node 2 also receives summarizing tasks and receives subfiles sent by other processing nodes 2, and when the sum of the number of subfiles acquired by itself and the number of subfiles received reaches the number of subfiles in the summarizing tasks, the subfiles acquired by itself and the subfiles received form a file to be accessed, and the file to be accessed is sent to the management server 1.
Optionally, in the case that the file summarizing node is different from the processing node 2 selected by the management server 1 for each second reading task, the file summarizing node receives the summarizing task and receives the subfiles sent by other processing nodes 2, and when the number of the subfiles received reaches the number of subfiles in the summarizing task, the received subfiles form a file to be accessed, and the file to be accessed is sent to the management server 1.
Step 209: the management server 1 receives the subfiles sent by each processing node 2, obtains the files to be accessed, and sends the files to be accessed to the client 4.
Optionally, the management server 1 integrates the received subfiles into a file to be accessed, and sends the file to be accessed to the client 4.
In the case that the first reading task or the second reading task further comprises the identifier of the file summarizing node, the management server 1 receives the file to be accessed sent by the file summarizing node and sends the file to be accessed to the client 4.
Alternatively, in the case that the second mode is used to select the processing nodes 2, the correspondence between the identifier of a processing node 2 and its number of tasks is stored in the management server 1. For any selected processing node 2, the recorded number of first reading tasks or second reading tasks allocated to that processing node 2 is subtracted from the number of tasks of the processing node 2 saved in the correspondence between the identifier of the processing node 2 and the number of tasks.
In the embodiment of the present application, when determining that the file to be accessed is not cached in a processing node 2 of the data access system, the management server 1 generates at least one first reading task, each first reading task including the identifier of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed. In this way, when the processing node 2 receives the first reading task, the processing node 2 does not access its cache 21; instead, it can directly acquire the sub-file from the storage server 3 according to the identifier of the storage server 3 included in the first reading task, and then send the sub-file to the management server 1. The processing node 2 does not cache the sub-file in its cache 21 before sending the sub-file to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2, which reduces the transmission delay of the file to be accessed. When a processing node 2 of the data access system caches the file to be accessed, each generated second reading task includes the identifier of the processing node 2 where a sub-file of the file to be accessed is located, so that the processing node 2 receiving the second reading task can conveniently acquire the sub-file based on the identifier of the processing node 2 in the second reading task, improving the efficiency of accessing the file. In addition, when the file to be accessed is not stored in a processing node 2 of the data access system, the access frequency of the file to be accessed in the latest time period of the preset duration is obtained, and when the access frequency exceeds the first preset frequency threshold, the processing node 2 is controlled to cache the file to be accessed.
An access frequency exceeding the first preset frequency threshold indicates that the file to be accessed has been frequently accessed recently. Storing such a file in the caches 21 of the processing nodes 2 of the data access system improves both the utilization of the caches 21 and the hit rate of files.
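The frequency check described above can be sketched as follows (a minimal illustration under assumed parameter names; how access times are actually recorded is not specified by the embodiment):

```python
def should_cache(access_times, now, window_seconds, threshold):
    """Return True when the number of accesses within the most recent
    time period of the preset duration exceeds the first preset
    frequency threshold. `access_times` is a list of timestamps in
    seconds (hypothetical representation)."""
    recent = [t for t in access_times if now - t <= window_seconds]
    return len(recent) > threshold
```

For example, three accesses inside a 60-second window with a threshold of 2 would trigger caching, while a single old access would not.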
Referring to fig. 6, an embodiment of the present application provides an apparatus 300 for data access, where the apparatus 300 is deployed in the above-mentioned management server 1, and the apparatus 300 is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to a storage server 3. The apparatus 300 includes:
the receiving unit 301 is configured to receive a file access request, where the file access request carries an identifier of a file to be accessed.
A processing unit 302, configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, where the apparatus 300 stores the identifiers of the files cached in the caches 21 of the plurality of processing nodes 2.
The processing unit 302 is further configured to instruct at least one processing node 2 of the plurality of processing nodes 2 to obtain the file to be accessed from the storage server 3 when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2.
Optionally, for the detailed implementation in which the processing unit 302 determines whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, reference may be made to the relevant content in step 202 of the embodiment shown in fig. 3, which is not described in detail here.
Referring to fig. 6, optionally, the apparatus 300 further includes: a first sending unit 303.
The processing unit 302 is further configured to obtain, from the storage server 3, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server 3 where each sub-file is located, and to generate a reading task for each sub-file included in the file to be accessed, where each reading task includes an identifier of one sub-file and an identifier of the storage server 3 where that sub-file is located.
A first sending unit 303, configured to send each reading task to one processing node 2, and instruct the processing node 2 that receives the reading task to read the subfile from the storage server 3 that stores the subfile.
The receiving unit 301 is further configured to receive the sub-files read by the processing nodes 2 that receive the reading tasks.
The processing unit 302 is further configured to merge the received sub-files into the file to be accessed.
Optionally, for the detailed implementation in which the processing unit 302 generates the reading tasks, reference may be made to the relevant content in step 203 of the embodiment shown in fig. 3; for the detailed implementation in which the first sending unit 303 sends the reading tasks, reference may be made to the relevant content in step 204 of the embodiment shown in fig. 3, which is not described in detail here.
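As an illustration of this per-sub-file task generation and the final merge, consider the following sketch (function and field names are assumptions, not part of the embodiment):

```python
def generate_read_tasks(subfile_ids, locate_server):
    """One first reading task per sub-file: the identifier of the
    sub-file plus the identifier of the storage server holding it."""
    return [{"subfile": sid, "server": locate_server(sid)}
            for sid in subfile_ids]

def merge_subfiles(parts):
    """Merge received (index, bytes) sub-files, in order, into the
    file to be accessed."""
    return b"".join(data for _, data in sorted(parts))

tasks = generate_read_tasks(["f1-0", "f1-1"], lambda sid: "srv-" + sid)
merged = merge_subfiles([(1, b"world"), (0, b"hello ")])
```

Sorting by sub-file index before joining models the merge step even when the processing nodes return sub-files out of order.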
Referring to fig. 6, optionally, the apparatus 300 further includes: a second sending unit 304.
The second sending unit 304 is configured to send a caching task to at least one processing node 2 of the plurality of processing nodes 2 when the access frequency of the file to be accessed exceeds a preset frequency, so as to instruct the at least one processing node 2 to cache the sub-files included in the file to be accessed in the at least one processing node 2.
The processing unit 302 is further configured to record the identifiers of the sub-files included in the file to be accessed and the identifier of the processing node 2 that caches each sub-file, and, when the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, to generate at least one reading task, where each reading task includes an identifier of a sub-file and an identifier of the processing node 2 where the sub-file is located.
The second sending unit 304 is further configured to send at least one reading task to the plurality of processing nodes 2, and instruct the plurality of processing nodes 2 to read the subfiles from the caches 21 of the processing nodes 2 storing the subfiles.
The processing unit 302 is further configured to synthesize the read sub-files into the file to be accessed.
Optionally, for the detailed implementation in which the second sending unit 304 sends the caching task, reference may be made to the relevant content in steps 2023 and 2024 of the embodiment shown in fig. 4; for the detailed implementation in which the processing unit 302 generates the reading tasks, reference may be made to the relevant content in step 206 of the embodiment shown in fig. 3; and for the detailed implementation in which the second sending unit 304 sends the reading tasks, reference may be made to the relevant content in step 207 of the embodiment shown in fig. 3, which is not described in detail here.
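The recording of cache locations and the generation of reading tasks that target processing nodes rather than storage servers can be sketched as follows (all names hypothetical):

```python
def record_cached(subfile_node_pairs, locations):
    """Record, per sub-file identifier, the processing node caching it."""
    for sid, node in subfile_node_pairs:
        locations[sid] = node

def generate_cache_read_tasks(subfile_ids, locations):
    """One reading task per cached sub-file, carrying the identifier of
    the processing node where that sub-file is located."""
    return [{"subfile": sid, "node": locations[sid]} for sid in subfile_ids]

locations = {}
record_cached([("f1-0", "node-1"), ("f1-1", "node-3")], locations)
tasks = generate_cache_read_tasks(["f1-0", "f1-1"], locations)
```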
Optionally, the second sending unit 304 is further configured to send a deletion task to the processing node 2 where the sub-file included in the file to be accessed is located when the access frequency of the file to be accessed is lower than a preset frequency, where the deletion task includes an identifier of the sub-file, so as to instruct the processing node 2 to delete the sub-file.
The processing unit 302 is further configured to delete the identifier of the subfile and the identifier of the processing node 2 recorded in the apparatus 300.
Optionally, the detailed implementation process of the deletion task by the second sending unit 304 may refer to the relevant content in steps 2122 and 2123 in the embodiment shown in fig. 5, which will not be described in detail here.
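A minimal sketch of this deletion flow (the location map, callback, and all names are hypothetical):

```python
def evict_file(subfile_ids, locations, send_delete):
    """Send a deletion task (carrying the sub-file identifier) to each
    caching processing node, then drop the recorded identifiers of the
    sub-file and its processing node."""
    for sid in subfile_ids:
        node = locations.pop(sid)          # forget the recorded location
        send_delete(node, {"delete": sid}) # instruct the node to delete

locations = {"f1-0": "node-1", "f1-1": "node-3"}
sent = []
evict_file(["f1-0", "f1-1"], locations, lambda n, t: sent.append((n, t)))
```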
In the embodiment of the present application, the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; when it is not, at least one processing node 2 of the plurality of processing nodes 2 is instructed to acquire the file to be accessed from the storage server 3. In this way, the at least one processing node 2 reads the file to be accessed directly from the storage server 3 according to the instruction of the processing unit 302 and returns it directly to the apparatus 300, without caching it in its cache 21 beforehand. The file to be accessed therefore does not pass through the cache 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, shortens the data reading path, and improves the performance of data access.
Referring to fig. 7, an embodiment of the present application provides an apparatus 400 for data access, the apparatus 400 being deployed in the above-mentioned processing node 2, the apparatus 400 being one of a plurality of processing nodes 2 connected to a management server 1, the plurality of processing nodes 2 being connected to a storage server 3. The apparatus 400 includes:
a receiving unit 401, configured to receive a read task, where the read task is sent by the management server 1 when it is determined that the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, and the read task includes an identifier of a sub-file in the file to be accessed and an identifier of the storage server 3 where the sub-file is located.
The processing unit 402 is configured to read the sub-file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub-file.
A sending unit 403 is configured to send the read sub-file to the management server 1.
Optionally, for the detailed implementation in which the processing unit 402 reads the sub-files, reference may be made to the relevant content in step 205 of the embodiment shown in fig. 3, which is not described in detail here.
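The cache-bypass behavior of a first reading task at the processing node can be sketched as follows (dictionaries stand in for the storage server 3 and the cache 21; all names are assumptions):

```python
def handle_first_read_task(task, storage_servers, local_cache, send):
    """Read the sub-file directly from the storage server named in the
    task and return it to the management server; the local cache 21 is
    deliberately not touched (the bypass described above)."""
    data = storage_servers[task["server"]][task["subfile"]]
    send(data)  # returned without being inserted into local_cache

storage = {"srv-1": {"f1-0": b"hello"}}
cache, out = {}, []
handle_first_read_task({"server": "srv-1", "subfile": "f1-0"},
                       storage, cache, out.append)
```

After handling the task, the local cache remains empty, which is exactly the property the embodiment relies on to shorten the transmission path.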
Optionally, the receiving unit 401 is further configured to receive a caching task, where the caching task is sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency, and the caching task includes an identifier of a sub-file of the file to be accessed and an identifier of the storage server 3 where the sub-file is located.
The processing unit 402 is further configured to read the sub-file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub-file, and to store the sub-file in the cache 21 of the apparatus 400.
Optionally, for the detailed implementation in which the processing unit 402 caches the sub-files, reference may be made to the relevant content in step 2025 of the embodiment shown in fig. 4, which is not described in detail here.
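By contrast with a first reading task, a caching task does place the sub-file in the cache 21. A hedged sketch (names assumed, dictionaries standing in for the servers and cache):

```python
def handle_cache_task(task, storage_servers, local_cache):
    """Read the sub-file from the storage server named in the caching
    task and, unlike a first reading task, store it in the node's
    cache 21 so that later second reading tasks can hit it."""
    data = storage_servers[task["server"]][task["subfile"]]
    local_cache[task["subfile"]] = data
    return data

storage = {"srv-1": {"f1-0": b"hello"}}
cache = {}
data = handle_cache_task({"server": "srv-1", "subfile": "f1-0"}, storage, cache)
```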
In the embodiment of the present application, the reading task received by the receiving unit 401 includes the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located; the processing unit 402 reads the sub-file from the storage server 3 corresponding to that identifier according to the identifier of the sub-file; and the sending unit 403 sends the read sub-file to the management server 1. In this way, the processing unit 402 can read the sub-file directly from the storage server 3 according to the identifier of the storage server 3, the sending unit 403 returns it directly to the management server 1, and the processing unit 402 does not cache the sub-file in the cache 21 of the apparatus 400 before it is returned. The sub-files returned to the management server 1 therefore do not pass through the cache 21 of the apparatus 400, which shortens the transmission path of the sub-files, shortens the data reading path, and improves the performance of data access.
Referring to fig. 8, fig. 8 is a schematic diagram of an apparatus 500 for data access according to an embodiment of the present application. The apparatus 500 comprises at least one processor 501, a bus system 502, a memory 503 and a transceiver 504.
The apparatus 500 is a hardware configuration apparatus that may be used to implement the functional units in the apparatus described in fig. 6. For example, it will be appreciated by those skilled in the art that the processing unit 302 in the apparatus 300 shown in fig. 6 may be implemented by the at least one processor 501 invoking application code in the memory 503, and the receiving unit 301, the first transmitting unit 303 and the second transmitting unit 304 in the apparatus 300 shown in fig. 6 may be implemented by the transceiver 504.
Optionally, the apparatus 500 may also be used to implement the functionality of the management server 1 in the embodiments described in fig. 1 or fig. 3.
Alternatively, the processor 501 may be a general purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
The bus system 502 may include a path to transfer information between the components.
The memory 503 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and coupled to the processor via the bus. The memory may also be integrated with the processor.
The memory 503 is used for storing the application code for executing the solution of the present application, and the execution is controlled by the processor 501. The processor 501 is configured to execute the application code stored in the memory 503 to implement the functions of the method of the present application.
In a particular implementation, as one embodiment, the processor 501 may include one or more CPUs, such as CPU0 and CPU1 in fig. 8.
In a particular implementation, as one embodiment, the apparatus 500 may include multiple processors, such as the processor 501 and the processor 508 in fig. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
Referring to fig. 9, fig. 9 is a schematic diagram of an apparatus 600 for data access according to an embodiment of the present application. The apparatus 600 includes at least one processor 601, a bus system 602, a memory 603 and a transceiver 604. The memory 603 further includes the cache 21, where the cache 21 is configured to store the sub-files included in files whose access frequency exceeds a preset frequency.
The apparatus 600 is a hardware configuration apparatus that may be used to implement the functional units in the apparatus described in fig. 7. For example, it will be appreciated by those skilled in the art that the processing unit 402 in the apparatus 400 shown in fig. 7 may be implemented by the at least one processor 601 invoking code in the memory 603, and that the transmitting unit 403 and the receiving unit 401 in the apparatus 400 shown in fig. 7 may be implemented by the transceiver 604.
Optionally, the apparatus 600 may also be used to implement the functionality of the processing node 2 in the embodiments described in fig. 1 or fig. 3.
Alternatively, the processor 601 may be a general purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
The bus system 602 may include a path to transfer information between the components.
The memory 603 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be stand-alone and coupled to the processor via the bus. The memory may also be integrated with the processor.
The memory 603 is used for storing the application code for executing the solution of the present application, and the execution is controlled by the processor 601. The processor 601 is configured to execute the application code stored in the memory 603 to implement the functions of the method of the present application.
In a particular implementation, as one embodiment, the processor 601 may include one or more CPUs, such as CPU0 and CPU1 in fig. 9.
In a particular implementation, as one embodiment, the apparatus 600 may include multiple processors, such as the processor 601 and the processor 608 in fig. 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The foregoing description is merely of preferred embodiments of the present application and is not intended to limit the present application; the protection scope of the present application is defined by the appended claims.

Claims (9)

1. A method of data access, the method performed by a management server coupled to a plurality of processing nodes coupled to a storage server, the method comprising:
Receiving a file access request, wherein the file access request carries an identification of a file to be accessed;
determining whether the file to be accessed is cached in a cache of at least one processing node in the plurality of processing nodes according to the identification of the file to be accessed, wherein the identification of the cached file is stored in the management server;
when the file to be accessed is not cached in the cache of at least one processing node of the plurality of processing nodes, acquiring an identifier of at least one sub-file included in the file to be accessed and an identifier of a storage server where each sub-file is located from the storage server;
generating a reading task aiming at each sub-file included in the file to be accessed, wherein each reading task comprises an identification of the sub-file and an identification of a storage server where the sub-file is located;
each reading task is respectively sent to a processing node, and the processing node receiving the reading task is instructed to read the subfiles from a storage server storing the subfiles;
receiving the subfiles read by the processing nodes receiving the reading tasks;
and merging the subfiles into the files to be accessed.
2. The method of claim 1, wherein the method further comprises:
when the access frequency of the file to be accessed exceeds a preset frequency, sending a caching task to at least one processing node of the plurality of processing nodes so as to instruct the at least one processing node to cache subfiles included in the file to be accessed to the at least one processing node;
recording a subfile identifier included in the identifier of the file to be accessed and an identifier of a processing node for caching each subfile;
when the file to be accessed is cached in the caches of the plurality of processing nodes, generating at least one reading task, wherein each reading task comprises an identification of a sub-file and an identification of a processing node where the sub-file is located;
sending the at least one reading task to the plurality of processing nodes, and indicating the plurality of processing nodes to read the subfiles from the caches of the processing nodes storing the subfiles;
and synthesizing the read subfiles into the file to be accessed.
3. The method of claim 2, wherein the method further comprises:
when the access frequency of the file to be accessed is lower than a preset frequency, a deletion task is sent to a processing node where a sub-file included in the file to be accessed is located, wherein the deletion task includes an identifier of the sub-file so as to instruct the processing node to delete the sub-file;
And deleting the identification of the subfiles recorded in the management server and the identification of the processing nodes.
4. An apparatus for data access, the apparatus being coupled to a plurality of processing nodes, the plurality of processing nodes being coupled to a storage server, the apparatus comprising:
the receiving unit is used for receiving a file access request, wherein the file access request carries an identification of a file to be accessed;
the processing unit is used for determining whether the file to be accessed is cached in the cache of at least one processing node of the plurality of processing nodes according to the identification of the file to be accessed, and the device stores the identification of the cached file;
the processing unit is further configured to, when the file to be accessed is not cached in the cache of at least one processing node of the plurality of processing nodes, obtain, from the storage server, an identifier of at least one subfile included in the file to be accessed and an identifier of a storage server where each subfile is located; generating a reading task aiming at each sub-file included in the file to be accessed, wherein each reading task comprises an identification of the sub-file and an identification of a storage server where the sub-file is located;
The first sending unit is used for respectively sending each reading task to one processing node and indicating the processing node which receives the reading task to read the subfiles from a storage server which stores the subfiles;
the receiving unit is used for receiving the subfiles read by the processing nodes receiving the reading task;
and the processing unit is used for merging the subfiles into the files to be accessed.
5. The apparatus of claim 4, wherein the apparatus further comprises: a second sending unit,
the second sending unit is configured to send a buffering task to at least one processing node of the plurality of processing nodes when the access frequency of the file to be accessed exceeds a preset frequency, so as to instruct the at least one processing node to buffer a sub-file included in the file to be accessed to the at least one processing node;
the processing unit is further used for recording a sub-file identifier included in the identifier of the file to be accessed and an identifier of a processing node for caching each sub-file; when the file to be accessed is cached in the caches of the plurality of processing nodes, generating at least one reading task, wherein each reading task comprises an identification of a sub-file and an identification of a processing node where the sub-file is located;
The second sending unit is further configured to send the at least one reading task to the plurality of processing nodes, and instruct the plurality of processing nodes to read the subfiles from a cache of the processing node storing the subfiles;
and the processing unit is also used for synthesizing the read subfiles into the file to be accessed.
6. The apparatus of claim 5, wherein,
the second sending unit is further configured to send a deletion task to a processing node where a sub-file included in the file to be accessed is located when the access frequency of the file to be accessed is lower than a preset frequency, where the deletion task includes an identifier of the sub-file, so as to instruct the processing node to delete the sub-file;
the processing unit is further configured to delete the identifier of the subfile and the identifier of the processing node recorded in the device.
7. A system for data access, the system comprising: a management server, a storage server and a plurality of processing nodes;
the management server is used for receiving a file access request, wherein the file access request carries an identification of a file to be accessed; determining whether the file to be accessed is cached in a cache of at least one processing node in the plurality of processing nodes according to the identification of the file to be accessed, wherein the identification of the cached file is stored in the management server; when the file to be accessed is not cached in the cache of at least one processing node of the plurality of processing nodes, acquiring an identifier of at least one sub-file included in the file to be accessed and an identifier of a storage server where each sub-file is located from the storage server, and generating a reading task for each sub-file included in the file to be accessed, wherein each reading task comprises an identifier of one sub-file and an identifier of the storage server where the sub-file is located; each reading task is respectively sent to a processing node;
The processing node is used for reading the corresponding subfile from the storage server corresponding to the identifier of the storage server according to the identifier of the subfile in the received reading task, and sending the read subfile to the management server;
the management server is further configured to receive the subfiles read by the processing node that receives the reading task.
8. The system of claim 7, wherein,
the management server is further configured to send a cache task to at least one processing node of the plurality of processing nodes when the access frequency of the file to be accessed exceeds a preset frequency;
the processing node receives the caching task and is used for caching the subfiles included in the files to be accessed;
the management server is further configured to record a sub-file identifier included in the identifier of the file to be accessed and an identifier of a processing node that caches each sub-file;
the management server is further configured to generate at least one reading task when the file to be accessed is cached in caches of the plurality of processing nodes, where each reading task includes an identifier of a sub-file and an identifier of a processing node where the sub-file is located; transmitting the at least one read task to the plurality of processing nodes;
The processing node receiving the reading task is used for reading the subfiles according to the received identifiers of the subfiles in the reading task and the identifiers of the processing nodes where the subfiles are located and sending the read subfiles to the management server;
the management server is further configured to receive the subfiles read by the processing node that receives the read task.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a computer, implements the method according to any of claims 1-3.
CN201910786485.4A 2019-08-23 2019-08-23 Data access method, device and system Active CN112416871B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910786485.4A CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system
PCT/CN2020/110819 WO2021036989A1 (en) 2019-08-23 2020-08-24 Method, apparatus and system for data access

Publications (2)

Publication Number Publication Date
CN112416871A CN112416871A (en) 2021-02-26
CN112416871B true CN112416871B (en) 2023-10-13


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107026876A (en) * 2016-01-29 2017-08-08 杭州海康威视数字技术股份有限公司 A kind of file data accesses system and method
CN107562757A (en) * 2016-07-01 2018-01-09 阿里巴巴集团控股有限公司 Inquiry, access method based on distributed file system, apparatus and system
CN107920101A (en) * 2016-10-10 2018-04-17 阿里巴巴集团控股有限公司 A kind of file access method, device, system and electronic equipment
CN109002260A (en) * 2018-07-02 2018-12-14 深圳市茁壮网络股份有限公司 A kind of data cached processing method and processing system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9772787B2 (en) * 2014-03-31 2017-09-26 Amazon Technologies, Inc. File storage using variable stripe sizes

Also Published As

Publication number Publication date
WO2021036989A1 (en) 2021-03-04
CN112416871A (en) 2021-02-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant