CN112416871A - Data access method, device and system - Google Patents

Data access method, device and system

Info

Publication number
CN112416871A
CN112416871A
Authority
CN
China
Prior art keywords
file
subfile
accessed
identifier
processing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910786485.4A
Other languages
Chinese (zh)
Other versions
CN112416871B (en)
Inventor
李铮
王明月
刘玉
张巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910786485.4A priority Critical patent/CN112416871B/en
Priority to PCT/CN2020/110819 priority patent/WO2021036989A1/en
Publication of CN112416871A publication Critical patent/CN112416871A/en
Application granted granted Critical
Publication of CN112416871B publication Critical patent/CN112416871B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/1727 Details of free space management performed by the file system
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data access method, apparatus, and system in the field of communication. The method is performed by a management server connected to a plurality of processing nodes, which are in turn connected to a storage server. The method comprises: receiving a file access request that carries an identifier of a file to be accessed; determining, according to the identifier, whether the file to be accessed is cached in the caches of the processing nodes, where the identifiers of files cached in those caches are stored in the management server; and, when the file to be accessed is not cached in the caches of the processing nodes, instructing at least one of the processing nodes to acquire the file to be accessed from the storage server. The method and apparatus shorten the data-read path and improve data access performance.

Description

Data access method, device and system
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, and a system for data access.
Background
With the advent of the big data era, enterprise data keeps growing in scale, and how to access massive data quickly has become a core problem for enterprises.
At present, to improve data access and storage efficiency, enterprises generally adopt a distributed system comprising a coordination server, a plurality of processing nodes, and a storage server that stores data. When the coordination server receives an access request from a client, it decomposes the request into a plurality of tasks and issues them to the processing nodes. Each processing node accesses data in the storage server and returns the data it reads to the coordination server, which integrates the returned data and sends it back to the client.
After a processing node receives a task from the coordination server, it checks whether the data to be accessed is in its cache. If so, it reads the data directly from the cache; if not, it must first read the data from the storage server into the cache and then read it from the cache. Thus, for each processing node, a cache miss means the data must first travel from the storage server to the node's cache and then from the cache to the coordination server, which lengthens the data-read path and degrades data access performance.
Disclosure of Invention
The application provides a data access method, apparatus, and system for shortening the data-read path and improving data access performance. The technical scheme is as follows:
In a first aspect, the present application provides a data access method performed by a management server. The management server is connected to a plurality of processing nodes, the processing nodes are connected to a storage server, and the identifiers of files cached in the caches of the processing nodes are stored in the management server. In the method, a file access request carrying an identifier of a file to be accessed is received; whether the file to be accessed is cached in the caches of the processing nodes is determined according to the identifier; and when the file to be accessed is not cached in those caches, at least one of the processing nodes is instructed to acquire the file from the storage server. In that case the at least one processing node reads the file directly from the storage server as instructed by the management server and returns it directly to the management server, without caching it in its own cache beforehand. Because the file does not have to pass through the cache of the at least one processing node, its transmission path is shortened, the data-read path is reduced, and data access performance is improved.
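The cache check of the first aspect can be sketched as follows. This is a minimal illustration only; the patent does not specify an implementation, so the function and parameter names here are assumptions.

```python
# Hypothetical sketch of the first-aspect decision logic (names illustrative).
def handle_file_access(file_id, cached_file_ids):
    """Decide where the file to be accessed is read from.

    cached_file_ids: identifiers of files cached in the processing nodes'
    caches, as stored in the management server.
    """
    if file_id in cached_file_ids:
        # Cached: read from the processing nodes' caches.
        return "read_from_node_caches"
    # Not cached: instruct at least one processing node to fetch the file
    # directly from the storage server, bypassing the node caches.
    return "read_from_storage_server"
```

The key point the sketch captures is that the management server, not each processing node, holds the cache-membership information, so the routing decision is made once, centrally.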
In a possible implementation, the identifier of at least one subfile included in the file to be accessed and the identifier of the storage server where each subfile is located are obtained from the storage server; a reading task is generated for each subfile, each reading task comprising the identifier of one subfile and the identifier of the storage server where that subfile is located; each reading task is sent to a processing node, instructing the node that receives it to read the subfile from the storage server storing it; the subfiles read by those processing nodes are received; and the subfiles are merged into the file to be accessed. Because each reading task carries the identifier of the storage server where the subfile is located, the receiving processing node can read the subfile directly from that storage server and send it directly to the management server. The subfile is therefore not first cached at the processing node and then read back out of that cache, which shortens its transmission path and improves read performance.
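The per-subfile reading tasks and the final merge described above can be sketched as below. The subfile identifiers follow the Table 1 naming used later in the description; everything else is a hypothetical illustration.

```python
# Illustrative sketch of per-subfile reading tasks and the final merge.
def build_read_tasks(subfile_locations):
    """subfile_locations: (subfile identifier, storage server identifier) pairs."""
    return [{"subfile_id": sf, "server_id": srv} for sf, srv in subfile_locations]

def merge_subfiles(parts):
    """Merge the subfiles returned by the processing nodes into one file."""
    return b"".join(parts)
```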
In another possible implementation, when the access frequency of the file to be accessed exceeds a preset frequency, a caching task is sent to at least one of the processing nodes to instruct it to cache the subfiles included in the file to be accessed; and the identifier of the file to be accessed, the identifiers of its subfiles, and the identifier of the processing node caching each subfile are recorded. When the file to be accessed is cached in the caches of the processing nodes, at least one reading task is generated, each comprising the identifier of a subfile and the identifier of the processing node where that subfile is located; the reading tasks are sent to the processing nodes, instructing them to read each subfile from the cache of the processing node storing it; and the read subfiles are merged into the file to be accessed.
An access frequency above the preset frequency indicates a frequently accessed file. Because cache space on each processing node is limited, storing frequently accessed files in the cache of the at least one processing node improves cache utilization and raises the hit rate of the file to be accessed. When the file to be accessed is cached in the caches of the processing nodes, each generated reading task carries the identifier of the processing node where the subfile is located, so the processing node receiving the reading task need not determine where the subfile resides; it reads the subfile directly from that node, improving read efficiency.
In another possible implementation, when the access frequency of the file to be accessed falls below the preset frequency, a deletion task is sent to the processing node where each subfile of the file is located. The deletion task includes the identifier of the subfile and instructs the processing node to delete it; the identifiers of the subfile and of the processing node recorded in the management server are also deleted. Deleting rarely accessed files from the processing nodes' caches frees cache space for frequently accessed files, improving cache utilization and the file hit rate.
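The frequency-based cache management in the two implementations above amounts to a simple threshold rule, sketched below. The preset frequency value and the function name are assumptions; the patent only specifies that files above the threshold are cached and files below it are evicted.

```python
# Hypothetical sketch of frequency-based cache management.
PRESET_FREQUENCY = 10  # accesses per period; value is an assumption

def cache_decision(access_frequency, preset=PRESET_FREQUENCY):
    if access_frequency > preset:
        return "send_caching_task"    # frequently accessed: cache its subfiles
    if access_frequency < preset:
        return "send_deletion_task"   # rarely accessed: evict its subfiles
    return "no_action"
```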
In a second aspect, the present application provides a data access method performed by a processing node, the processing node being one of a plurality of processing nodes connected to a management server, the processing nodes being connected to a storage server. In the method, a reading task is received; the reading task is sent by the management server when it determines that the file to be accessed is not cached in the caches of the processing nodes, and comprises the identifier of a subfile of the file and the identifier of the storage server where the subfile is located. The subfile is read from the storage server corresponding to that identifier and sent to the management server. Because the reading task carries both identifiers, the processing node can read the subfile directly from the storage server and return it directly to the management server without caching it beforehand. The subfile thus bypasses the processing node's cache, shortening its transmission path, reducing the data-read path, and improving data access performance.
In a possible implementation, a caching task is received; the caching task is sent by the management server when the access frequency of the file to be accessed exceeds the preset frequency, and comprises the identifier of a subfile of the file and the identifier of the storage server where the subfile is located. The subfile is read from that storage server and stored in the processing node's cache. An access frequency above the preset frequency indicates a frequently accessed file; because cache space on the processing node is limited, caching the subfiles of frequently accessed files improves cache utilization and raises the hit rate.
In a third aspect, the present application provides a data access method performed by a processing node, the processing node being one of a plurality of processing nodes connected to a management server, the processing nodes being connected to a storage server. In the method, a reading task is received; the reading task is sent by the management server when it determines that the file to be accessed is cached in the caches of the processing nodes, and comprises the identifier of a subfile of the file and the identifier of the processing node where the subfile is located. The subfile is read from the processing node corresponding to that identifier and sent to the management server. Because the reading task carries the identifier of the processing node where the subfile is located, the receiving node need not determine where the subfile resides and reads it directly from that node, improving read efficiency.
In a possible implementation, a deletion task is received; the deletion task is sent by the management server when the access frequency of the file to be accessed falls below the preset frequency, and comprises the identifier of a subfile of the file. The subfile corresponding to the identifier is deleted. Deleting subfiles of rarely accessed files from the processing node's cache frees space for frequently accessed files, improving cache utilization and the file hit rate.
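On the processing-node side, the three task types of the second and third aspects (read-through, cache, delete) can be dispatched as sketched below. The data structures and task-type strings are assumptions for illustration.

```python
# Illustrative sketch of a processing node dispatching the three task types.
def handle_task(task, node_cache, storage_servers):
    kind = task["type"]
    sf = task["subfile_id"]
    if kind == "read_from_storage":
        # Second aspect: read from the storage server and return directly,
        # without caching the subfile first.
        return storage_servers[task["server_id"]][sf]
    if kind == "cache":
        # Caching task: fetch the subfile and store it in this node's cache.
        node_cache[sf] = storage_servers[task["server_id"]][sf]
        return None
    if kind == "delete":
        # Deletion task: evict the subfile from this node's cache.
        node_cache.pop(sf, None)
        return None
    raise ValueError(f"unknown task type: {kind}")
```

Note that in the `read_from_storage` branch the node cache is never touched, which is exactly the shortened read path the second aspect describes.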
In a fourth aspect, the present application provides an apparatus for data access, where the apparatus is configured to perform the method in the first aspect or any one of the optional implementations of the first aspect. In particular, the apparatus comprises means for performing the method of the first aspect or any one of its possible implementations.
In a fifth aspect, the present application provides a data access apparatus configured to perform the method of the second aspect or any one of its possible implementations; in particular, the apparatus comprises means for performing that method. Alternatively, the apparatus is configured to perform the method of the third aspect or any one of its possible implementations; in particular, the apparatus comprises means for performing that method.
In a sixth aspect, the present application provides a data access apparatus comprising a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus. The memory stores computer-executable instructions that are executed by the processor to perform the operational steps of the method of the first aspect or any one of its possible implementations.
In a seventh aspect, the present application provides a data access apparatus comprising a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus. The memory stores computer-executable instructions that are executed by the processor to perform the operational steps of the method of the second aspect or one of its possible implementations, or of the third aspect or one of its possible implementations.
In an eighth aspect, the present application provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method of the above aspects.
In a ninth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
In a tenth aspect, the present application provides a system for data access, the system comprising: the system comprises a management server, a storage server and a plurality of processing nodes.
The management server receives a file access request carrying the identifier of a file to be accessed and determines, according to the identifier, whether the file is cached in the caches of the processing nodes; the identifiers of files cached in those caches are stored in the management server. When the file is not cached there, the management server obtains from the storage server the identifier of at least one subfile included in the file and the identifier of the storage server where each subfile is located, generates a reading task for each subfile (each task comprising the identifier of one subfile and the identifier of the storage server where it is located), and sends each reading task to a processing node. The processing node receiving a reading task reads the corresponding subfile from the storage server indicated in the task and sends the read subfile to the management server, which receives the subfiles read by those processing nodes.
When the file to be accessed is not cached in the caches of the processing nodes, each reading task generated by the management server comprises the identifier of one subfile and the identifier of the storage server where the subfile is located. Thus, the processing node receiving the reading task can directly read the subfile from the storage server according to the identification of the storage server in the reading task, and directly send the subfile to the management server after reading the subfile. Therefore, the subfile is not cached in the cache of the processing node firstly, and then the processing node reads the subfile from the cache of the processing node and sends the subfile to the management server, so that the transmission path of the subfile is reduced, and the performance of reading the subfile is improved.
In a possible implementation, when the access frequency of the file to be accessed exceeds a preset frequency, the management server sends a caching task to at least one of the processing nodes. The processing node receiving the caching task caches the subfiles included in the file, and the management server records the identifier of the file, the identifiers of its subfiles, and the identifier of the processing node caching each subfile. When the file is cached in the caches of the processing nodes, the management server generates at least one reading task, each comprising the identifier of a subfile and the identifier of the processing node where it is located, and sends the reading tasks to the processing nodes. A processing node receiving a reading task reads the subfile according to the identifiers in the task and sends it to the management server, which receives the subfiles read by those nodes.
An access frequency above the preset frequency indicates a frequently accessed file. Because cache space on each processing node is limited, storing frequently accessed files in the cache of the at least one processing node improves cache utilization and raises the hit rate of the file to be accessed. When the file to be accessed is cached in the caches of the processing nodes, each generated reading task carries the identifier of the processing node where the subfile is located, so the processing node receiving the reading task need not determine where the subfile resides; it reads the subfile directly from that node, improving read efficiency.
Drawings
Fig. 1 is a schematic structural diagram of a data access system provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a client accessing the data access system according to an embodiment of the present application;
Fig. 3 is a flowchart of a data access method according to an embodiment of the present application;
Fig. 4 is a flowchart of a method for caching a file according to an embodiment of the present application;
Fig. 5 is a flowchart of a method for deleting a file according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data access apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a data access apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a data access apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of another data access apparatus provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a data access system, including: the system comprises a management server 1, a plurality of processing nodes 2 and at least one storage server 3, wherein the management server 1 is connected with each processing node 2, and each processing node 2 is also connected with each other. The management server 1 and each processing node 2 are connected to the respective storage servers 3 via a network.
The management server 1 is configured to decompose a received file access request into a plurality of tasks and issue them to the processing nodes 2. The processing nodes 2 access the files in the storage server 3 and return the files they read to the management server 1, which integrates them and returns the result to the client (not shown).
Each storage server 3 stores the files that users access. A file may be divided into a plurality of subfiles for storage in the storage server 3. For example, the file may be a form in a database. Assuming the form includes 100 records, it is saved in the storage server 3 as three subfiles: a first, a second, and a third subfile. The first subfile stores records 1 to 33 of the form, the second subfile stores records 34 to 66, and the third subfile stores records 67 to 100.
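The record-range split in the example above (100 records into subfiles of 33, 33, and 34 records) can be sketched as follows. The exact split rule is an assumption; the patent only gives the resulting ranges.

```python
# Sketch of splitting a form's records into subfiles by contiguous ranges.
def split_form(records, parts=3):
    base = len(records) // parts                  # 100 // 3 = 33
    cuts = [i * base for i in range(parts)] + [len(records)]
    # First parts-1 subfiles get `base` records; the last takes the remainder.
    return [records[cuts[i]:cuts[i + 1]] for i in range(parts)]
```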
For any file in the storage server 3, the management server 1 may cache the individual subfiles included in the file in one or more processing nodes 2 of the data access system. For each subfile, when caching it to a processing node 2, the management server 1 stores the correspondence among the identifier of the file, the identifier of the subfile, and the identifier of the processing node 2 in a file list.
Optionally, the processing node 2 includes a cache 21, and the management server 1 caches the subfile in the cache 21 of the processing node 2.
The detailed implementation process of the management server 1 for caching each subfile included in the file in the processing node 2 of the data access system can refer to the following related content in the embodiment shown in fig. 3, and will not be described in detail here.
Regarding the file list, the following example is given. For the form above, assume its identifier is ID1 and the identifiers of the first, second, and third subfiles are file1, file2, and file3, respectively. Assume also that the management server 1 caches the first, second, and third subfiles in a first, second, and third processing node whose identifiers are TE1, TE2, and TE3, respectively. The management server 1 then stores the correspondences (ID1, file1, TE1), (ID1, file2, TE2), and (ID1, file3, TE3) in the file list shown in Table 1 below.
TABLE 1
Identification of file    Identification of subfile    Identification of processing node
ID1                       file1                        TE1
ID1                       file2                        TE2
ID1                       file3                        TE3
……                        ……                           ……
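The file list of Table 1 can be rendered as a simple in-memory structure, sketched below. This representation is illustrative; the patent does not prescribe one.

```python
# The file list of Table 1 as (file id, subfile id, processing node id) entries.
file_list = [
    ("ID1", "file1", "TE1"),
    ("ID1", "file2", "TE2"),
    ("ID1", "file3", "TE3"),
]

def is_cached(file_id, entries):
    """True if any subfile of the file is recorded as cached on a node."""
    return any(f == file_id for f, _, _ in entries)

def subfile_locations(file_id, entries):
    """(subfile identifier, processing node identifier) pairs for the file."""
    return [(s, n) for f, s, n in entries if f == file_id]
```

A lookup on this structure is how the management server can answer the "is the file to be accessed cached?" question of step 202 without contacting any processing node.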
Optionally, referring to fig. 2, when the user needs to access the file to be accessed, the identifier of the file to be accessed may be input to the client 4. The client 4 acquires the input identifier of the file to be accessed, and sends a file access request including the identifier of the file to be accessed to the management server 1.
The management server 1 receives the file access request and determines, according to the file list and the identifier of the file to be accessed carried in the request, whether the file is cached in the processing nodes 2 of the data access system. If it is, the management server obtains the file from those processing nodes 2. If it is not, the management server controls the processing nodes 2 to obtain the file from the storage server 3 where it is located. The file to be accessed is then sent to the client.
The detailed implementation process of the management server 1 for obtaining the file to be accessed refers to the relevant content in the embodiment shown in the subsequent fig. 3, and is not described in detail here.
Alternatively, the identifier of the file may be a file name of the file, and the identifier of the subfile may be a storage path or a file name of the subfile in the storage server.
In the embodiment of the present application, the management server 1 stores a file list that records the correspondence between the identifier of each file cached in the processing nodes 2 of the data access system, the identifiers of its subfiles, and the identifiers of the processing nodes 2 where those subfiles are located. When the management server 1 receives the identifier of a file to be accessed from the client 4, it can therefore determine from the identifier and the file list whether the file is cached in the processing nodes 2. When the file is not cached there, the management server controls a processing node 2 to obtain it from the storage server 3 where it is located. Under the control of the management server 1, the processing node 2 does not read the file from its cache 21 but obtains it directly from the storage server 3 and sends it directly to the management server 1, without caching it in its own cache 21 beforehand. The file thus need not first be cached in the cache 21 of the processing node 2 and then read back out for the management server 1, which improves file access efficiency.
In the embodiment of the present application, the data access system is mainly used for accessing database data, and the file access method provided in the embodiment of the present application is described below by taking access to a file in the storage server 3 as an example.
Referring to fig. 3, an embodiment of the present application provides a method for file access, where the method may be applied to the system shown in fig. 1, and includes:
step 201: the management server 1 receives a file access request comprising an identification of a file to be accessed.
The user logs in to the management server 1 through the client 4, and the client 4 then displays an interface provided by the management server 1. The user can input a file access request through this interface. The file access request may be a database access statement, which may include an identification of a file to be accessed. The database access statement may be a structured query language (SQL) access statement, and the file to be accessed may be a form in the database.
For example, suppose a user inputs, through the client 4, an SQL access statement "select name from teacher join score on teacher.id = score.id", which is used to request access to two forms, one of which is named "teacher" and the other of which is named "score". When receiving the SQL access statement, the management server 1 extracts the identifiers of the two forms from the SQL access statement, where the identifier of one form is the identifier "teacher" of the form "teacher", and the identifier of the other form is the identifier "score" of the form "score".
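As an illustration only, the extraction of form identifiers from such a statement can be sketched as follows, assuming the statement follows the simple `select ... from A join B on ...` pattern; a real implementation would use a full SQL parser, and the function name here is hypothetical:

```python
import re

def extract_form_identifiers(sql: str) -> list[str]:
    # Minimal illustration: pull the two form (table) names out of a
    # "select ... from A join B on ..." statement.
    match = re.search(r"\bfrom\s+(\w+)\s+join\s+(\w+)\b", sql, re.IGNORECASE)
    if not match:
        return []
    return [match.group(1), match.group(2)]

ids = extract_form_identifiers(
    "select name from teacher join score on teacher.id = score.id")
# ids == ["teacher", "score"]
```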
Optionally, the management server 1 further analyzes whether the statement format of the SQL access statement is correct, and if so, executes step 202. If not, an alarm indicating that the statement is incorrect is fed back to the client 4, and the client 4 receives the alarm and displays it to the user.
Step 202: the management server 1 determines whether the processing node 2 in the data access system caches the file to be accessed according to the identifier and the file list.
Referring to table 1 above, the file list is used to store the correspondence between the file identifier, the subfile identifier, and the processing node 2 identifier.
The management server 1 may cache files into the processing nodes 2 of the data access system. The file comprises a plurality of subfiles, and when one subfile included in the file is cached to a certain processing node 2, the corresponding relation among the identifier of the file, the identifier of the subfile and the identifier of the processing node 2 is stored in a file list.
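A minimal sketch of this file list and of the cached-or-not decision of step 202, assuming an in-memory mapping from file identifier to subfile identifiers and processing node identifiers (all names and the dict layout are illustrative, not the patent's actual data structures):

```python
# Hypothetical in-memory file list: file id -> {subfile id -> processing
# node id}, mirroring the correspondence described above.
file_list: dict[str, dict[str, str]] = {}

def record_cached_subfile(file_id: str, subfile_id: str, node_id: str) -> None:
    # Called once a subfile of the file has been cached on a processing node.
    file_list.setdefault(file_id, {})[subfile_id] = node_id

def is_file_cached(file_id: str) -> bool:
    # Step 202: the file is treated as cached when its identifier can be
    # found in the file list together with its subfile/node entries.
    return file_id in file_list

record_cached_subfile("teacher", "teacher-part-0", "node-1")
record_cached_subfile("teacher", "teacher-part-1", "node-2")
assert is_file_cached("teacher") and not is_file_cached("score")
```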
Alternatively, the management server 1 may cache, in the processing node 2 of the data access system, files whose access frequency exceeds a preset frequency, and delete, from the processing node 2 of the data access system, files whose access frequency is lower than the preset frequency.
Optionally, the management server 1 caches, in the processing node 2 of the data access system, files whose access frequency within the latest time period of a preset duration exceeds a first preset frequency threshold, and deletes, from the processing node 2 of the data access system, files whose access frequency within that time period does not exceed a second preset frequency threshold. The first preset frequency threshold is greater than or equal to the second preset frequency threshold.
The management server 1 stores therein history access records each of which stores an identification of a file that a user has accessed and an access time.
Optionally, the management server 1 periodically or aperiodically counts, in the historical access record, the access frequency of the file accessed within the latest preset time period, and when the access frequency of the file exceeds a first preset frequency threshold, the processing node 2 is controlled to obtain the file from the storage server 3 where the file is located and cache the file in the processing node 2 of the data access system.
Referring to fig. 4, when implemented, it may be implemented by the operations 2021 to 2026 as follows. The operations of 2021 to 2026 are respectively:
2021: the management server 1 selects an identifier of a file which does not exist in the file list from the historical access records, and counts the access frequency of the file in the latest time period of preset duration according to the historical access records and the identifier of the file.
The selected identifier is not in the file list, indicating that the file to which the identifier corresponds is not cached in the processing node 2 of the data access system.
In this step, the management server 1 may obtain, from the historical access records, the access times that correspond to the identifier of the file and fall within the latest time period of the preset duration. The management server 1 counts the number of obtained access times; the counted number is the number of times the file has been accessed, and the access frequency of the file within the latest time period of the preset duration is obtained from this number of accesses and the preset duration.
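The access-frequency computation just described can be sketched as follows, assuming history records are (file identifier, access time) pairs with times in seconds (a hypothetical record layout):

```python
import time

def access_frequency(history, file_id, window, now=None):
    """Count accesses to file_id within the last `window` seconds and
    divide by the window, per the counting procedure described above."""
    now = time.time() if now is None else now
    count = sum(1 for fid, t in history if fid == file_id and now - t <= window)
    return count / window

history = [("teacher", 95.0), ("teacher", 99.0), ("score", 10.0)]
freq = access_frequency(history, "teacher", window=10.0, now=100.0)
# freq == 0.2 (two accesses inside a 10-second window)
```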
2022: when the access frequency exceeds a first preset frequency threshold, the management server 1 obtains, according to the identifier of the file, an identifier of the storage server 3 where the file is located and an identifier of at least one subfile included in the file, where the identifier of the storage server 3 may be an address of the storage server 3, for example, an Internet Protocol (IP) address of the storage server 3.
The technician may input in advance to the management server 1 the identity of the storage server 3 in the data access system. The management server 1 may obtain the identifier of each file stored in the storage server 3 according to the identifier of the storage server 3, and store the obtained identifier of each file and the identifier of the storage server 3 in the correspondence relationship between the identifier of the file and the identifier of the storage server.
Optionally, for each file stored in the storage server 3, the management server 1 may further obtain an identifier of each subfile included in the file from the storage server 3, and store the identifier of the file and the obtained identifier of each subfile in a corresponding relationship between the identifier of the file and the identifier of the subfile.
In this step, the management server 1 counts that the access frequency of a certain file exceeds a first preset frequency threshold, and may obtain, according to the identifier of the file, the identifier of the storage server 3 where the file is located from the correspondence between the identifier of the file and the identifier of the storage server. Under the condition that the management server 1 stores the corresponding relationship between the file identifier and the subfile identifier, the management server 1 acquires the identifier of each subfile included in the file from the corresponding relationship between the file identifier and the subfile identifier according to the file identifier. In the case that the management server 1 does not store the correspondence between the identifier of the file and the identifier of the subfile, the management server 1 acquires the identifier of each subfile included in the file from the storage server 3 according to the identifier of the storage server 3.
2023: the management server 1 generates at least one caching task, each caching task comprising an identification of the storage server 3 and an identification of one of the subfiles in the file.
2024: for each caching task, the management server 1 selects one processing node 2 and sends the caching task to the processing node 2.
The management server 1 may start traversal from a first cache task of the at least one cache task, select one processing node 2 each time a cache task is traversed, and then send the one cache task to the processing node 2. And then traversing the next caching task, and repeating the process until the last caching task is sent.
Alternatively, the management server 1 randomly selects one processing node 2 among the processing nodes 2 of the data access system. Or, optionally, the management server 1 may store a corresponding relationship between the identifier of the processing node 2 and the size of the free cache space, where the corresponding relationship stores the identifier of each processing node 2 and the size of the free cache space in the data access system. Thus, the management server 1 may first select at least one processing node 2 with the largest free cache space size based on the corresponding relationship, where the number of the at least one processing node 2 is equal to the number of subfiles included in the file, then select one processing node 2 from the at least one processing node 2 in each traversal of one cache task, and then send the cache task to the processing node 2.
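The optional free-space-based selection can be sketched as picking the subfile-count processing nodes with the most free cache space (the names and dict layout are illustrative assumptions):

```python
def pick_cache_nodes(free_space: dict[str, int], n_subfiles: int) -> list[str]:
    # Select the n_subfiles processing nodes with the largest free cache
    # space, per the optional strategy described above.
    return sorted(free_space, key=free_space.get, reverse=True)[:n_subfiles]

nodes = pick_cache_nodes({"node-1": 512, "node-2": 2048, "node-3": 1024}, 2)
# nodes == ["node-2", "node-3"]
```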
2025: the processing node 2 receives a caching task, acquires the subfile corresponding to the identifier of the subfile in the caching task from the storage server 3 according to the identifier of the storage server 3 in the caching task, and caches the acquired subfile in its own cache 21.
Optionally, the processing node 2 may further send a cache success message corresponding to the caching task to the management server 1.
Optionally, after caching the subfile, the processing node 2 may further obtain the size of the remaining free cache space of itself, and send the size of the remaining free cache space to the management server 1.
2026: the management server 1 may correspondingly save the identifier of the file, the identifier of the subfile in the caching task, and the identifier of the selected processing node 2 into the data list.
Optionally, the management server 1 executes this step after selecting one processing node 2 for the caching task, or executes this step after receiving a caching success message corresponding to the caching task and sent by the processing node 2.
Optionally, the management server 1 may further receive the size of the remaining free cache space of the processing node 2, and update the size of the free cache space of the processing node 2 to the size of the received remaining free cache space in the correspondence between the identifier of the processing node 2 and the size of the free cache space.
Optionally, the management server 1 further obtains an access frequency, within the latest time period of the preset duration, of each file cached in the processing node 2 of the data access system, and deletes files whose access frequency is lower than the second preset frequency threshold from the processing node 2 of the data access system.
Referring to fig. 5, when implemented, it can be implemented by the operations of 2121 to 2123 as follows. The operations of 2121 to 2123 are respectively:
2121: for the identifier of any file in the file list, the management server 1 counts the access frequency of the file in the latest time period of preset duration according to the identifier of the file and the historical access record.
When counting the access frequency, the management server 1 obtains, from the historical access records according to the identifier of the file, the access times corresponding to the file, counts the number of access times that fall within the latest time period of the preset duration to obtain the number of times the file has been accessed, and obtains the access frequency of the file from this number of accesses and the preset duration.
2122: when the access frequency of the file is lower than a second preset frequency threshold, the management server 1 obtains the identifier of each subfile included in the file and the identifier of the processing node 2 where each subfile is located from the file list.
2123: for each subfile, the management server 1 sends a deletion task to the processing node 2 where the subfile is located, the deletion task including the identity of the subfile, and then deletes the record including the identity of the file from the file list.
The processing node 2 receives the deletion task, and deletes the subfile corresponding to the identifier of the subfile in the deletion task from its cache 21.
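Operations 2122 and 2123 can be sketched as follows, assuming a file list held as a dict from file identifier to {subfile identifier: processing node identifier} (an illustrative layout); sending the deletion tasks to the nodes is elided:

```python
def evict_file(file_list, file_id):
    """Look up every subfile of a cold file, emit one deletion task per
    subfile as a (node id, subfile id) pair, then drop the file's record
    from the file list, per operations 2122-2123 above."""
    tasks = [(node_id, subfile_id)
             for subfile_id, node_id in file_list.get(file_id, {}).items()]
    file_list.pop(file_id, None)
    return tasks

fl = {"teacher": {"teacher-part-0": "node-1", "teacher-part-1": "node-2"}}
tasks = evict_file(fl, "teacher")
# tasks == [("node-1", "teacher-part-0"), ("node-2", "teacher-part-1")]; fl is now empty
```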
Optionally, after deleting the subfile, the processing node 2 may further obtain the size of the remaining free cache space in the cache 21 of the processing node, and send the size of the remaining free cache space to the management server 1.
Optionally, the management server 1 further receives the size of the remaining free cache space of the processing node 2, and updates the size of the free cache space of the processing node 2 to the received size of the remaining free cache space in the correspondence between the identifier of the processing node 2 and the size of the free cache space.
Since the processing node 2 of the data access system stores the files with the access frequency exceeding the first preset frequency threshold value in the latest preset time period, the hit rate of each file cached in the processing node 2 can be improved when accessing the files.
The above is only one implementation example, listed in the present application, of caching files in the processing node 2 of the data access system and of eliminating files from the processing node 2 of the data access system. Other implementations of caching files in the processing node 2 of the data access system and of eliminating files from the processing node 2 of the data access system are also applicable to the present application, and are not listed here.
In this step, the management server 1 may query the file list according to the identifier of the file to be accessed, and if the identifier of each subfile included in the file to be accessed and the identifier of the processing node 2 where each subfile is located are not queried, it is determined that the file to be accessed is not cached in the processing node 2 of the data access system. And if the identification of each subfile included in the file to be accessed and the identification of the processing node 2 where each subfile is located are inquired, determining that the file to be accessed is cached in the processing node 2 of the data access system.
Optionally, after receiving the file access request, the management server 1 may further use the current time as the access time of the file to be accessed, and may store the correspondence between the identifier of the file to be accessed and the access time in the history access record.
Step 203: when the file to be accessed is not cached in the processing node 2 of the data access system, the management server 1 generates at least one first reading task, wherein each first reading task comprises an address of the storage server 3 where the file to be accessed is located and an identifier of one subfile in the file to be accessed.
The identity of the subfiles included in each first read task is different.
In this step, the management server 1 may obtain, according to the identifier of the file to be accessed, the identifier of the storage server 3 where the file to be accessed is located from the correspondence between the identifier of the file and the identifier of the storage server 3.
Under the condition that the management server 1 stores the corresponding relationship between the file identifier and the subfile identifier, the management server 1 obtains, according to the identifier of the file to be accessed, the identifier of at least one subfile included in the file to be accessed from that corresponding relationship, and generates at least one first reading task, wherein each first reading task comprises the identifier of the storage server 3 and the identifier of one subfile in the file to be accessed.
Under the condition that the management server 1 does not store the corresponding relation between the file identifier and the subfile identifier, the management server 1 acquires the identifier of at least one subfile included in the file to be accessed from the storage server 3 according to the identifier of the storage server 3, and generates at least one first reading task, wherein each first reading task includes the identifier of the storage server 3 and the identifier of one subfile in the file to be accessed.
Optionally, the management server 1 may further count an access frequency of the file to be accessed in a latest time period of a preset time duration, and when the access frequency exceeds a first preset frequency threshold, each generated first reading task may further include a cache instruction. The caching indication is used for indicating that the processing node 2 receiving the first reading task caches a subfile of the file to be accessed when the subfile is acquired from the storage server 3 where the file to be accessed is located.
Optionally, the management server 1 may obtain each access time corresponding to the file to be accessed from the access history record according to the identifier of the file to be accessed, count the number of access times in the latest time period of the preset duration to obtain the number of times that the file to be accessed is accessed, and use the number of times as the access frequency of the file to be accessed.
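The generation of first reading tasks in step 203, including the optional cache indication of the preceding paragraphs, might be sketched as follows (the dictionary layout of a task is an assumption for illustration):

```python
def generate_first_read_tasks(storage_server_id, subfile_ids, cache_hint=False):
    # One task per subfile, each carrying the identifier of the storage
    # server where the file to be accessed is located; the cache
    # indication is attached when the recent access frequency exceeded
    # the first preset frequency threshold.
    tasks = []
    for sid in subfile_ids:
        task = {"storage_server": storage_server_id, "subfile": sid}
        if cache_hint:
            task["cache"] = True
        tasks.append(task)
    return tasks

tasks = generate_first_read_tasks("10.0.0.3", ["part-0", "part-1"],
                                  cache_hint=True)
# Two tasks, each naming the storage server, one subfile, and the cache flag.
```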
Step 204: for each of the at least one first read task, the management server 1 selects one processing node 2 and sends the first read task to that processing node 2.
In this step, the management server 1 may start traversal from a first read task of the at least one first read task, select one processing node 2 from the processing nodes 2 included in the data access system each time one first read task is traversed, and send the first read task to the processing node 2. When the first reading task is sent, the management server 1 traverses the next first reading task again, and repeats the above process until the last first reading task is sent.
Alternatively, one processing node may be selected from the processing nodes 2 of the data access system in the following two ways. The two modes are respectively as follows:
in a first manner, the management server 1 may randomly select one processing node 2 from the processing nodes 2 of the data access system.
In the second mode, the management server 1 may select one processing node 2 having the smallest number of tasks currently processed from the processing nodes 2 of the data access system.
In the second mode, the management server 1 stores a corresponding relationship between the identifier of the processing node 2 and the number of tasks, and each record in the corresponding relationship includes the identifier of one processing node 2 and the number of tasks currently processed by the processing node 2.
Thus, when selecting the processing node 2, the management server 1 reads the number of tasks of each processing node 2 in the data access system from the correspondence relationship, and selects the processing node 2 having the smallest number of tasks.
In the second mode, after one processing node 2 with the smallest number of tasks is selected, the number of tasks of the processing node 2 is increased in the correspondence relationship.
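Manner two, including the count increment just described, can be sketched as (names and layout illustrative):

```python
def select_least_loaded(task_counts: dict[str, int]) -> str:
    # Pick the processing node with the fewest in-flight tasks, then
    # increase its count so the next selection sees the new load.
    node = min(task_counts, key=task_counts.get)
    task_counts[node] += 1
    return node

counts = {"node-1": 3, "node-2": 1, "node-3": 2}
first = select_least_loaded(counts)   # "node-2"; its count becomes 2
second = select_least_loaded(counts)  # ties resolve to the first key seen
```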
Optionally, when the management server 1 selects the processing node 2 for the first of the first reading tasks, that processing node 2 may be used as a file summary node, and then, before the first reading task is sent, the identifier of the file summary node is added to the first reading task. Alternatively, the management server 1 may also select one processing node 2 as the file summary node, in the above-mentioned manner one or manner two, before generating the at least one first reading task; in this way, each generated first reading task includes the identifier of the file summary node.
Optionally, the management server 1 further sends a summary task to the file summary node, where the summary task includes the number of subfiles in the file to be accessed.
The file summary node selected by the above-mentioned first or second method may be different from the processing node 2 selected by the management server 1 for each first read task, or may be the same as the processing node 2 selected by the management server 1 for a certain first read task.
Optionally, the management server 1 selects one processing node 2 after traversing to one first reading task. For a certain processing node 2, the processing node 2 may be selected by the management server 1 multiple times, i.e. multiple first read tasks are sent to the processing node 2 at different times.
When the second manner is used, the management server 1 records the number of first reading tasks allocated to the selected processing node 2, that is, stores a correspondence between the identifier of the selected processing node 2 and the number of first reading tasks.
Step 205: the processing node 2 receives the first reading task, acquires the subfile corresponding to the identifier included in the first reading task from the storage server 3 where the file to be accessed is located according to the first reading task, sends the acquired subfile to the management server 1, and executes step 209.
In this step, the processing node 2 receives the first reading task, the processing node 2 establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first reading task, acquires the subfile from the storage server 3 through the network connection according to the identifier of the subfile included in the first reading task, and sends the subfile to the management server 1.
Optionally, because the first reading task includes the identifier of the storage server 3, the processing node 2 may determine that the processing nodes 2 included in the data access system do not cache the file to be accessed, so the processing node 2 directly establishes a network connection with the storage server 3 according to the identifier of the storage server 3, acquires the subfile from the storage server 3, and directly sends the subfile to the management server 1. The processing node 2 does not cache the subfile in its cache 21 before sending the subfile to the management server 1, so the subfile does not pass through the cache 21 of the processing node 2, the transmission path of the subfile is shortened, and the transmission efficiency of the subfile is improved.
Optionally, if the first reading task further includes an identifier of a file summarizing node, when the processing node 2 is not a file summarizing node, the processing node 2 sends the acquired subfile to the file summarizing node according to the identifier of the file summarizing node. When the processing node 2 is a file summarizing node, the processing node 2 further receives a summarizing task and subfiles sent by other processing nodes 2, and when the number of the subfiles acquired by itself and the number of the received subfiles reach the number of the subfiles in the summarizing task, the subfiles acquired by itself and the received subfiles form a file to be accessed, and the file to be accessed is sent to the management server 1.
Optionally, under the condition that the file summarizing node and the processing node 2 selected by the management server 1 for each first reading task are different, the file summarizing node receives the summarizing task and sub-files sent by other processing nodes 2, and when the number of the received sub-files reaches the number of the sub-files included in the summarizing task, the received sub-files are combined into a file to be accessed, and the file to be accessed is sent to the management server 1.
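The file summarizing node's behavior, collecting its own and received subfiles until the count given in the summary task is reached, can be sketched as follows (the closure-based shape and byte concatenation are illustrative assumptions; a real system would carry explicit part indices):

```python
def assemble_file(expected_subfiles: int):
    """Sketch of the file summary node: collect subfiles (its own and
    those received from other processing nodes) and return the assembled
    file once the expected count is reached, else None."""
    received = []

    def on_subfile(subfile_id, data):
        received.append((subfile_id, data))
        if len(received) == expected_subfiles:
            # Order by subfile id so reassembly is deterministic.
            return b"".join(d for _, d in sorted(received))
        return None

    return on_subfile

collect = assemble_file(2)
assert collect("part-1", b"world") is None
assert collect("part-0", b"hello ") == b"hello world"
```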
Optionally, if the first read task further includes a cache indication, when the processing node 2 acquires the subfile, the acquired subfile is cached in the cache 21 included in the processing node 2. The processing node 2 may cache the acquired subfiles in the cache 21 included in the processing node 2 after transmitting the acquired subfiles to the management server 1. Alternatively, the processing node 2 may cache the acquired subfile in the cache 21 included in the processing node 2 while transmitting the acquired subfile to the management server 1.
Step 206: when the management server 1 finds by query that the file to be accessed is cached in the processing node 2 of the data access system, the management server 1 generates at least one second reading task.
When the file to be accessed is cached in the processing node 2 in the data access system, the management server 1 may query the identifier of each subfile in the file to be accessed and the identifier of the processing node 2 where each subfile is located from the file list.
Each second reading task comprises the identification of a subfile in the file to be accessed and the identification of the processing node 2 where the subfile is located.
Step 207: for each of the at least one second read task, the management server 1 selects one processing node 2 and sends the second read task to the processing node 2.
In this step, the management server 1 may start traversal from the first of the at least one second reading task, and each time a second reading task is traversed, select one processing node 2 from the processing nodes 2 included in the data access system and send the second reading task to the processing node 2. After sending the second reading task, the management server 1 traverses the next second reading task, and repeats the above process until the last second reading task is sent.
Alternatively, one processing node 2 may be selected from the processing nodes 2 of the data access system in the above-described manner one or manner two.
When the second manner is used, the management server 1 records the number of second reading tasks allocated to the selected processing node 2, that is, stores a correspondence between the identifier of the selected processing node 2 and the number of second reading tasks.
In addition to the above-described first and second manners, one processing node 2 may be selected in the following third manner:
in the third mode, the management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task.
Optionally, when the management server 1 selects the processing node 2 for the first second reading task, the processing node 2 may be used as a file summarizing node, and then before the second reading task is sent, the identifier of the file summarizing node is added to the second reading task. Alternatively, the management server 1 may also select one processing node 2 as a file aggregation node in the foregoing one or two ways before generating at least one second reading task, and each second reading task generated in this way includes an identifier of the file aggregation node.
Optionally, the management server 1 further sends a summary task to the file summary node, where the summary task includes the number of subfiles in the file to be accessed.
The file summary node selected in the first or second manner may be different from the processing node 2 selected by the management server 1 for each second read task, or may be the same as the processing node 2 selected by the management server 1 for a certain second read task.
Optionally, the management server 1 selects one processing node 2 after traversing to one second reading task. For a certain processing node 2, the processing node 2 may be selected by the management server 1 multiple times, that is, multiple second read tasks are sent to the processing node 2 at different times.
Step 208: the processing node 2 receives the second reading task, acquires the subfile according to the identifier of the subfile included in the second reading task and the identifier of the processing node 2, and sends the acquired subfile to the management server 1.
In this step, the processing node 2 receives the second read task, which includes an identification of a subfile and an identification of the processing node 2. If the processing node 2 is the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task, the processing node 2 acquires the corresponding subfile according to the identifier of the subfile in the second reading task. If the processing node 2 is not the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task, the processing node 2 acquires the corresponding subfile from the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task according to the identifier of the subfile in the second reading task.
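Step 208's local-or-remote lookup can be sketched as follows, where `fetch_remote` stands in for the inter-node network call (the function and task layout are assumptions for illustration):

```python
def handle_second_read_task(self_id, local_cache, task, fetch_remote):
    """If this node is the one named in the second reading task, read the
    subfile from the local cache; otherwise fetch it from the processing
    node named in the task via fetch_remote(node_id, subfile_id)."""
    if task["node_id"] == self_id:
        return local_cache[task["subfile_id"]]
    return fetch_remote(task["node_id"], task["subfile_id"])

cache = {"part-0": b"hello"}
task = {"node_id": "node-1", "subfile_id": "part-0"}
local = handle_second_read_task("node-1", cache, task, None)   # b"hello"
remote = handle_second_read_task(
    "node-9", cache, task, lambda n, s: f"{n}:{s}".encode())
# remote == b"node-1:part-0"
```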
Optionally, if the second reading task further includes an identifier of a file summarizing node, when the processing node 2 is not a file summarizing node, the processing node 2 sends the acquired subfile to the file summarizing node according to the identifier of the file summarizing node. When the processing node 2 is a file summarizing node, the processing node 2 further receives a summarizing task and subfiles sent by other processing nodes 2, and when the sum of the number of the subfiles acquired by the processing node and the number of the received subfiles reaches the number of the subfiles in the summarizing task, the subfiles acquired by the processing node and the received subfiles form a file to be accessed, and the file to be accessed is sent to the management server 1.
Optionally, under the condition that the file summarizing node and the processing node 2 selected by the management server 1 for each second reading task are different, the file summarizing node receives the summarizing task and sub-files sent by other processing nodes 2, and when the number of the received sub-files reaches the number of the sub-files in the summarizing task, the received sub-files are combined into a file to be accessed, and the file to be accessed is sent to the management server 1.
Step 209: the management server 1 receives the subfiles sent by each processing node 2, obtains the file to be accessed, and sends the file to be accessed to the client 4.
Optionally, the management server 1 integrates the received subfiles into a file to be accessed, and sends the file to be accessed to the client 4.
Alternatively, when the first reading task or the second reading task further includes an identifier of a file summarizing node, the management server 1 receives the file to be accessed sent by the file summarizing node, and sends the file to be accessed to the client 4.
Optionally, in the case that the processing node 2 is selected in the second manner, the management server 1 stores a correspondence between the identifier of the processing node 2 and the number of tasks. For any selected processing node 2, the management server 1 subtracts the recorded number of first reading tasks or second reading tasks of the processing node 2 from the number of tasks of the processing node 2 stored in that correspondence.
In the embodiment of the present application, when determining that a file to be accessed is not cached in a processing node 2 of a data access system, the management server 1 generates at least one first read task, where each first read task includes an identifier of a storage server 3 where the file to be accessed is located and an identifier of a subfile in the file to be accessed. Therefore, when receiving the first reading task, the processing node 2 does not access the cache 21 in the processing node 2 first, but can directly acquire the subfile from the storage server 3 according to the identifier of the storage server 3 included in the first reading task and then send the subfile to the management server 1, and the processing node 2 does not cache the subfile in the cache 21 of the processing node 2 before sending the subfile to the management server 1, so that the subfile does not pass through the cache 21 of the processing node 2, and the transmission delay of the file to be accessed is reduced. When the processing node 2 of the data access system caches the file to be accessed, the generated second reading task includes the identifier of the processing node 2 where the subfile of the file to be accessed is located, so that the processing node 2 receiving the second reading task can conveniently obtain the subfile based on the identifier of the processing node 2 in the second reading task, and the file access efficiency is improved. In addition, when the file to be accessed is not stored in the processing node 2 of the data access system, the access frequency of the file to be accessed in the latest time period preset by duration is acquired, and when the access frequency exceeds a first preset frequency threshold, the processing node 2 is controlled to cache the file to be accessed. 
When the access frequency exceeds the first preset frequency threshold, the file to be accessed is a file that has been frequently accessed recently. Storing such a file in the caches 21 of the processing nodes 2 of the data access system improves both the utilization of the caches 21 and the file hit rate.
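The overall decision in the preceding paragraphs — issue second read tasks when the file is cached on processing nodes, otherwise issue first read tasks and, based on access frequency, also decide whether the file should be cached — can be sketched roughly as follows. All function names, dictionary shapes, and field names are illustrative assumptions, not the patent's notation.

```python
def plan_read(file_id, cache_index, storage_locations, access_counts, threshold):
    """Return (task_kind, tasks, should_cache) for one file access.

    cache_index:        file_id -> {subfile_id: processing_node_id} for cached files
    storage_locations:  file_id -> {subfile_id: storage_server_id}
    access_counts:      file_id -> recent access frequency
    """
    if file_id in cache_index:
        # second read task: carries the identifier of the processing node
        # where each subfile is cached
        tasks = [{"subfile": s, "node": n}
                 for s, n in cache_index[file_id].items()]
        return "second", tasks, False
    # first read task: carries the identifier of the storage server where
    # each subfile is located, so the node bypasses its cache entirely
    tasks = [{"subfile": s, "server": srv}
             for s, srv in storage_locations[file_id].items()]
    # frequently accessed but not yet cached: also schedule a caching task
    should_cache = access_counts.get(file_id, 0) > threshold
    return "first", tasks, should_cache
```

A cached file yields second read tasks pointing at processing nodes; an uncached, frequently accessed file yields first read tasks plus a caching decision.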
Referring to fig. 6, an embodiment of the present application provides an apparatus 300 for data access, where the apparatus 300 is deployed in the management server 1, the apparatus 300 is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to a storage server 3. The apparatus 300 comprises:
A receiving unit 301, configured to receive a file access request, where the file access request carries an identifier of a file to be accessed.
A processing unit 302, configured to determine whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2 according to the identifier of the file to be accessed, where the identifier of the file cached in the caches 21 of the multiple processing nodes 2 is stored in the apparatus 300.
The processing unit 302 is further configured to instruct at least one processing node 2 of the plurality of processing nodes 2 to obtain the file to be accessed from the storage server 3 when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2.
Optionally, the detailed implementation process of the processing unit 302 determining whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2 may refer to relevant contents in step 202 of the embodiment shown in fig. 3, and will not be described in detail here.
Referring to fig. 6, optionally, the apparatus 300 further comprises a first sending unit 303.
The processing unit 302 is further configured to obtain, from the storage server 3, an identifier of at least one subfile included in the file to be accessed and an identifier of the storage server 3 where each subfile is located, and to generate a read task for each subfile included in the file to be accessed, where each read task includes an identifier of the subfile and an identifier of the storage server 3 where the subfile is located.
The first sending unit 303 is configured to send each read task to one processing node 2, and to instruct the processing node 2 that receives the read task to read the subfile from the storage server 3 that stores the subfile.
The receiving unit 301 is further configured to receive the subfiles read by the processing nodes 2 that received the read tasks.
The processing unit 302 is further configured to merge the received subfiles into the file to be accessed.
Optionally, for the detailed implementation process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content in step 203 of the embodiment shown in fig. 3; for the detailed implementation process in which the first sending unit 303 sends the read tasks, reference may be made to the relevant content in step 204 of the embodiment shown in fig. 3. Details are not described here.
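The merging step performed by the processing unit — subfiles arrive from different processing nodes in whatever order they finish, and are reassembled into the file to be accessed — can be sketched as below. The function name and data shapes are assumptions for illustration only.

```python
def merge_subfiles(received, order):
    """Merge subfiles returned by processing nodes into the whole file.

    received: subfile_id -> bytes, in whatever order the nodes replied
    order:    subfile identifiers in their original position in the file
    """
    missing = [s for s in order if s not in received]
    if missing:
        # cannot reconstruct the file until every read task has returned
        raise ValueError(f"subfiles not yet received: {missing}")
    return b"".join(received[s] for s in order)
```

Because the read tasks may complete out of order, the merge keys on the subfile identifiers rather than on arrival order.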
Referring to fig. 6, optionally, the apparatus 300 further comprises a second sending unit 304.
The second sending unit 304 is configured to send a caching task to at least one processing node 2 of the plurality of processing nodes 2 when the access frequency of the file to be accessed exceeds a preset frequency, so as to instruct the at least one processing node 2 to cache the subfiles included in the file to be accessed.
The processing unit 302 is further configured to record the identifiers of the subfiles included in the file to be accessed and the identifier of the processing node 2 caching each subfile, and to generate at least one read task when the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, where each read task includes the identifier of a subfile and the identifier of the processing node 2 where the subfile is located.
The second sending unit 304 is further configured to send at least one read task to the plurality of processing nodes 2, and instruct the plurality of processing nodes 2 to read the subfile from the cache 21 of the processing node 2 storing the subfile.
The processing unit 302 is further configured to synthesize the fetched subfiles into a file to be accessed.
Optionally, for the detailed implementation process in which the second sending unit 304 sends the caching task, reference may be made to the relevant content in steps 2023 and 2024 of the embodiment shown in fig. 4; for the process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content in step 206 of the embodiment shown in fig. 3; and for the process in which the second sending unit 304 sends the read tasks, reference may be made to the relevant content in step 207 of the embodiment shown in fig. 3. Details are not described here.
Optionally, the second sending unit 304 is further configured to send, when the access frequency of the file to be accessed is lower than the preset frequency, a deletion task to the processing node 2 where the subfile included in the file to be accessed is located, where the deletion task includes an identifier of the subfile, so as to instruct the processing node 2 to delete the subfile.
The processing unit 302 is further configured to delete the identifier of the subfile and the identifier of the processing node 2 recorded in the apparatus 300.
Optionally, the detailed implementation process of sending the deletion task by the second sending unit 304 can refer to relevant contents in steps 2122 and 2123 in the embodiment shown in fig. 5, and will not be described in detail here.
In the embodiment of the present application, the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; when the file to be accessed is not cached in the caches 21 of the processing nodes 2, at least one processing node 2 of the processing nodes 2 is instructed to obtain the file to be accessed from the storage server 3. In this way, the at least one processing node 2 reads the file to be accessed directly from the storage server 3 according to the instruction of the processing unit 302 and returns it directly to the apparatus 300, without caching it in its cache 21 beforehand. The file to be accessed therefore does not need to pass through the cache 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, shortens the path for reading data, and improves data access performance.
Referring to fig. 7, an apparatus 400 for data access is provided in the embodiment of the present application, where the apparatus 400 is deployed in the processing node 2, the apparatus 400 is one of a plurality of processing nodes 2 connected to the management server 1, and the plurality of processing nodes 2 are connected to the storage server 3. The apparatus 400 comprises:
a receiving unit 401, configured to receive a read task, where the read task is a task sent by the management server 1 when it is determined that the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2, and the read task includes an identifier of a subfile in the file to be accessed and an identifier of the storage server 3 where the subfile is located.
A processing unit 402, configured to read the subfile from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the subfile.
A sending unit 403, configured to send the read subfile to the management server 1.
Optionally, the detailed implementation process of the processing unit 402 for reading the subfile may refer to the relevant content in step 205 in the embodiment shown in fig. 3, and will not be described in detail here.
Optionally, the receiving unit 401 is further configured to receive a caching task, where the caching task is a task sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency, and the caching task includes an identifier of a subfile of the file to be accessed and an identifier of the storage server 3 where the subfile is located.
The processing unit 402 is further configured to read the subfile from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the subfile; the subfile is stored in the cache 21 of the device 400.
Optionally, the detailed implementation process of the processing unit 402 for caching the subfiles may refer to the relevant content in step 2025 in the embodiment shown in fig. 4, and will not be described in detail here.
In the embodiment of the present application, the read task received by the receiving unit 401 includes an identifier of a subfile of the file to be accessed and an identifier of the storage server 3 where the subfile is located; the processing unit 402 reads the subfile from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the subfile; and the sending unit 403 sends the read subfile to the management server 1. In this way, the processing unit 402 can read the subfile directly from the storage server 3 according to the identifier of the storage server 3, the sending unit 403 then returns the subfile directly to the management server 1, and the processing unit 402 does not cache the subfile in the cache 21 of the apparatus 400 before it is returned. The subfile returned to the management server 1 therefore does not pass through the cache 21 of the apparatus 400, which shortens the transmission path of the subfile, shortens the path for reading data, and improves data access performance.
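The node-side behaviour just described — a read task is served straight from the storage server and returned without touching the local cache, while a caching task additionally stores the subfile locally — can be sketched as below. The class, the injected `fetch_from_server` callable, and the task field names are all illustrative assumptions standing in for a real storage-server client.

```python
class ProcessingNode:
    """Sketch of a processing node handling read tasks and caching tasks."""

    def __init__(self, fetch_from_server):
        self.cache = {}                 # stands in for cache 21
        self.fetch = fetch_from_server  # assumed storage-server client

    def handle_read_task(self, task):
        # first read task: fetch straight from the storage server and
        # return the subfile WITHOUT writing it into the local cache
        return self.fetch(task["server"], task["subfile"])

    def handle_cache_task(self, task):
        # caching task: fetch the subfile and keep it in the local cache
        data = self.fetch(task["server"], task["subfile"])
        self.cache[task["subfile"]] = data
        return data
```

After `handle_read_task` the cache is untouched, matching the bypass path; only `handle_cache_task` populates it.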
Referring to fig. 8, fig. 8 is a schematic diagram illustrating an apparatus 500 for data access according to an embodiment of the present application. The apparatus 500 comprises at least one processor 501, a bus system 502, a memory 503 and a transceiver 504.
The apparatus 500 is a hardware structure apparatus, and can be used to implement the functional units in the apparatus shown in fig. 6. For example, those skilled in the art may appreciate that the processing unit 302 in the apparatus 300 shown in fig. 6 may be implemented by the at least one processor 501 calling an application program code in the memory 503, and the receiving unit 301, the first sending unit 303, and the second sending unit 304 in the apparatus 300 shown in fig. 6 may be implemented by the transceiver 504.
Optionally, the apparatus 500 may also be used to implement the functions of the management server 1 in the embodiments described in fig. 1 or fig. 3.
Alternatively, the processor 501 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 502 may include a path that carries information between the components.
The memory 503 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a Random Access Memory (RAM) or other types of dynamic storage devices that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The memory 503 is used for storing the application program code for executing the solutions of the present application, and the execution is controlled by the processor 501. The processor 501 is configured to execute the application program code stored in the memory 503 to implement the functions of the methods of the present application.
In particular implementations, the processor 501 may include one or more CPUs, for example, CPU0 and CPU1 in fig. 8.
In particular implementations, the apparatus 500 may include multiple processors, for example, the processor 501 and the processor 508 in fig. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
Referring to fig. 9, fig. 9 is a schematic diagram illustrating an apparatus 600 for data access according to an embodiment of the present application. The apparatus 600 comprises at least one processor 601, a bus system 602, a memory 603 and a transceiver 604. The memory 603 further includes a cache 21, and the cache 21 is configured to store subfiles included in files with access frequencies exceeding a preset frequency.
The apparatus 600 is a hardware structure apparatus, and can be used to implement the functional units in the apparatus described in fig. 7. For example, those skilled in the art may appreciate that the processing unit 402 in the apparatus 400 shown in fig. 7 may be implemented by the at least one processor 601 calling code in the memory 603, and the sending unit 403 and the receiving unit 401 in the apparatus 400 shown in fig. 7 may be implemented by the transceiver 604.
Optionally, the apparatus 600 may also be used to implement the functions of the processing node 2 in the embodiments described in fig. 1 or fig. 3.
Alternatively, the processor 601 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 602 may include a path that carries information between the components.
The memory 603 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The memory 603 is used for storing the application program code for executing the solutions of the present application, and the execution is controlled by the processor 601. The processor 601 is configured to execute the application program code stored in the memory 603 to implement the functions of the methods of the present application.
In particular implementations, the processor 601 may include one or more CPUs, for example, CPU0 and CPU1 in fig. 9.
In particular implementations, the apparatus 600 may include multiple processors, for example, the processor 601 and the processor 608 in fig. 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of data access, the method performed by a management server, the management server connected to a plurality of processing nodes, the plurality of processing nodes connected to a storage server, the method comprising:
receiving a file access request, wherein the file access request carries an identifier of a file to be accessed;
determining whether the file to be accessed is cached in a cache of at least one processing node in the plurality of processing nodes according to the identifier of the file to be accessed, wherein the identifier of the cached file is stored in the management server;
and when the file to be accessed is not cached in the cache of at least one processing node of the plurality of processing nodes, instructing at least one processing node of the plurality of processing nodes to acquire the file to be accessed from the storage server.
2. The method of claim 1, wherein the method further comprises:
acquiring the identifier of at least one subfile included in the file to be accessed and the identifier of the storage server where each subfile is located from the storage server;
the instructing at least one processing node in the plurality of processing nodes to acquire the file to be accessed from the storage server comprises:
generating a reading task for each subfile included in the file to be accessed, wherein each reading task comprises an identifier of the subfile and an identifier of a storage server where the subfile is located;
sending each reading task to a processing node respectively, and indicating the processing node receiving the reading task to read the subfiles from a storage server storing the subfiles;
receiving subfiles read by processing nodes receiving the reading tasks;
and merging the subfiles into the file to be accessed.
3. The method of claim 1 or 2, wherein the method further comprises:
when the access frequency of the file to be accessed exceeds a preset frequency, sending a caching task to at least one processing node of the plurality of processing nodes so as to instruct the at least one processing node to cache the subfiles included in the file to be accessed to the at least one processing node;
recording subfile identifications included by the identifications of the files to be accessed and identifications of processing nodes caching each subfile;
when the file to be accessed is cached in the caches of the processing nodes, generating at least one reading task, wherein each reading task comprises an identifier of a subfile and an identifier of the processing node where the subfile is located;
sending the at least one read task to the plurality of processing nodes, instructing the plurality of processing nodes to read the subfile from a cache of the processing node in which the subfile is stored;
and synthesizing the read subfiles into the file to be accessed.
4. The method of claim 3, wherein the method further comprises:
when the access frequency of the file to be accessed is lower than a preset frequency, sending a deletion task to a processing node where a subfile included in the file to be accessed is located, wherein the deletion task includes an identifier of the subfile to indicate the processing node to delete the subfile;
and deleting the identifier of the subfile and the identifier of the processing node recorded in the management server.
5. An apparatus for data access, the apparatus being connected to a plurality of processing nodes, the plurality of processing nodes being connected to a storage server, the apparatus comprising:
a receiving unit, configured to receive a file access request, wherein the file access request carries an identifier of a file to be accessed;
a processing unit, configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the plurality of processing nodes, wherein the identifier of the cached file is stored in the apparatus;
the processing unit is further configured to instruct at least one processing node of the plurality of processing nodes to acquire the file to be accessed from the storage server when the file to be accessed is not cached in the cache of the at least one processing node of the plurality of processing nodes.
6. The apparatus of claim 5, wherein the apparatus further comprises: a first sending unit for sending the data to the first sending unit,
the processing unit is configured to acquire, from the storage server, an identifier of at least one subfile included in the file to be accessed and an identifier of the storage server where each subfile is located; generating a reading task for each subfile included in the file to be accessed, wherein each reading task comprises an identifier of the subfile and an identifier of a storage server where the subfile is located;
the first sending unit is configured to send each read task to one processing node, and instruct the processing node that receives the read task to read the subfile from the storage server that stores the subfile;
the receiving unit is used for receiving the subfiles read by the processing nodes receiving the reading tasks;
and the processing unit is used for merging the subfiles into the file to be accessed.
7. The apparatus of claim 5 or 6, wherein the apparatus further comprises: a second sending unit for sending the first data to the second sending unit,
the second sending unit is configured to send a caching task to at least one processing node of the multiple processing nodes when the access frequency of the file to be accessed exceeds a preset frequency, so as to instruct the at least one processing node to cache the subfiles included in the file to be accessed to the at least one processing node;
the processing unit is further configured to record subfile identifiers included in the identifier of the file to be accessed and identifiers of processing nodes caching each subfile; when the file to be accessed is cached in the caches of the processing nodes, generating at least one reading task, wherein each reading task comprises an identifier of a subfile and an identifier of the processing node where the subfile is located;
the second sending unit is further configured to send the at least one read task to the plurality of processing nodes, and instruct the plurality of processing nodes to read the subfile from the cache of the processing node in which the subfile is stored;
the processing unit is further configured to synthesize the read subfiles into the file to be accessed.
8. The apparatus of claim 7,
the second sending unit is further configured to send a deletion task to a processing node where a subfile included in the file to be accessed is located when the access frequency of the file to be accessed is lower than a preset frequency, where the deletion task includes an identifier of the subfile to instruct the processing node to delete the subfile;
the processing unit is further configured to delete the identifier of the subfile and the identifier of the processing node recorded in the apparatus.
9. A system for data access, the system comprising: the system comprises a management server, a storage server and a plurality of processing nodes;
the management server is used for receiving a file access request, wherein the file access request carries an identifier of a file to be accessed; determining whether the file to be accessed is cached in a cache of at least one processing node in the plurality of processing nodes according to the identifier of the file to be accessed, wherein the identifier of the cached file is stored in the management server; when the file to be accessed is not cached in the cache of at least one processing node of the plurality of processing nodes, acquiring the identifier of at least one subfile included in the file to be accessed and the identifier of the storage server where each subfile is located from the storage server, and generating a reading task for each subfile included in the file to be accessed, wherein each reading task comprises the identifier of one subfile and the identifier of the storage server where the subfile is located; respectively sending each reading task to a processing node;
the processing node which receives the reading task is used for reading the corresponding subfile from the storage server corresponding to the identifier of the storage server according to the identifier of the subfile in the received reading task and sending the read subfile to the management server;
the management server is further configured to receive the subfiles read by the processing nodes that receive the read task.
10. The system of claim 9,
the management server is further configured to send a caching task to at least one processing node of the plurality of processing nodes when the access frequency of the file to be accessed exceeds a preset frequency;
the processing node that receives the caching task is configured to cache the subfiles included in the file to be accessed;
the management server is further used for recording the subfile identifications included by the identifications of the files to be accessed and the identifications of the processing nodes caching each subfile;
the management server is further configured to generate at least one read task when the file to be accessed is cached in the caches of the plurality of processing nodes, where each read task includes an identifier of a subfile and an identifier of a processing node where the subfile is located; sending the at least one read task to the plurality of processing nodes;
the processing node which receives the reading task is used for reading the subfiles according to the identifiers of the subfiles in the received reading task and the identifier of the processing node where the subfiles are located, and sending the read subfiles to the management server;
the management server is further configured to receive the subfiles read by the processing nodes receiving the reading task.
CN201910786485.4A 2019-08-23 2019-08-23 Data access method, device and system Active CN112416871B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910786485.4A CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system
PCT/CN2020/110819 WO2021036989A1 (en) 2019-08-23 2020-08-24 Method, apparatus and system for data access

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910786485.4A CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system

Publications (2)

Publication Number Publication Date
CN112416871A true CN112416871A (en) 2021-02-26
CN112416871B CN112416871B (en) 2023-10-13

Family

ID=74683263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910786485.4A Active CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system

Country Status (2)

Country Link
CN (1) CN112416871B (en)
WO (1) WO2021036989A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150277802A1 (en) * 2014-03-31 2015-10-01 Amazon Technologies, Inc. File storage using variable stripe sizes
CN107026876A (en) * 2016-01-29 2017-08-08 杭州海康威视数字技术股份有限公司 A kind of file data accesses system and method
CN107562757A (en) * 2016-07-01 2018-01-09 阿里巴巴集团控股有限公司 Inquiry, access method based on distributed file system, apparatus and system
CN107920101A (en) * 2016-10-10 2018-04-17 阿里巴巴集团控股有限公司 A kind of file access method, device, system and electronic equipment
CN109002260A (en) * 2018-07-02 2018-12-14 深圳市茁壮网络股份有限公司 A kind of data cached processing method and processing system


Also Published As

Publication number Publication date
WO2021036989A1 (en) 2021-03-04
CN112416871B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108694075B (en) Method and device for processing report data, electronic equipment and readable storage medium
US8191068B2 (en) Resource management system, resource information providing method and program
US8438336B2 (en) System and method for managing large filesystem-based caches
CN111782692B (en) Frequency control method and device
CN111159219B (en) Data management method, device, server and storage medium
CN114116613A (en) Metadata query method, equipment and storage medium based on distributed file system
CN111221469A (en) Method, device and system for synchronizing cache data
CN109460345B (en) Real-time data calculation method and system
CN112579695A (en) Data synchronization method and device
CN112035766A (en) Webpage access method and device, storage medium and electronic equipment
CN113806305A (en) Data export method and device, computer readable storage medium and electronic equipment
CN111935242A (en) Data transmission method, device, server and storage medium
CN107181773A (en) Data storage and data managing method, the equipment of distributed memory system
CN111158892A (en) Task queue generating method, device and equipment
CN113127477A (en) Method and device for accessing database, computer equipment and storage medium
CN112416871B (en) Data access method, device and system
CN109241219A (en) A kind of map Dynamic Slicing and serializing caching method, device and storage medium
CN108960378A (en) A kind of data download method, system, device and storage medium
CN105022796B (en) A kind of file traversal method, apparatus and system
CN114493875A (en) Transaction execution method, computer device, and storage medium
CN112559570A (en) Cache data acquisition method, device, equipment and storage medium
US9075857B2 (en) Computer-readable non-transitory medium storing therein a control program, management apparatus, and information processing system
CN112749166A (en) Service data processing method, device, equipment and storage medium
JP6522261B1 (en) Method and apparatus for managing file attribute information {METHOD FOR MANAGING ATTRIBUTE INFORMATION OF FILE AND COMPUTING DEVICE USING THE SAME}
CN111259031A (en) Data updating method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant