WO2021036989A1 - Method, apparatus and system for data access - Google Patents


Publication number
WO2021036989A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
accessed
sub-file
processing node
identifier
Prior art date
Application number
PCT/CN2020/110819
Other languages
French (fr)
Chinese (zh)
Inventor
李铮
王明月
刘玉
张巍
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021036989A1 publication Critical patent/WO2021036989A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/1727 Details of free space management performed by the file system
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • This application relates to the computer field, and in particular to a method, device and system for data access.
  • the distributed system includes a coordination server, multiple processing nodes, and a storage server that stores data.
  • the coordination server decomposes the access request into multiple tasks and sends them to the processing nodes; each processing node accesses the data in the storage server and returns the data it reads to the coordination server, which integrates the data returned by the processing nodes and returns it to the client.
  • after a processing node receives a task from the coordination server, it first determines whether the data to be accessed is in its own cache. If so, it reads the data directly from the cache; if not, it must first read the data from the storage server into its cache and then read it from the cache. Thus, for each processing node, a cache miss means the data must travel from the storage server into the node's cache and only then from the cache to the coordination server, which lengthens the data read path and degrades the performance of data access.
  • the present application provides a data access method, device and system to shorten the data read path and improve the performance of data access.
  • the technical solution is as follows:
  • this application provides a data access method executed by a management server. The management server is connected to a plurality of processing nodes, the plurality of processing nodes are connected to a storage server, and the management server stores the identifiers of the files cached in the caches of the plurality of processing nodes.
  • a file access request is received, carrying the identifier of the file to be accessed. According to this identifier, it is determined whether the file to be accessed is cached in the caches of the multiple processing nodes; when it is not, at least one of the processing nodes is instructed to obtain the file to be accessed from the storage server.
  • the at least one processing node reads the file to be accessed directly from the storage server according to the instruction of the management server and returns it directly to the management server, without first caching it in its own cache. In this way the file to be accessed does not need to pass through the cache of the at least one processing node, which shortens its transmission path, shortens the data read path, and improves the performance of data access.
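The decision described above can be sketched as follows. Note this is a hypothetical illustration: `file_list`, `storage_catalog`, and the task fields are assumed names, not the patent's actual interfaces.

```python
# Sketch of the management-server logic: consult the file list (the cache
# metadata the server stores) and, on a miss, build tasks that read sub-files
# directly from their storage servers, bypassing the processing-node caches.

def handle_access_request(file_id, file_list, storage_catalog):
    """Return read tasks for file_id based on whether it is cached."""
    if file_id in file_list:
        # cache hit: read each sub-file from the node caching it
        return [{"subfile": sub, "node": node, "source": "cache"}
                for sub, node in file_list[file_id].items()]
    # cache miss: read each sub-file straight from its storage server
    return [{"subfile": sub, "server": server, "source": "storage"}
            for sub, server in storage_catalog[file_id].items()]

file_list = {}  # nothing cached yet
storage_catalog = {"ID1": {"file1": "S1", "file2": "S2", "file3": "S1"}}
tasks = handle_access_request("ID1", file_list, storage_catalog)
```

With an empty file list, all three tasks target storage servers directly, matching the miss path described above.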
  • the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server, and a read task is generated for each sub-file included in the file to be accessed.
  • each read task includes the identifier of a sub-file and the identifier of the storage server where that sub-file is located. Each read task is sent to a processing node, instructing the node that receives it to read the sub-file from the storage server storing it; the sub-files read by these processing nodes are received and merged into the file to be accessed.
  • the processing node that receives a read task can read the sub-file directly from the storage server according to the storage-server identifier in the task, and send the sub-file directly to the management server once it has been read. The sub-file is thus not first cached in the processing node's cache and then read out of that cache and sent to the management server, which shortens the sub-file's transmission path and improves the performance of reading it.
  • when the access frequency of the file to be accessed exceeds a preset frequency, a cache task is sent to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed; the identifier of the file to be accessed, the identifiers of the sub-files it includes, and the identifier of the processing node that caches each sub-file are recorded.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, at least one read task is generated; each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located. The at least one read task is sent to the multiple processing nodes, instructing them to read each sub-file from the cache of the processing node storing it; the read sub-files are merged into the file to be accessed.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed. Because the cache space in each processing node is limited, saving the frequently accessed file in the cache of the at least one processing node not only improves the cache utilization of the processing nodes but also increases the hit rate of the file to be accessed.
  • the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the read task does not need to determine where the sub-file resides; it reads the sub-file directly from that processing node according to the identifier, which improves the efficiency of reading the sub-file.
  • when the access frequency of the file to be accessed is lower than the preset frequency, a deletion task is sent to the processing node where each sub-file included in the file to be accessed is located. The deletion task includes the identifier of the sub-file and instructs the processing node to delete it; the identifiers of the sub-file and of the processing node recorded in the management server are also deleted.
  • files to be accessed with a low access frequency can thus be deleted from the caches of the multiple processing nodes, freeing more cache space for files with a high access frequency, which not only improves the cache utilization of the multiple processing nodes but also improves the file hit rate.
  • the present application provides a data access method executed by a processing node, which is one of a plurality of processing nodes connected to a management server; the plurality of processing nodes are connected to a storage server.
  • a read task is received. The read task is sent by the management server when it determines that the file to be accessed is not cached in the caches of the multiple processing nodes, and it includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where that sub-file is located. The sub-file is read from the storage server corresponding to the storage-server identifier according to the sub-file identifier, and the read sub-file is sent to the management server.
  • the processing node can read the sub-file directly from the storage server according to the storage-server identifier and return it directly to the management server, without caching the sub-file in its own cache before returning it. The sub-file returned to the management server is therefore not cached by the processing node, which shortens the sub-file's transmission path, shortens the data read path, and improves the performance of data access.
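The node-side miss path can be illustrated with a minimal sketch; `fetch_from_storage` stands in for the real storage-server protocol, and all names here are assumptions.

```python
# Illustrative handler for the cache-miss read task: the node fetches the
# sub-file from the named storage server and returns it without writing it
# into its own cache.

def fetch_from_storage(server_id, subfile_id):
    # placeholder for a network read from storage server server_id
    return f"<data of {subfile_id} from {server_id}>"

def handle_read_task(task, node_cache):
    data = fetch_from_storage(task["server"], task["subfile"])
    # note: node_cache is deliberately left untouched on this path
    return data

node_cache = {}
result = handle_read_task({"subfile": "file1", "server": "S1"}, node_cache)
```

The key point mirrored here is that the node cache stays empty after the read, so the sub-file never takes the extra hop through it.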
  • a cache task is received.
  • the cache task is a task sent by the management server when the access frequency of the file to be accessed exceeds a preset frequency.
  • the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the sub-file identifier, and is stored in the cache of the processing node.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed.
  • saving the frequently accessed sub-files of the file to be accessed in the processing node's cache not only improves the cache utilization of the processing node but also improves the hit rate of the file to be accessed.
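The node-side caching path above can be sketched as follows; the task format and function names are illustrative assumptions.

```python
# Sketch of the cache-task handler: the task names a sub-file and its storage
# server; the node reads the sub-file and stores it in its own cache so later
# reads of this hot file hit locally.

def handle_cache_task(task, node_cache, read_from_storage):
    data = read_from_storage(task["server"], task["subfile"])
    node_cache[task["subfile"]] = data  # cache for future hits
    return data

node_cache = {}
handle_cache_task({"server": "S1", "subfile": "file1"},
                  node_cache,
                  lambda server, sub: f"{sub}@{server}")
```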
  • the present application provides a data access method, the method is executed by a processing node, the processing node is one of a plurality of processing nodes connected to a management server, and the plurality of processing nodes are connected to a storage server.
  • a read task is received. The read task is sent by the management server when it determines that the file to be accessed is cached in the caches of the multiple processing nodes, and it includes the identifier of a sub-file of the file to be accessed and the identifier of the processing node where the sub-file is located. The sub-file is read from the processing node corresponding to that identifier according to the sub-file identifier, and the read sub-file is sent to the management server.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, the read task includes the identifier of the processing node where the sub-file is located, so the processing node does not need to determine where the sub-file resides; it reads the sub-file directly from that node according to the identifier, which improves the efficiency of reading the sub-file.
  • a deletion task is received. The deletion task is sent by the management server when the access frequency of the file to be accessed is lower than a preset frequency, and it includes the identifier of a sub-file of the file to be accessed; the sub-file corresponding to that identifier is deleted.
  • the processing node can delete the sub-files belonging to the file to be accessed from its own cache, freeing more cache space for files with a high access frequency; this increases the cache utilization of the processing node and also improves the file hit rate.
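The deletion path reduces to a simple eviction; the task format and function name below are illustrative assumptions.

```python
# Minimal sketch of the delete-task handler: the task names the sub-file,
# and the node evicts it from its cache, freeing space for hotter files.

def handle_delete_task(task, node_cache):
    node_cache.pop(task["subfile"], None)  # idempotent eviction

node_cache = {"file2": b"cold data", "file9": b"hot data"}
handle_delete_task({"subfile": "file2"}, node_cache)
```

Using `pop` with a default makes the eviction idempotent, so a repeated or late-arriving delete task is harmless.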
  • the present application provides a data access device, which is used to execute the first aspect or the method in any one of the optional implementation manners of the first aspect.
  • the device includes a unit for executing the method in the first aspect or any one of the possible implementation manners of the first aspect.
  • this application provides a data access device, which is used to execute the second aspect or a method in an optional implementation of the second aspect.
  • the device includes a unit for executing the method in the second aspect or a possible implementation of the second aspect.
  • the device is configured to execute the method in the third aspect or an optional implementation manner of the third aspect.
  • the device includes a unit for executing the method in the third aspect or a possible implementation manner of the third aspect.
  • the present application provides a data access device.
  • the device includes a processor, a memory, and a communication interface.
  • the processor is connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions.
  • the computer-executable instructions are executed by the processor to implement the operation steps of the first aspect or any one of the possible implementation manners of the first aspect.
  • the present application provides a data access device.
  • the device includes a processor, a memory, and a communication interface.
  • the processor is connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions.
  • the computer-executable instructions are executed by the processor to perform the operation steps of the second aspect or a possible implementation of the second aspect, or to perform the operation steps of the third aspect or a possible implementation of the third aspect.
  • the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the methods described in the above aspects.
  • this application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the methods described in the above aspects.
  • this application provides a data access system, which includes: a management server, a storage server, and multiple processing nodes.
  • the management server receives the file access request, which carries the identifier of the file to be accessed; according to this identifier, it determines whether the file to be accessed is cached in the caches of the multiple processing nodes, the identifiers of the files cached in those caches being stored in the management server.
  • when the file to be accessed is not cached in the caches of the multiple processing nodes, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server.
  • the processing node that receives a read task reads the corresponding sub-file from the storage server corresponding to the storage-server identifier in the task, according to the sub-file identifier, and sends the read sub-file to the management server.
  • the management server receives the sub-files read by the processing nodes that received the read tasks.
  • each read task generated by the management server includes the identifier of a sub-file and the identifier of the storage server where the sub-file is located.
  • the processing node that receives a read task can read the sub-file directly from the storage server according to the storage-server identifier in the task, and send the sub-file directly to the management server once it has been read.
  • the sub-file is thus not first cached in the processing node's cache and then read out of that cache and sent to the management server, which shortens the sub-file's transmission path and improves the performance of reading it.
  • when the access frequency of the file to be accessed exceeds the preset frequency, the management server sends the cache task to at least one of the plurality of processing nodes.
  • the processing node receiving the cache task caches the sub-files included in the file to be accessed.
  • the management server also records the identifier of the file to be accessed, the identifiers of its sub-files, and the identifier of the processing node that caches each sub-file.
  • when the file to be accessed is cached in the caches of the multiple processing nodes, the management server generates at least one read task.
  • each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located; the at least one read task is sent to the multiple processing nodes.
  • the processing node that receives a read task reads the sub-file according to the sub-file identifier and the identifier of the processing node where the sub-file is located in the received task, and sends the read sub-file to the management server.
  • the management server receives the sub-files read by the processing nodes that received the read tasks.
  • when the access frequency of the file to be accessed exceeds the preset frequency, it indicates that the file is frequently accessed. Because the cache space in each processing node is limited, saving the frequently accessed file in the cache of the at least one processing node not only improves the cache utilization of the processing nodes but also improves the hit rate of the file to be accessed.
  • the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the read task does not need to determine where the sub-file resides; it reads the sub-file directly from that node according to the identifier, which improves the efficiency of reading the sub-file.
  • Figure 1 is a schematic structural diagram of a data access system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a client accessing the data access system provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a data access method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for caching files provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for deleting files provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a data access device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of another data access device provided by an embodiment of the present application.
  • an embodiment of the present application provides a data access system.
  • the system includes a management server 1, a plurality of processing nodes 2, and at least one storage server 3. The management server 1 is connected to each processing node 2, and the processing nodes 2 are also connected to one another.
  • the management server 1 and each processing node 2 are connected to each storage server 3 through a network.
  • the management server 1 is used to decompose a received file access request into multiple tasks, which are sent to the processing nodes 2 respectively; each processing node 2 accesses files in the storage server 3 and returns the files it reads to the management server 1.
  • the management server 1 integrates the files returned by the processing nodes 2 and returns them to the client (not shown in the figure).
  • Each storage server 3 stores files for user access.
  • a file can be divided into multiple sub-files for storage.
  • the file may be a form in a database. Assuming the form includes 100 records, three sub-files are used to save the form in the storage server 3: the first sub-file, the second sub-file, and the third sub-file.
  • the first sub-file saves records 1 to 33 of the form,
  • the second sub-file saves records 34 to 66 of the form, and
  • the third sub-file saves records 67 to 100 of the form.
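The storage layout in this example can be modeled with a toy split; the function name and boundary list are illustrative, not the patent's mechanism.

```python
# A 100-record form split into three sub-files holding records 1-33, 34-66,
# and 67-100, following the boundaries given in the example above.

def split_into_subfiles(records, boundaries):
    subfiles, start = [], 0
    for end in boundaries:
        subfiles.append(records[start:end])
        start = end
    return subfiles

records = list(range(1, 101))  # records 1..100
first, second, third = split_into_subfiles(records, [33, 66, 100])
```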
  • the management server 1 may cache each sub-file included in the file in one or more processing nodes 2 of the data access system. For each sub-file, when the management server 1 caches the sub-file to a processing node 2, the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2 is stored in a file list.
  • the processing node 2 includes a cache 21, and the management server 1 caches the sub-file in the cache 21 of the processing node 2.
  • the identifier of the form is ID1, and the identifiers of the first, second, and third sub-files included in the form are file1, file2, and file3, respectively.
  • assume the management server 1 caches the first, second, and third sub-files in the first, second, and third processing nodes, whose identifiers are TE1, TE2, and TE3, respectively.
  • the management server 1 then correspondingly saves the form identifier ID1, each sub-file identifier, and the identifier of the processing node caching that sub-file in the file list shown in Table 1 below.
  • Table 1:
    File identifier | Sub-file identifier | Processing node identifier
    ID1             | file1               | TE1
    ID1             | file2               | TE2
    ID1             | file3               | TE3
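The file list above can be modeled as a nested mapping; the dict structure and helper function are assumptions that mirror the recorded correspondences.

```python
# The file list of Table 1: file identifier -> {sub-file id: node id}.

file_list = {
    "ID1": {"file1": "TE1", "file2": "TE2", "file3": "TE3"},
}

def nodes_caching(file_id, file_list):
    """Processing nodes holding any sub-file of file_id (empty on a miss)."""
    return sorted(set(file_list.get(file_id, {}).values()))
```

A lookup for an unknown identifier returning an empty set corresponds to a cache miss, which triggers the direct-from-storage path.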
  • when the user needs to access the file to be accessed, the user can input the identifier of the file to be accessed to the client 4.
  • the client 4 obtains the input identifier of the file to be accessed, and sends a file access request including the identifier of the file to be accessed to the management server 1.
  • the management server 1 receives the file access request, and determines whether the file to be accessed is cached in the processing node 2 of the data access system according to the file list and the identification of the file to be accessed included in the file access request. If the file to be accessed is cached in the processing node 2 of the data access system, the file to be accessed is obtained from the processing node 2 of the data access system. If the file to be accessed is not cached in the processing node 2 of the data access system, the processing node 2 is controlled to obtain the file to be accessed from the storage server 3 where the file to be accessed is located. Then send the file to be accessed to the client.
  • the identifier of the file may be the file name of the file, etc.
  • the identifier of the sub-file may be the storage path or file name of the sub-file in the storage server, etc.
  • the file list saves the correspondence between the identifier of a file cached in a processing node 2 of the data access system, the identifiers of the sub-files in that file, and the identifiers of the processing nodes 2 caching those sub-files.
  • when the management server 1 receives the identifier of the file to be accessed sent by the client 4, it can determine, according to that identifier and the file list, whether the file to be accessed is cached in a processing node 2 of the data access system.
  • if not, the processing node 2 is controlled to obtain the file to be accessed from the storage server 3 where it is located. Under the control of the management server 1, the processing node 2 does not first read the file from its cache 21, but obtains the file to be accessed directly from the storage server 3 and then sends it directly to the management server 1.
  • the file is not cached in the node's cache 21 along the way, so the file to be accessed does not need to be written into the cache 21 of the processing node 2 and then read
  • out of that cache 21 and sent to the management server 1, which improves the efficiency of accessing files.
  • the data access system is mainly used for accessing database data.
  • the following takes accessing a file in the storage server 3 as an example to introduce the file access method provided by the embodiments of the present application.
  • an embodiment of the present application provides a file access method, which can be applied to the system described in FIG. 1, and includes:
  • Step 201 The management server 1 receives a file access request, and the file access request includes the identification of the file to be accessed.
  • the user logs in to the management server 1 through the client 4, and then the client 4 displays the interface provided by the management server 1.
  • the user can input a file access request through the interface provided by the management server 1.
  • the file access request may be a database access statement, and the database access statement may include the identification of the file to be accessed.
  • the database access statement may be a structured query language (SQL) access statement, and the file to be accessed may be a form in the database.
  • when the management server 1 receives the SQL access statement, it extracts the identifiers of the two forms in the SQL access statement.
  • for example, two form identifiers are extracted from the SQL access statement: one is the identifier of the form "teacher",
  • and the other is the identifier of the form "people", which appears in the statement as "people.id".
  • the management server 1 also checks whether the statement format of the SQL access statement is correct. If it is correct, step 202 is executed; if it is incorrect, an alarm indicating that the statement is incorrect is fed back to the client 4, which receives the alarm and displays it to the user.
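The identifier extraction in step 201 can be illustrated roughly as below. A real system would use a SQL parser; this regex sketch only covers the simple FROM/JOIN shape of the example, and the function name is an assumption.

```python
# Pull form (table) identifiers out of a SQL access statement.
import re

def extract_form_ids(sql):
    # capture the identifier following FROM or JOIN
    return re.findall(r"(?:from|join)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE)

stmt = "SELECT people.id FROM people JOIN teacher ON people.id = teacher.id"
forms = extract_form_ids(stmt)
```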
  • Step 202 The management server 1 determines whether a file to be accessed is cached in the processing node 2 in the data access system according to the identifier and the file list.
  • the file list is used to store the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2.
  • the management server 1 can cache files in the processing node 2 of the data access system.
  • the file includes multiple sub-files.
  • a sub-file included in the file is cached to a certain processing node 2, the correspondence between the identifier of the file, the identifier of the sub-file, and the identifier of the processing node 2 is stored in the file List.
  • the management server 1 may cache, in the processing nodes 2 of the data access system, files whose access frequency exceeds a preset frequency, and delete from the processing nodes 2 files whose access frequency is lower than the preset frequency.
  • for example, the management server 1 caches, in the processing nodes 2 of the data access system, the files whose access frequency in the most recent preset time period exceeds the first preset frequency threshold, and deletes from the processing nodes 2 the files whose access frequency in the most recent preset time period does not exceed the second preset frequency threshold.
  • the first preset frequency threshold is greater than or equal to the second preset frequency threshold.
  • the management server 1 saves a historical access record; each entry stores the identifier of a file the user has accessed and the access time.
  • the management server 1 periodically or irregularly collects, from the historical access record, the access frequency of each file accessed in the most recent preset time period; when a file's access frequency exceeds the first preset frequency threshold, it controls a processing node 2 to obtain the file from the storage server 3 where it is located and caches the file in the processing nodes 2 of the data access system.
  • the operations of steps 2021 to 2026 are as follows:
  • the management server 1 selects from the historical access record the identifier of a file that does not exist in the file list, and calculates the access frequency of the file in the most recent preset time period according to the historical access record and the identifier of the file.
  • the selected identifier not being in the file list means that the file corresponding to the identifier is not cached in the processing nodes 2 of the data access system.
  • the management server 1 may obtain, from the historical access record, the access times corresponding to the identifier of the file within the most recent preset time period.
  • the management server 1 counts the access times obtained; the count equals the number of accesses to the file, and the access frequency of the file in the most recent preset time period is obtained from the number of accesses and the preset duration.
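The frequency computation above reduces to counting accesses inside a window and dividing by its length. The record format `(file_id, access_time)` and the time units below are assumptions for illustration.

```python
# Count a file's accesses within the most recent window and divide by the
# window length to get its access frequency.

def access_frequency(history, file_id, now, window):
    hits = sum(1 for fid, t in history
               if fid == file_id and now - window <= t <= now)
    return hits / window

history = [("ID1", 95), ("ID1", 97), ("ID2", 96), ("ID1", 50)]
freq = access_frequency(history, "ID1", now=100, window=10)
```

Here ID1 has two accesses inside the window [90, 100] (the access at time 50 falls outside), giving a frequency of 0.2.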
  • the management server 1 obtains the identification of the storage server 3 where the file is located and the identification of at least one sub-file included in the file according to the identification of the file.
  • the identifier may be the address of the storage server 3, for example, an internet protocol (IP) address of the storage server 3.
  • the technician can input the identification of the storage server 3 in the data access system to the management server 1 in advance.
  • the management server 1 can obtain the identifier of each file saved by the storage server 3 according to the identifier of the storage server 3, and save each obtained file identifier together with the identifier of the storage server 3 into the correspondence between file identifiers and storage server identifiers.
  • the management server 1 may also obtain the identifier of each sub-file included in the file from the storage server 3, and save the identifier of the file together with each obtained sub-file identifier into the correspondence between file identifiers and sub-file identifiers.
  • when the management server 1 finds that the access frequency of a file exceeds the first preset frequency threshold, it can obtain the identifier of the storage server 3 where the file is located from the correspondence between file identifiers and storage server identifiers according to the file identifier.
  • if the management server 1 saves the correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifier of each sub-file included in the file from that correspondence according to the identifier of the file.
  • otherwise, the management server 1 obtains the identifier of each sub-file included in the file from the storage server 3 according to the identifier of the storage server 3.
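  • the two correspondences described above behave like simple lookup tables keyed by file identifier. A minimal sketch, with hypothetical identifiers and an assumed in-memory dictionary form:

```python
# Hypothetical in-memory form of the two correspondences the management
# server maintains: file id -> storage server id, file id -> sub-file ids.
file_to_server = {"file-7": "10.0.0.3"}  # storage server identified by IP
file_to_subfiles = {"file-7": ["file-7.part0", "file-7.part1"]}

def locate_file(file_id):
    """Return (storage server id, sub-file ids) for a file, or None if
    either correspondence has no entry for the file."""
    server = file_to_server.get(file_id)
    subs = file_to_subfiles.get(file_id)
    if server is None or subs is None:
        return None
    return server, subs
```

When `locate_file` returns None the management server would fall back to querying the storage server 3 directly, as the text above describes.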
  • the management server 1 generates at least one cache task, and each cache task includes an identifier of the storage server 3 and an identifier of a subfile in the file.
  • for each cache task, the management server 1 selects a processing node 2 and sends the cache task to that processing node 2.
  • the management server 1 may start traversal from the first cache task among the at least one cache task; each time it traverses to a cache task, it selects a processing node 2 and sends that cache task to the processing node 2, then traverses the next cache task, repeating the above process until the last cache task is sent.
  • the management server 1 randomly selects a processing node 2 from the processing nodes 2 of the data access system.
  • the management server 1 may store a correspondence relationship between the identifier of the processing node 2 and the size of the free cache space, and the correspondence relationship stores the identifier of each processing node 2 in the data access system and the size of the free cache space.
  • the management server 1 can first select, based on the correspondence, the processing nodes 2 with the largest free cache space, the number of selected nodes being equal to the number of sub-files included in the file; then, each time it traverses to a cache task, it selects one processing node 2 from the selected nodes and sends the cache task to that processing node 2.
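  • the free-space-based selection just described can be sketched as follows. The dictionary form of the correspondence and the round-robin assignment of tasks to the selected nodes are assumptions made for the illustration:

```python
def pick_nodes_for_cache_tasks(free_space, num_subfiles):
    """Select the num_subfiles processing nodes with the most free cache
    space, then assign one node per cache task in round-robin order.

    free_space: dict of node id -> free cache space (bytes)
    Returns a list of node ids, one per sub-file / cache task.
    """
    # Rank nodes by free cache space, largest first.
    ranked = sorted(free_space, key=free_space.get, reverse=True)
    candidates = ranked[:num_subfiles]
    # One node per cache task; wraps around if fewer nodes than sub-files.
    return [candidates[i % len(candidates)] for i in range(num_subfiles)]
```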
  • the processing node 2 receives a cache task, obtains, from the storage server 3 corresponding to the storage server identifier in the cache task, the sub-file corresponding to the sub-file identifier in the cache task, and caches the obtained sub-file in its own cache 21.
  • the processing node 2 may also send a cache success message corresponding to the cache task to the management server 1.
  • the processing node 2 may also obtain its own remaining free cache space size and send it to the management server 1.
  • the management server 1 may correspondingly save the identification of the file, the identification of the subfile in the cache task, and the identification of the selected processing node 2 into the data list.
  • the management server 1 executes this step after selecting a processing node 2 for the caching task, or executes this step after receiving a caching success message corresponding to the caching task sent by the processing node 2.
  • the management server 1 may also receive the size of the remaining free cache space of the processing node 2, and update the free cache space size of the processing node 2 in the correspondence between processing node identifiers and free cache space sizes to the received remaining free cache space size.
  • the management server 1 also obtains the access frequency, in the most recent preset time period, of each file cached in the processing nodes 2 of the data access system, and deletes from the processing nodes 2 of the data access system the files whose access frequency is lower than the second preset frequency threshold.
  • for the identifier of any file in the file list, the management server 1 counts the access frequency of the file in the latest preset time period according to the file identifier and the historical access record.
  • the access times corresponding to the file are obtained from the historical access record according to the identifier of the file, the access times falling in the latest preset time period are counted to obtain the number of accesses to the file, and the access frequency of the file is obtained from the number of accesses.
  • the management server 1 obtains, from the file list, the identifier of each sub-file included in the file and the identifier of the processing node 2 where each sub-file is located.
  • for each sub-file, the management server 1 sends a deletion task to the processing node 2 where the sub-file is located; each deletion task includes the identifier of the sub-file. The management server 1 then deletes the record including the identifier of the file from the file list.
  • the processing node 2 receives the deletion task, and deletes the subfile corresponding to the identifier of the subfile in the deletion task from its own cache 21.
  • the processing node 2 may also obtain the remaining free cache space size in its cache 21, and send the remaining free cache space size to the management server 1.
  • the management server 1 also receives the size of the remaining free cache space of the processing node 2, and updates the free cache space size of the processing node 2 in the correspondence between processing node identifiers and free cache space sizes to the received remaining free cache space size.
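  • the eviction flow above, one deletion task per sub-file followed by removal of the file's records from the file list, can be sketched as follows; the file-list record layout is an assumption:

```python
def build_deletion_tasks(file_list, file_id):
    """Build one deletion task per sub-file of `file_id`, addressed to the
    processing node caching that sub-file, and drop the file's records.

    file_list: list of records {"file": ..., "subfile": ..., "node": ...}
    Returns (tasks, remaining file list); each task is (node_id, subfile_id).
    """
    tasks = [(r["node"], r["subfile"]) for r in file_list if r["file"] == file_id]
    remaining = [r for r in file_list if r["file"] != file_id]
    return tasks, remaining
```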
  • the management server 1 can query the file list according to the identifier of the file to be accessed. If the identifier of each sub-file included in the file to be accessed and the identifier of the processing node 2 where each sub-file is located are not found, it is determined that the processing nodes 2 of the data access system do not cache the file to be accessed. If they are found, it is determined that the file to be accessed is cached in the processing nodes 2 of the data access system.
  • the management server 1 may also use the current time as the access time of the file to be accessed, and may save the correspondence between the identifier of the file to be accessed and the access time in the historical access record.
  • Step 203 When the file to be accessed is not cached in the processing nodes 2 of the data access system, the management server 1 generates at least one first reading task, and each first reading task includes the address of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed.
  • the identifiers of the subfiles included in each first reading task are different.
  • the management server 1 may obtain the identification of the storage server 3 where the file to be accessed is located from the correspondence between the identification of the file and the identification of the storage server 3 according to the identification of the file to be accessed.
  • if the management server 1 saves the correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifier of at least one sub-file included in the file to be accessed from that correspondence according to the identifier of the file to be accessed, and generates at least one first reading task, each first reading task including the identifier of the storage server 3 and the identifier of one sub-file in the file to be accessed.
  • otherwise, the management server 1 obtains the identifier of at least one sub-file included in the file to be accessed from the storage server 3 according to the identifier of the storage server 3, and generates at least one first reading task, each first reading task including the identifier of the storage server 3 and the identifier of one sub-file in the file to be accessed.
  • the management server 1 may also count the access frequency of the file to be accessed in the latest preset time period.
  • each generated first reading task can also include a cache indication.
  • the cache indication is used to instruct the processing node 2 that has received the first reading task to cache the subfile of the file to be accessed when it obtains the subfile of the file to be accessed from the storage server 3 where the file to be accessed is located.
  • the management server 1 may obtain each access time corresponding to the file to be accessed from the historical access record according to the identifier of the file to be accessed, count the access times falling in the latest preset time period to obtain the number of accesses to the file to be accessed, and use this number as the access frequency of the file to be accessed.
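  • generating the first reading tasks of step 203 amounts to pairing the storage server identifier with each sub-file identifier and optionally attaching the cache indication. A minimal sketch with assumed field names:

```python
def build_first_read_tasks(server_id, subfile_ids, cache_hint=False):
    """Build one first reading task per sub-file: the storage server's
    identifier plus one sub-file identifier, optionally with a cache
    indication instructing the receiving node to cache the sub-file."""
    tasks = []
    for sid in subfile_ids:
        task = {"server": server_id, "subfile": sid}
        if cache_hint:
            task["cache"] = True  # cache indication (see the text above)
        tasks.append(task)
    return tasks
```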
  • Step 204 For each first reading task in the at least one first reading task, the management server 1 selects a processing node 2 and sends the first reading task to the processing node 2.
  • the management server 1 may start traversal from the first of the at least one first reading task; each time it traverses to a first reading task, it selects a processing node 2 from the processing nodes 2 included in the data access system and sends the first reading task to that processing node 2. After sending the first reading task, the management server 1 traverses the next first reading task and repeats the above process until the last first reading task is sent.
  • a processing node can be selected from the processing nodes 2 of the data access system in the following two ways.
  • the two methods are:
  • the management server 1 can randomly select a processing node 2 from the processing nodes 2 of the data access system.
  • the management server 1 can select the processing node 2 with the least number of tasks currently processed from the processing nodes 2 of the data access system.
  • the management server 1 saves the correspondence between the identification of the processing node 2 and the number of tasks, and each record in the correspondence includes an identification of the processing node 2 and the number of tasks currently being processed by the processing node 2.
  • when selecting a processing node 2, the management server 1 reads the number of tasks of each processing node 2 in the data access system from the correspondence and selects the processing node 2 with the smallest number of tasks.
  • after sending a task to the selected processing node 2, the management server 1 increases the number of tasks of that processing node 2 in the correspondence.
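  • manner 2, selecting the processing node 2 with the fewest in-flight tasks and then increasing its count, can be sketched as follows; the dictionary form of the correspondence is an assumption:

```python
def pick_least_loaded(task_counts):
    """Manner 2: choose the processing node with the fewest in-flight
    tasks, and bump its count so the next pick sees the updated load."""
    node = min(task_counts, key=task_counts.get)
    task_counts[node] += 1
    return node
```

Updating the count immediately after each selection is what spreads a burst of tasks across nodes instead of sending them all to the initially least-loaded one.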
  • a processing node 2 may be used as a file summary node; in this case, before sending each first reading task, the management server 1 adds the identifier of the file summary node to the first reading task.
  • the management server 1 may select one processing node 2 as the file summary node through the above method 1 or method 2, so that each generated first reading task includes the file The ID of the summary node.
  • the management server 1 also sends a summary task to the file summary node, where the summary task includes the number of sub-files in the file to be accessed.
  • the file summary node selected by the above manner 1 or manner 2 may be different from the processing node 2 selected by the management server 1 for each first reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain first reading task.
  • the management server 1 selects a processing node 2 every time it traverses a first reading task.
  • the processing node 2 may be selected by the management server 1 multiple times, that is, multiple first reading tasks are sent to the processing node 2 at different times.
  • the management server 1 records the number of first read tasks allocated by the selected processing node 2, that is, saves the correspondence between the identifier of the selected processing node 2 and the number of first read tasks.
  • Step 205 The processing node 2 receives the first reading task, obtains, from the storage server 3 where the file to be accessed is located, the sub-file corresponding to the identifier included in the first reading task, sends the obtained sub-file to the management server 1, and goes to step 209.
  • the processing node 2 receives the first reading task, establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first reading task, obtains the sub-file from the storage server 3 through the network connection according to the identifier of the sub-file included in the first reading task, and sends the sub-file to the management server 1.
  • the processing node 2 can determine that the processing nodes 2 included in the data access system do not cache the file to be accessed, so the processing node 2 directly establishes a network connection with the storage server 3 using the identifier of the storage server 3, obtains the sub-file from the storage server 3, and directly sends the sub-file to the management server 1.
  • the processing node 2 does not cache the sub-file in its cache 21 before sending the sub-file to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2, which shortens the transmission path of the sub-file and improves its transmission efficiency.
  • when the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node.
  • when the processing node 2 is the file summary node, it also receives the summary task and the sub-files sent by other processing nodes 2; when the sum of the number of sub-files it obtained itself and the number of received sub-files reaches the number in the summary task, it assembles the sub-files obtained by itself and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
  • alternatively, the file summary node receives the summary task and the sub-files sent by other processing nodes 2, assembles the received sub-files into the file to be accessed, and sends the file to be accessed to the management server 1.
  • the processing node 2 may cache the obtained sub-file in the cache 21 included in the processing node 2 when it obtains the sub-file, may cache the obtained sub-file after sending it to the management server 1, or may cache the obtained sub-file while sending it to the management server 1.
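  • the file summary node's behaviour, collecting sub-files (its own and those sent by other processing nodes 2) until the count in the summary task is reached and then assembling the file, can be sketched as follows; indexing sub-files by position is an assumption made for the illustration:

```python
class FileSummaryNode:
    """Minimal sketch of the file summary node's role: collect sub-files
    until the count given in the summary task is reached, then assemble
    the file to be accessed in order."""

    def __init__(self, expected_count):
        self.expected = expected_count  # number of sub-files in summary task
        self.parts = {}                 # sub-file index -> bytes

    def add_subfile(self, index, data):
        """Record one sub-file; return True once all have arrived."""
        self.parts[index] = data
        return len(self.parts) == self.expected

    def assemble(self):
        """Concatenate the sub-files in index order."""
        assert len(self.parts) == self.expected, "not all sub-files received"
        return b"".join(self.parts[i] for i in sorted(self.parts))
```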
  • Step 206 When it is found that the file to be accessed is cached in the processing node 2 in the data access system, the management server 1 generates at least one second reading task.
  • the management server 1 can query the identification of each subfile in the file to be accessed and the identification of the processing node 2 where each subfile is located from the file list.
  • Each second reading task includes the identifier of a sub-file in the file to be accessed and the identifier of the processing node 2 where the sub-file is located.
  • Step 207 For each second reading task in the at least one second reading task, the management server 1 selects a processing node 2 and sends the second reading task to the processing node 2.
  • the management server 1 may start traversal from the first of the at least one second reading task; each time it traverses to a second reading task, it selects a processing node 2 from the processing nodes 2 included in the data access system and sends the second reading task to that processing node 2. After sending the second reading task, the management server 1 traverses the next second reading task and repeats the above process until the last second reading task is sent.
  • one processing node 2 can be selected from the processing nodes 2 of the data access system through the above-mentioned method one or two.
  • the management server 1 records the number of second reading tasks allocated by the selected processing node 2, that is, saves the correspondence between the identification of the selected processing node 2 and the number of second reading tasks.
  • the following manner 3 can also be used to select a processing node 2:
  • Manner 3: The management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task.
  • a processing node 2 may be used as the file summary node; in this case, before sending each second reading task, the management server 1 adds the identifier of the file summary node to the second reading task.
  • the management server 1 may also select one processing node 2 as the file summary node through the above method 1 or method 2 before generating at least one second reading task, so that each second reading task generated includes the file summary The ID of the node.
  • the management server 1 also sends a summary task to the file summary node, and the summary task includes the number of sub-files in the file to be accessed.
  • the file summary node selected by the above manner 1 or manner 2 may be different from the processing node 2 selected by the management server 1 for each second reading task, or may be the same as the processing node 2 selected by the management server 1 for a certain second reading task.
  • the management server 1 selects a processing node 2 after each second reading task is traversed.
  • the processing node 2 may be selected by the management server 1 multiple times, that is, multiple second reading tasks are sent to the processing node 2 at different times.
  • Step 208 The processing node 2 receives the second reading task, obtains the sub file according to the identifier of the sub file included in the second reading task and the identifier of the processing node 2, and sends the obtained sub file to the management server 1.
  • the processing node 2 receives the second reading task, which includes the identifier of a sub-file and the identifier of a processing node 2. If the processing node 2 is the processing node 2 corresponding to the identifier in the second reading task, the processing node 2 obtains the corresponding sub-file according to the identifier of the sub-file in the second reading task. Otherwise, the processing node 2 obtains the corresponding sub-file, according to the identifier of the sub-file in the second reading task, from the processing node 2 corresponding to the identifier of the processing node 2 in the second reading task.
  • when the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node.
  • when the processing node 2 is the file summary node, it also receives the summary task and the sub-files sent by other processing nodes 2; when the sum of the number of sub-files it obtained itself and the number of received sub-files reaches the number of sub-files in the summary task, it assembles the sub-files obtained by itself and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
  • alternatively, the file summary node receives the summary task and the sub-files sent by other processing nodes 2; when the number of received sub-files reaches the number of sub-files in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
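  • the handling of a second reading task in step 208 reduces to a local-or-remote fetch. A sketch, where `fetch_remote` stands in for a hypothetical node-to-node transfer and `local_cache` for the node's cache 21:

```python
def handle_second_read_task(self_id, task, local_cache, fetch_remote):
    """Process a second reading task: serve the sub-file locally when this
    node is the one named in the task, otherwise fetch it from the named
    processing node.

    task:         {"subfile": subfile_id, "node": node_id}
    fetch_remote: callable(node_id, subfile_id) -> bytes (hypothetical RPC)
    """
    if task["node"] == self_id:
        # This node caches the sub-file; read it from the local cache.
        return local_cache[task["subfile"]]
    # Another node caches it; fetch across the node-to-node connection.
    return fetch_remote(task["node"], task["subfile"])
```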
  • Step 209 The management server 1 receives the sub-files sent by each processing node 2, obtains the file to be accessed, and sends the file to be accessed to the client 4.
  • the management server 1 integrates the received sub-files into a file to be accessed, and sends the file to be accessed to the client 4.
  • the first reading task or the second reading task also includes the identification of the file summary node.
  • the management server 1 receives the file to be accessed sent by the file summary node, and sends the file to be accessed to the client 4.
  • the management server 1 saves the corresponding relationship between the identifier of the processing node 2 and the number of tasks.
  • the management server 1 subtracts the recorded number of first reading tasks or second reading tasks of the processing node 2 from the number of tasks of the processing node 2 stored in the correspondence between processing node identifiers and task numbers.
  • when the management server 1 determines that the file to be accessed is not cached in the processing nodes 2 of the data access system, it generates at least one first reading task, and each first reading task includes the identifier of the storage server 3 where the file to be accessed is located and the identifier of one sub-file in the file to be accessed.
  • on receiving the first reading task, the processing node 2 does not first access its cache 21, but can directly obtain the sub-file from the storage server 3 according to the identifier of the storage server 3 included in the first reading task, and then send the sub-file to the management server 1.
  • the processing node 2 does not cache the sub-file in its cache 21 before sending the sub-file to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2, which reduces the transmission delay of the file to be accessed.
  • the generated second reading task includes the identifier of the processing node 2 where the sub-file of the file to be accessed is located, so that the processing node 2 that receives the second reading task can conveniently obtain the sub-file according to the identifier of the processing node 2 in the second reading task, which improves the efficiency of accessing the file.
  • when the access frequency of the file to be accessed exceeds the first preset frequency threshold, the management server 1 controls processing nodes 2 to cache the file to be accessed. An access frequency exceeding the first preset frequency threshold indicates that the file to be accessed has been accessed frequently recently; saving the file to be accessed in the caches 21 of the processing nodes 2 of the data access system not only improves the utilization of the caches 21 of the processing nodes 2 but also improves the file hit rate.
  • an embodiment of the present application provides an apparatus 300 for data access.
  • the apparatus 300 is deployed in the above-mentioned management server 1.
  • the apparatus 300 is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to the storage server 3.
  • the device 300 includes:
  • the receiving unit 301 is configured to receive a file access request, and the file access request carries an identifier of the file to be accessed.
  • the processing unit 302 is configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2; the apparatus 300 stores the identifiers of the files cached in the caches 21 of the multiple processing nodes 2.
  • the processing unit 302 is further configured to, when the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2, instruct at least one processing node 2 of the multiple processing nodes 2 to obtain the file to be accessed from the storage server 3.
  • for the process in which the processing unit 302 determines whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2, refer to the relevant content in step 202 of the embodiment shown in FIG. 3, which will not be described in detail here.
  • the apparatus 300 further includes: a first sending unit 303,
  • the processing unit 302 is configured to obtain the identification of at least one sub-file included in the file to be accessed and the identification of the storage server 3 where each sub-file is located from the storage server 3; generate a read file for each sub-file included in the file to be accessed For fetching tasks, each reading task includes an identification of a sub-file and an identification of the storage server 3 where the sub-file is located.
  • the first sending unit 303 is configured to send each reading task to a processing node 2 respectively, and instruct the processing node 2 that has received the reading task to read the sub-file from the storage server 3 that stores the sub-file.
  • the receiving unit 301 is configured to receive the subfile read by the processing node 2 that has received the read task.
  • the processing unit 302 is configured to merge the received sub-files into a file to be accessed.
  • for the detailed implementation process in which the processing unit 302 generates the reading task, refer to the related content in step 203 of the embodiment shown in FIG. 3.
  • the detailed implementation process in which the first sending unit 303 sends the reading task will not be described in detail here.
  • the apparatus 300 further includes: a second sending unit 304,
  • the second sending unit 304 is configured to send a cache task to at least one processing node 2 of the plurality of processing nodes 2 when the access frequency of the file to be accessed exceeds the preset frequency, so as to instruct the at least one processing node 2 to cache the sub-files included in the file to be accessed into the at least one processing node 2.
  • the processing unit 302 is also used to record the identifiers of the sub-files included in the file to be accessed and the identifier of the processing node 2 caching each sub-file; when the file to be accessed is cached in the caches 21 of the multiple processing nodes 2, it generates at least one reading task, each reading task including the identifier of a sub-file and the identifier of the processing node 2 where the sub-file is located.
  • the second sending unit 304 is further configured to send at least one reading task to multiple processing nodes 2 to instruct the multiple processing nodes 2 to read the sub-file from the cache 21 of the processing node 2 storing the sub-file.
  • the processing unit 302 is also used to synthesize the fetched sub-files into the file to be accessed.
  • the processing unit 302 generates a detailed implementation process of the reading task, which can be referred to related content in step 206 in the embodiment shown in FIG. 3.
  • the detailed implementation process of the second sending unit 304 sending the reading task please refer to the related content in step 207 in the embodiment shown in FIG. 3, which will not be described in detail here.
  • the second sending unit 304 is further configured to send a deletion task to the processing node 2 where a sub-file included in the file to be accessed is located when the access frequency of the file to be accessed is lower than the preset frequency; the deletion task includes the identifier of the sub-file, to instruct the processing node 2 to delete the sub-file.
  • the processing unit 302 is further configured to delete the identifier of the subfile and the identifier of the processing node 2 recorded in the device 300.
  • the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the multiple processing nodes 2; when the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2, at least one processing node 2 of the multiple processing nodes 2 is instructed to obtain the file to be accessed from the storage server 3. In this way, the at least one processing node 2 directly reads the file to be accessed from the storage server 3 according to the instruction of the processing unit 302 and directly returns the file to be accessed to the apparatus 300 after reading it; before returning the file to be accessed, the file to be accessed is not cached in the caches 21 of the at least one processing node 2. Thus the file to be accessed does not need to pass through the caches 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, reduces the path for data reading, and improves the performance of data access.
  • an embodiment of the present application provides an apparatus 400 for data access.
  • the apparatus 400 is deployed in the above-mentioned processing node 2.
  • the apparatus 400 is one of multiple processing nodes 2 connected to the management server 1, and the multiple processing nodes 2 are connected to the storage server 3.
  • the device 400 includes:
  • the receiving unit 401 is configured to receive a reading task, which is a task sent by the management server 1 when it determines that the file to be accessed is not cached in the caches 21 of the multiple processing nodes 2; the reading task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located.
  • the processing unit 402 is configured to read the sub file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub file.
  • the sending unit 403 is configured to send the read sub-file to the management server 1.
  • the receiving unit 401 is further configured to receive a cache task, which is a task sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency; the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located.
  • the processing unit 402 is further configured to read the sub file from the storage server 3 corresponding to the identifier of the storage server 3 according to the identifier of the sub file; store the sub file in the cache 21 of the device 400.
  • since the reading task received by the receiving unit 401 includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located, the processing unit 402 can directly read the sub-file from the storage server 3 corresponding to that identifier, and the sending unit 403 then returns the read sub-file directly to the management server 1.
  • before the sending unit 403 returns the sub-file to the management server 1, the processing unit 402 does not cache the sub-file in the cache 21 of the device 400. Therefore, the sub-file returned to the management server 1 does not pass through the cache 21 of the device 400, which shortens the transmission path of the sub-file, reduces the data reading path, and improves the performance of data access.
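The node-side read path above can be sketched as follows. This is a hypothetical illustration (all names such as `handle_read_task` and the task-dictionary keys are assumptions): the node reads the sub-file from the storage server named in the task and returns it immediately, leaving its own cache 21 untouched.

```python
def handle_read_task(task, storage_servers, node_cache):
    """Read the sub-file named in the task directly from the storage server."""
    sub_id = task["subfile_id"]
    server_id = task["storage_server_id"]
    data = storage_servers[server_id][sub_id]  # direct read from the storage server
    # node_cache is deliberately left untouched: the sub-file is returned to
    # the management server without passing through the cache 21.
    return data


storage = {"server-3": {"f.part0": b"hello"}}
cache = {}
result = handle_read_task(
    {"subfile_id": "f.part0", "storage_server_id": "server-3"}, storage, cache)
assert result == b"hello"
assert cache == {}  # nothing was cached on the read path
```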
  • FIG. 8 is a schematic diagram of a data access apparatus 500 provided by an embodiment of the application.
  • the device 500 includes at least one processor 501, a bus system 502, a memory 503, and a transceiver 504.
  • the device 500 is a device with a hardware structure, and can be used to implement the functional units in the device described in FIG. 6.
  • the processing unit 302 in the device 300 shown in FIG. 6 can be implemented by the at least one processor 501 calling the application code in the memory 503, and the receiving unit 301, the first sending unit 303, and the second sending unit 304 in the device 300 shown in FIG. 6 may be implemented by the transceiver 504.
  • the device 500 may also be used to implement the functions of the management server 1 in the embodiment described in FIG. 1 or FIG. 3.
  • the above-mentioned processor 501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • the above-mentioned bus system 502 may include a path for transferring information between the above-mentioned components.
  • the above-mentioned memory 503 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 503 is used to store application program codes for executing the solutions of the present application, and the processor 501 controls the execution.
  • the processor 501 is configured to execute the application program code stored in the memory 503, so as to implement the functions in the methods of the present application.
  • the processor 501 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 8.
  • the apparatus 500 may include multiple processors, such as the processor 501 and the processor 508 in FIG. 8.
  • each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • FIG. 9 is a schematic diagram of a data access apparatus 600 provided by an embodiment of the application.
  • the device 600 includes at least one processor 601, a bus system 602, a memory 603, and a transceiver 604.
  • the memory 603 also includes a cache 21, and the cache 21 is used to store subfiles included in files whose access frequency exceeds a preset frequency.
  • the device 600 is a device with a hardware structure, and can be used to implement the functional units in the device described in FIG. 7.
  • the processing unit 402 in the device 400 shown in FIG. 7 can be implemented by calling the code in the memory 603 by the at least one processor 601.
  • the receiving unit 401 can be implemented by the transceiver 604.
  • the device 600 may also be used to implement the function of the processing node 2 in the embodiment described in FIG. 1 or FIG. 3.
  • the aforementioned processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • the above-mentioned bus system 602 may include a path for transferring information between the above-mentioned components.
  • the aforementioned memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 603 is used to store application program codes for executing the solutions of the present application, and the processor 601 controls the execution.
  • the processor 601 is configured to execute the application program code stored in the memory 603, so as to implement the functions in the methods of the present application.
  • the processor 601 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 9.
  • the apparatus 600 may include multiple processors, such as the processor 601 and the processor 608 in FIG. 9. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method, apparatus and system for data access, which relate to the field of communications. The method is executed by a management server (1), the management server (1) is connected to a plurality of processing nodes (2), and the plurality of processing nodes (2) are connected to a storage server (3). The method comprises: receiving a file access request, the file access request carrying the identification of a file to be accessed; according to the identification of the file, determining whether the file is cached in caches of the plurality of processing nodes (2), the management server (1) storing the identification of files cached in the caches of the plurality of processing nodes (2); and when the file is not cached in the caches of the plurality of processing nodes (2), then instructing at least one processing node (2) among the plurality of processing nodes (2) to acquire the file from the storage server (3). The described method may reduce the path of data reading and improve the performance of data access.

Description

Method, device and system for data access
Technical field
This application relates to the computer field, and in particular to a method, device and system for data access.
Background
With the advent of the era of big data, the scale of enterprise data continues to expand, and how to quickly access massive amounts of data is a core issue facing enterprises.
At present, in order to improve the efficiency of data access, enterprises generally adopt distributed systems. Such a distributed system includes a coordination server, multiple processing nodes, and a storage server that stores data. When receiving an access request sent by a client, the coordination server decomposes the access request into multiple tasks and sends them to the processing nodes; each processing node accesses the data in the storage server, the data read by each processing node is returned to the coordination server, and the coordination server integrates the data returned by the processing nodes and then returns it to the client.
After receiving a task sent by the coordination server, each processing node first determines whether the data to be accessed in the task is in the cache of that processing node. If it is in the cache, the node reads the data directly from the cache; if not, the node needs to read the data to be accessed from the storage server into the cache, and then read the data from the cache. It can be seen that, for each processing node, if the accessed data misses the cache, the data in the storage server must first be read into the processing node's cache and then read from the cache to the coordination server, which lengthens the data reading path and affects the performance of data access.
Summary of the invention
The present application provides a data access method, device and system, to reduce the data reading path and improve the performance of data access. The technical solutions are as follows:
In a first aspect, this application provides a data access method executed by a management server. The management server is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the management server stores the identifiers of the files cached in the caches of the multiple processing nodes. In the method, a file access request is received, the file access request carrying the identifier of the file to be accessed; whether the file to be accessed is cached in the caches of the multiple processing nodes is determined according to the identifier of the file to be accessed; and when the file to be accessed is not cached in the caches of the multiple processing nodes, at least one processing node of the multiple processing nodes is instructed to obtain the file to be accessed from the storage server. Because the at least one processing node is instructed to obtain the file from the storage server when the file is not cached, the at least one processing node directly reads the file to be accessed from the storage server according to the instruction of the management server, directly returns it to the management server after reading, and does not cache the file in its own cache before returning it. In this way, the file to be accessed does not need to pass through the cache of the at least one processing node, which shortens the transmission path of the file, reduces the data reading path, and improves the performance of data access.
In a possible implementation, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server where each sub-file is located are obtained from the storage server; a read task is generated for each sub-file included in the file to be accessed, each read task including the identifier of one sub-file and the identifier of the storage server where that sub-file is located; each read task is sent to one processing node, instructing the processing node that receives the read task to read the sub-file from the storage server that stores it; the sub-files read by the processing nodes that received the read tasks are received; and the sub-files are merged into the file to be accessed. Because a generated read task includes the identifier of the storage server where the sub-file is located, the processing node that receives the read task can read the sub-file directly from that storage server according to the identifier in the task, and send the sub-file directly to the management server after reading it. In this way, the sub-file is not first cached in the processing node's cache and then read from that cache and sent to the management server, which shortens the transmission path of the sub-file and improves the performance of reading it.
In another possible implementation, when the access frequency of the file to be accessed exceeds a preset frequency, a cache task is sent to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed; the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file are recorded. When the file to be accessed is cached in the caches of the multiple processing nodes, at least one read task is generated, each read task including the identifier of a sub-file and the identifier of the processing node where that sub-file is located; the at least one read task is sent to the multiple processing nodes, instructing them to read each sub-file from the cache of the processing node that stores it; the read sub-files are merged into the file to be accessed.
When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed file. Because the cache space in each processing node is limited, saving frequently accessed files in the caches of the at least one processing node not only improves the cache utilization of the processing nodes but also increases the hit rate of the files to be accessed. When the file to be accessed is cached in the caches of the multiple processing nodes, the generated read task includes the identifier of the processing node where the sub-file is located, so the processing node that receives the task does not need to determine where the sub-file is; it reads the sub-file directly from the processing node identified in the task, which improves the efficiency of reading the sub-file.
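The frequency-triggered caching step above can be sketched as follows. This is a hypothetical sketch: the function name, the task-dictionary keys, and the round-robin placement of sub-files onto nodes are all illustrative assumptions (the patent does not prescribe a placement policy).

```python
def plan_cache_tasks(access_count, preset_frequency, file_id, subfiles, nodes, cache_index):
    """When a file's access count exceeds the preset frequency, assign each of
    its sub-files to a processing node (round-robin here, as one possible
    policy) and record the sub-file -> node mapping on the management server."""
    if access_count <= preset_frequency or file_id in cache_index:
        return []
    tasks, placement = [], []
    for i, (sub_id, storage_id) in enumerate(subfiles):
        node = nodes[i % len(nodes)]
        tasks.append({"node": node, "subfile_id": sub_id,
                      "storage_server_id": storage_id})
        placement.append((sub_id, node))
    cache_index[file_id] = placement  # recorded so later reads hit node caches
    return tasks


index = {}
tasks = plan_cache_tasks(
    access_count=11, preset_frequency=10, file_id="hot.dat",
    subfiles=[("hot.dat.0", "s1"), ("hot.dat.1", "s2")],
    nodes=["node-1", "node-2"], cache_index=index)
assert len(tasks) == 2
assert index["hot.dat"] == [("hot.dat.0", "node-1"), ("hot.dat.1", "node-2")]
```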
In another possible implementation, when the access frequency of the file to be accessed falls below the preset frequency, a deletion task is sent to the processing node where each sub-file included in the file to be accessed is located, the deletion task including the identifier of the sub-file, to instruct the processing node to delete the sub-file; the identifier of the sub-file and the identifier of the processing node recorded in the management server are also deleted. In this way, files with a low access frequency can be deleted from the caches of the multiple processing nodes, freeing cache space for files with a high access frequency, which not only improves the cache utilization of the multiple processing nodes but also increases the file hit rate.
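The eviction step can be sketched in the same style (hypothetical names; the `cache_index` mapping is the same illustrative structure as in the caching sketch above would use):

```python
def plan_delete_tasks(access_count, preset_frequency, file_id, cache_index):
    """When a cached file's access count falls below the preset frequency,
    emit one delete task per cached sub-file and drop the management
    server's records for that file."""
    if access_count >= preset_frequency or file_id not in cache_index:
        return []
    return [{"node": node, "subfile_id": sub_id}
            for sub_id, node in cache_index.pop(file_id)]


index = {"cold.dat": [("cold.dat.0", "node-1")]}
tasks = plan_delete_tasks(2, 10, "cold.dat", index)
assert tasks == [{"node": "node-1", "subfile_id": "cold.dat.0"}]
assert "cold.dat" not in index  # records removed from the management server
```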
In a second aspect, this application provides a data access method executed by a processing node, the processing node being one of multiple processing nodes connected to a management server, and the multiple processing nodes being connected to a storage server. In the method, a read task is received, the read task being a task sent by the management server when it determines that the file to be accessed is not cached in the caches of the multiple processing nodes; the read task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the identifier of the sub-file; and the read sub-file is sent to the management server. Because the received read task includes the identifier of the sub-file and the identifier of the storage server where it is located, the processing node can read the sub-file directly from that storage server and return it directly to the management server, without caching the sub-file in its own cache before returning it. The sub-file returned to the management server therefore does not pass through the processing node's cache, which shortens its transmission path, reduces the data reading path, and improves the performance of data access.
In a possible implementation, a cache task is received, the cache task being a task sent by the management server when the access frequency of the file to be accessed exceeds a preset frequency; the cache task includes the identifier of a sub-file of the file to be accessed and the identifier of the storage server where the sub-file is located; the sub-file is read from the storage server corresponding to that identifier according to the identifier of the sub-file, and stored in the cache of the processing node. When the access frequency of the file to be accessed exceeds the preset frequency, the file is a frequently accessed file; because the cache space in the processing node is limited, saving the sub-files of frequently accessed files in the processing node's cache not only improves the cache utilization of the processing node but also increases the hit rate of the files to be accessed.
In a third aspect, this application provides a data access method executed by a processing node, the processing node being one of multiple processing nodes connected to a management server, and the multiple processing nodes being connected to a storage server. In the method, a read task is received, the read task being a task sent by the management server when it determines that the file to be accessed is cached in the caches of the multiple processing nodes; the read task includes the identifier of a sub-file of the file to be accessed and the identifier of the processing node where the sub-file is located; the sub-file is read from the processing node corresponding to that identifier according to the identifier of the sub-file; and the read sub-file is sent to the management server. When the file to be accessed is cached in the caches of the multiple processing nodes, because the read task includes the identifier of the processing node where the sub-file is located, the processing node does not need to determine where the sub-file is; it reads the sub-file directly from the identified processing node, which improves the efficiency of reading the sub-file.
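The cached read path of this third aspect can be sketched as a one-line lookup. This is an illustrative, hypothetical sketch (function and key names are assumptions): because the task already names the processing node holding the sub-file, no lookup or discovery step is needed on the receiving node.

```python
def handle_cached_read_task(task, node_caches):
    """The read task already names the processing node holding the sub-file,
    so the receiving node fetches it straight from that node's cache without
    having to locate the sub-file itself."""
    return node_caches[task["node_id"]][task["subfile_id"]]


caches = {"node-2": {"f.part1": b"cached-bytes"}}
task = {"node_id": "node-2", "subfile_id": "f.part1"}
assert handle_cached_read_task(task, caches) == b"cached-bytes"
```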
In a possible implementation, a deletion task is received, the deletion task being a task sent by the management server when the access frequency of the file to be accessed falls below a preset frequency; the deletion task includes the identifier of a sub-file of the file to be accessed, and the sub-file corresponding to that identifier is deleted. In this way, when the access frequency of the file to be accessed is low, the processing node can delete the sub-files belonging to that file from its own cache, freeing cache space for files with a high access frequency, which not only improves the cache utilization of the processing node but also increases the file hit rate.
In a fourth aspect, this application provides a data access device configured to execute the method in the first aspect or any optional implementation of the first aspect. Specifically, the device includes units for executing the method in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, this application provides a data access device configured to execute the method in the second aspect or an optional implementation of the second aspect; specifically, the device includes units for executing the method in the second aspect or a possible implementation of the second aspect. Alternatively, the device is configured to execute the method in the third aspect or an optional implementation of the third aspect; specifically, the device includes units for executing the method in the third aspect or a possible implementation of the third aspect.
In a sixth aspect, this application provides a data access device including a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions, and the computer-executable instructions are executed by the processor to implement the operation steps of the method in the first aspect or any possible implementation of the first aspect.
In a seventh aspect, this application provides a data access device including a processor, a memory, and a communication interface, the processor being connected to the memory and the communication interface through a bus; the memory stores computer-executable instructions, and the computer-executable instructions are executed by the processor to perform the operation steps of the method in the second aspect or a possible implementation of the second aspect, or of the method in the third aspect or a possible implementation of the third aspect.
In an eighth aspect, this application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
In a ninth aspect, this application provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the methods described in the above aspects.
In a tenth aspect, this application provides a data access system including a management server, a storage server, and multiple processing nodes.
管理服务器接收文件访问请求,该文件访问请求中携带待访问文件的标识;根据待访问文件的标识确定待访问文件是否缓存在该多个处理节点的缓存中,管理服务器中存储有该多个处理节点的缓存中缓存的文件的标识;当待访问文件没有缓存在该多个处理节点的缓存中,从存储服务器中获取待访问文件所包括的至少一个子文件的标识及每个子文件所在的存储服务器的标识,针对待访问文件所包括的每个子文件生成一个读取任务,每个读取任务中包括一个子文件的标识及该子文件所在的存储服务器的标识;将每个读取任务分别发送至一个处理节点。接收到读取任务的处理节点根据接收的读取任务中的子文件的标识从该存储服务器的标识对应的存储服务器中读取对应的子文件,向管理服务器发送读取的子文件。管理服务器接收接收到读取任务的处理节点读取的子文件。The management server receives the file access request, and the file access request carries the identification of the file to be accessed; according to the identification of the file to be accessed, it is determined whether the file to be accessed is cached in the caches of the multiple processing nodes, and the multiple processes are stored in the management server. The identifier of the file cached in the cache of the node; when the file to be accessed is not cached in the caches of the multiple processing nodes, the identifier of at least one subfile included in the file to be accessed and the storage where each subfile is located are obtained from the storage server The identification of the server, a read task is generated for each sub-file included in the file to be accessed, and each read task includes the identification of a sub-file and the identification of the storage server where the sub-file is located; separate each read task Send to a processing node. The processing node that receives the reading task reads the corresponding subfile from the storage server corresponding to the identifier of the storage server according to the identifier of the subfile in the received reading task, and sends the read subfile to the management server. The management server receives the subfile read by the processing node that has received the read task.
由于当待访问文件没有缓存在该多个处理节点的缓存中,管理服务器生成的每个读取任务中包括一个子文件的标识及该子文件所在的存储服务器的标识。这样接收到读取任务的处理节点可以直接根据该读取任务中的存储服务器的标识从该存储服务器中读取该子文件,以及在读取到该子文件后直接向管理服务器发送该子文件。这样该子文件不会被先缓存到该处理节点的缓存中,再由该处理节点从自身的缓存中读取该子文件并发送给管理服务器,从而减小了该子文件的传输路径,提高读取该子文件的性能。Because when the file to be accessed is not cached in the caches of the multiple processing nodes, each read task generated by the management server includes the identifier of a subfile and the identifier of the storage server where the subfile is located. In this way, the processing node that receives the read task can directly read the subfile from the storage server according to the identifier of the storage server in the read task, and directly send the subfile to the management server after reading the subfile . In this way, the sub-file will not be cached in the cache of the processing node first, and then the processing node will read the sub-file from its own cache and send it to the management server, thereby reducing the transmission path of the sub-file and improving Read the performance of this subfile.
在一种可能的实现方式中,管理服务器当待访问文件的访问频率超过预设频率时,发送缓存任务至该多个处理节点的至少一个处理节点。接收缓存任务的处理节点缓存待访问文件包括的子文件。以及,管理服务器还记录待访问文件的标识所包括的子文件标识及缓存每个子文件的处理节点的标识。管理服务器当待访问文件缓存在该多个处理节点的缓存中时,生成至少一个读取任务,每个读取任务包括子文件的标识及该子文件所在的处理节点的标识;发送至少一个读取任务至该多个处理节点。接收到读取任务的处理节点根据接收的读取任务中的子文件的标识和该子文件所在的处理节点的标识读取子文件,向管理服务器发送读取的子文件。管理服务器接收接收到读取任务的处理节点读取的子文件。In a possible implementation manner, when the access frequency of the file to be accessed exceeds the preset frequency, the management server sends the cache task to at least one processing node of the plurality of processing nodes. The processing node receiving the cache task caches the sub-files included in the file to be accessed. And, the management server also records the identification of the sub-file included in the identification of the file to be accessed and the identification of the processing node that caches each sub-file. When the file to be accessed is cached in the caches of the multiple processing nodes, the management server generates at least one reading task. Each reading task includes the identifier of the subfile and the identifier of the processing node where the subfile is located; sending at least one read Get tasks to the multiple processing nodes. The processing node that receives the reading task reads the subfile according to the identifier of the subfile in the received reading task and the identifier of the processing node where the subfile is located, and sends the read subfile to the management server. The management server receives the subfile read by the processing node that has received the read task.
When the access frequency of the file to be accessed exceeds the preset frequency, the file to be accessed is a frequently accessed file. Because the cache space in each processing node is limited, saving frequently accessed files to the caches of the at least one processing node not only improves the cache utilization of the processing nodes but also improves the hit rate of files to be accessed. When the file to be accessed is cached in the caches of the multiple processing nodes, each generated read task includes the identifier of the processing node where the subfile is located, so the processing node that receives the read task does not need to determine which node holds the subfile; it reads the subfile directly from that node according to the identifier in the task, which improves the efficiency of reading the subfile.
Description of the Drawings
Figure 1 is a schematic structural diagram of a data access system provided by an embodiment of this application;
Figure 2 is a schematic diagram of a client accessing the data access system provided by an embodiment of this application;
Figure 3 is a flowchart of a data access method provided by an embodiment of this application;
Figure 4 is a flowchart of a method for caching files provided by an embodiment of this application;
Figure 5 is a flowchart of a method for deleting files provided by an embodiment of this application;
Figure 6 is a schematic structural diagram of a data access apparatus provided by an embodiment of this application;
Figure 7 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application;
Figure 9 is a schematic structural diagram of another data access apparatus provided by an embodiment of this application.
Detailed Description
The implementations of this application are described in further detail below with reference to the accompanying drawings.
Referring to Figure 1, an embodiment of this application provides a data access system. The system includes a management server 1, multiple processing nodes 2, and at least one storage server 3. The management server 1 is connected to each processing node 2, and the processing nodes 2 are also connected to one another. The management server 1 and each processing node 2 are connected to each storage server 3 through a network.
The management server 1 is configured to decompose a received file access request into multiple tasks and deliver them to the respective processing nodes 2. Each processing node 2 accesses files in the storage servers 3 and returns the files it reads to the management server 1. The management server 1 integrates the files returned by the processing nodes 2 and returns the result to a client (not shown in the figure).
Each storage server 3 stores files for user access. In a storage server 3, one file may be divided into multiple subfiles for storage. For example, the file may be a table in a database. Assume the table includes 100 records and is stored in the storage server 3 as three subfiles: a first subfile, a second subfile, and a third subfile. The first subfile holds records 1 to 33 of the table, the second subfile holds records 34 to 66, and the third subfile holds records 67 to 100.
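The record-range split described above can be sketched as follows. This is purely illustrative; the boundaries come from the example in the text, and nothing here is prescribed by the embodiment:

```python
# Reproduce the example split: a 100-record table stored as three subfiles
# holding records 1-33, 34-66, and 67-100 respectively.
table = list(range(1, 101))                  # records numbered 1..100
boundaries = [(0, 33), (33, 66), (66, 100)]  # half-open slice boundaries
subfiles = [table[start:end] for start, end in boundaries]
```

Any contiguous partition of the record range works the same way; the embodiment does not require equal subfile sizes.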
For any file in a storage server 3, the management server 1 may cache the subfiles included in the file in one or more processing nodes 2 of the data access system. For each subfile of the file, when the management server 1 caches the subfile in a processing node 2, it saves the correspondence among the identifier of the file, the identifier of the subfile, and the identifier of that processing node 2 in a file list.
Optionally, the processing node 2 includes a cache 21, and the management server 1 caches the subfile in the cache 21 of the processing node 2.
For the detailed process by which the management server 1 caches the subfiles of a file in the processing nodes 2 of the data access system, refer to the related content of the embodiment shown in Figure 3; it is not described in detail here.
The file list is illustrated with an example. For the table above, assume the identifier of the table is ID1, and the identifiers of its first, second, and third subfiles are file1, file2, and file3 respectively. Assume the management server 1 caches the first, second, and third subfiles in a first, a second, and a third processing node respectively, and that the identifiers of these processing nodes are TE1, TE2, and TE3. The management server 1 saves the table identifier ID1, the first subfile identifier file1, and the first processing node identifier TE1 as one entry in the file list shown in Table 1 below; saves ID1, the second subfile identifier file2, and the second processing node identifier TE2 as another entry; and saves ID1, the third subfile identifier file3, and the third processing node identifier TE3 as a third entry.
Table 1

File identifier    Subfile identifier    Processing node identifier
ID1                file1                 TE1
ID1                file2                 TE2
ID1                file3                 TE3
...                ...                   ...
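Table 1 can be modeled in memory as a list of (file identifier, subfile identifier, processing node identifier) entries, with a lookup that returns the cached placements of a file. This is a minimal sketch under assumed names, not the patented implementation:

```python
# File list entries mirroring Table 1: (file ID, subfile ID, processing node ID).
file_list = [
    ("ID1", "file1", "TE1"),
    ("ID1", "file2", "TE2"),
    ("ID1", "file3", "TE3"),
]

def lookup(file_list, file_id):
    """Return the (subfile ID, node ID) pairs recorded for a file; [] if none."""
    return [(sub, node) for fid, sub, node in file_list if fid == file_id]
```

An empty lookup result corresponds to the file not being cached in any processing node.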
Optionally, referring to Figure 2, when a user needs to access a file, the user may input the identifier of the file to be accessed into the client 4. The client 4 obtains the input identifier and sends a file access request including that identifier to the management server 1.
The management server 1 receives the file access request and determines, according to the file list and the identifier of the file to be accessed included in the request, whether the file to be accessed is cached in the processing nodes 2 of the data access system. If the file is cached in the processing nodes 2, the management server obtains it from the processing nodes 2. If it is not, the management server controls the processing nodes 2 to obtain it from the storage server 3 where it is located. The management server then sends the file to the client.
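The hit-or-miss routing decision made on each request could be sketched as below. `handle_access_request` is a hypothetical helper, and the returned task tuples are purely illustrative shapes, not the embodiment's message format:

```python
def handle_access_request(file_id, file_list):
    """Route a request: read cached subfiles from nodes, else fall back to storage."""
    placements = [(sub, node) for fid, sub, node in file_list if fid == file_id]
    if placements:
        # Cache hit: one task per subfile, naming the node that caches it.
        return [("read_from_node", sub, node) for sub, node in placements]
    # Cache miss: processing nodes must fetch the file from its storage server.
    return [("read_from_storage", file_id)]
```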
For the detailed process by which the management server 1 obtains the file to be accessed, refer to the related content of the embodiment shown in Figure 3; it is not described in detail here.
Optionally, the identifier of a file may be its file name or the like, and the identifier of a subfile may be its storage path or file name in the storage server, or the like.
In this embodiment of the application, the management server 1 maintains a file list that records the identifier of each file cached in the processing nodes 2 of the data access system, the identifiers of the subfiles of that file, and the identifiers of the processing nodes 2 where those subfiles are cached. When the management server 1 receives the identifier of the file to be accessed sent by the client 4, it can determine, from that identifier and the file list, whether the file is cached in the processing nodes 2 of the data access system. If the file is not cached there, the management server controls the processing nodes 2 to obtain it from the storage server 3 where it is located. Under the control of the management server 1, a processing node 2 does not first read the file from the cache 21 of a processing node 2 of the data access system; instead, it obtains the file to be accessed directly from the storage server 3 and sends it directly to the management server 1, without caching it in its own cache 21 beforehand. The file to be accessed therefore does not need to be cached in the cache 21 of the processing node 2 and then read out of that cache and sent to the management server 1, which improves the efficiency of accessing files.
In this embodiment of the present invention, the data access system is mainly used for accessing database data. The file access method provided by this embodiment is introduced below by taking access to a file in a storage server 3 as an example.
Referring to Figure 3, an embodiment of this application provides a file access method. The method may be applied to the system described in Figure 1 and includes:
Step 201: The management server 1 receives a file access request, where the file access request includes the identifier of the file to be accessed.
The user logs in to the management server 1 through the client 4, and the client 4 then displays an interface provided by the management server 1. Through this interface, the user can input a file access request. The file access request may be a database access statement, which may include the identifier of the file to be accessed. The database access statement may be a structured query language (SQL) statement, and the file to be accessed may be a table in a database.
For example, assume the user inputs through the client 4 the SQL statement "select name from teacher join people on teacher.id=people.id". This statement requests access to two tables: one named "teacher", with the identifier "teacher.id", and another named "people", with the identifier "people.id". On receiving the SQL statement, the management server 1 extracts the identifiers of the two tables from it, namely the identifier "teacher.id" of the table "teacher" and the identifier "people.id" of the table "people".
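Extracting the table identifiers from such a statement can be sketched with a regular expression. A real implementation would use a proper SQL parser; the regex and the ".id" suffix convention below simply mirror the example identifiers and are assumptions, not the embodiment's method:

```python
import re

def extract_table_ids(sql):
    """Rough sketch: collect the table names that follow FROM/JOIN keywords."""
    names = re.findall(r"\b(?:from|join)\s+(\w+)", sql, flags=re.IGNORECASE)
    return [name + ".id" for name in names]

sql = "select name from teacher join people on teacher.id=people.id"
```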
Optionally, the management server 1 also analyzes whether the format of the SQL statement is correct. If it is correct, step 202 is executed. If it is not, an alarm indicating that the statement is incorrect is fed back to the client 4, which receives the alarm and displays it to the user.
Step 202: The management server 1 determines, according to the identifier and the file list, whether the file to be accessed is cached in the processing nodes 2 of the data access system.
As shown in Table 1 above, the file list stores the correspondence among file identifiers, subfile identifiers, and processing node 2 identifiers.
The management server 1 can cache files in the processing nodes 2 of the data access system. A file includes multiple subfiles; when a subfile of the file is cached in a processing node 2, the correspondence among the file identifier, the subfile identifier, and that processing node's identifier is saved in the file list.
Optionally, the management server 1 may cache files whose access frequency exceeds a preset frequency in the processing nodes 2 of the data access system, and delete files whose access frequency is below the preset frequency from the processing nodes 2.
Optionally, the management server 1 caches, in the processing nodes 2, files whose access frequency within the most recent period of a preset duration exceeds a first preset frequency threshold, and deletes from the processing nodes 2 files whose access frequency within that period does not exceed a second preset frequency threshold. The first preset frequency threshold is greater than or equal to the second preset frequency threshold.
The management server 1 stores a historical access record, each entry of which stores the identifier of a file a user has accessed and the corresponding access time.
Optionally, the management server 1 periodically or aperiodically computes, from the historical access record, the access frequency of each accessed file within the most recent period of the preset duration. When a file's access frequency exceeds the first preset frequency threshold, the management server controls the processing nodes 2 to obtain the file from the storage server 3 where it is located and cache it in the processing nodes 2 of the data access system.
Referring to Figure 4, this can be implemented through the following operations 2021 to 2026:
2021: The management server 1 selects, from the historical access record, the identifier of a file that does not appear in the file list, and computes the file's access frequency within the most recent period of the preset duration according to the historical access record and the file identifier.
A selected identifier that is not in the file list indicates that the corresponding file is not cached in the processing nodes 2 of the data access system.
In this operation, the management server 1 may obtain, from the historical access record, the access times corresponding to the file identifier within the most recent period of the preset duration. The management server 1 counts the number of these access times, which equals the number of accesses to the file, and derives the file's access frequency over the period from that count and the preset duration.
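Operation 2021's frequency computation, counting the accesses inside the most recent window and dividing by the preset duration, can be sketched as below. The (file ID, access time) record layout is an assumption for illustration; the embodiment does not specify one:

```python
def access_frequency(history, file_id, window, now):
    """history: iterable of (file ID, access time) pairs.
    Returns accesses per unit time for file_id over [now - window, now]."""
    count = sum(1 for fid, t in history if fid == file_id and now - t <= window)
    return count / window
```

Comparing the returned value against the first preset frequency threshold then decides whether to cache the file.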
2022: When the access frequency exceeds the first preset frequency threshold, the management server 1 obtains, according to the file identifier, the identifier of the storage server 3 where the file is located and the identifiers of at least one subfile included in the file. The identifier of the storage server 3 may be its address, for example its internet protocol (IP) address.
A technician may input the identifiers of the storage servers 3 in the data access system to the management server 1 in advance. According to a storage server 3's identifier, the management server 1 can obtain the identifiers of the files saved on that storage server and save these file identifiers together with the storage server identifier in a correspondence between file identifiers and storage server identifiers.
Optionally, for each file saved on the storage server 3, the management server 1 may also obtain from the storage server 3 the identifiers of the subfiles included in the file, and save the file identifier together with the obtained subfile identifiers in a correspondence between file identifiers and subfile identifiers.
In this operation, having determined that a file's access frequency exceeds the first preset frequency threshold, the management server 1 can obtain, according to the file identifier, the identifier of the storage server 3 where the file is located from the correspondence between file identifiers and storage server identifiers. If the management server 1 maintains the correspondence between file identifiers and subfile identifiers, it obtains the identifier of each subfile of the file from that correspondence according to the file identifier. If it does not, it obtains the identifier of each subfile of the file from the storage server 3 according to the storage server's identifier.
2023: The management server 1 generates at least one cache task, where each cache task includes the identifier of the storage server 3 and the identifier of one subfile of the file.
2024: For each cache task, the management server 1 selects a processing node 2 and sends the cache task to that processing node 2.
The management server 1 may traverse the at least one cache task starting from the first one. For each cache task traversed, it selects a processing node 2 and sends the task to that node, then moves on to the next cache task, repeating this process until the last cache task has been sent.
Optionally, the management server 1 randomly selects a processing node 2 from the processing nodes 2 of the data access system. Alternatively,
optionally, the management server 1 may store a correspondence between processing node 2 identifiers and free cache space sizes, which records the identifier and free cache space size of each processing node 2 in the data access system. Based on this correspondence, the management server 1 can first select the at least one processing node 2 with the largest free cache space, where the number of selected nodes equals the number of subfiles included in the file, and then, for each cache task traversed, select one processing node 2 from the selected nodes and send the cache task to it.
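The second selection strategy, picking the nodes with the most free cache space, one per subfile, can be sketched as below. The dictionary shape of the node-to-free-space correspondence is an assumption for illustration:

```python
def select_nodes(free_space_by_node, num_subfiles):
    """Return the identifiers of the num_subfiles nodes with the most free cache."""
    ranked = sorted(free_space_by_node.items(), key=lambda kv: kv[1], reverse=True)
    return [node for node, _ in ranked[:num_subfiles]]
```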
2025: The processing node 2 receives a cache task, obtains, from the storage server 3 identified in the cache task, the subfile corresponding to the subfile identifier in the task, and stores the obtained subfile in its own cache 21.
Optionally, the processing node 2 may also send a cache success message corresponding to the cache task to the management server 1.
Optionally, after caching the subfile, the processing node 2 may also obtain its remaining free cache space size and send it to the management server 1.
2026: The management server 1 may save the file identifier, the subfile identifier in the cache task, and the identifier of the selected processing node 2 as a corresponding entry in the file list.
Optionally, the management server 1 executes this operation after selecting a processing node 2 for the cache task, or after receiving the cache success message corresponding to the cache task from that processing node 2.
Optionally, the management server 1 may also receive the remaining free cache space size from the processing node 2, and update that node's free cache space size in the correspondence between processing node identifiers and free cache space sizes to the received value.
Optionally, the management server 1 also obtains the access frequency, within the most recent period of the preset duration, of each file cached in the processing nodes 2 of the data access system, and deletes from the processing nodes 2 files whose access frequency is below the second preset frequency threshold.
Referring to Figure 5, this can be implemented through the following operations 2121 to 2123:
2121: For the identifier of any file in the file list, the management server 1 computes the file's access frequency within the most recent period of the preset duration according to the file identifier and the historical access record.
In implementation, the access times corresponding to the file are obtained from the historical access record according to the file identifier, the number of access times within the most recent period of the preset duration is counted to obtain the number of accesses to the file, and the file's access frequency is derived from that count.
2122: When the file's access frequency is below the second preset frequency threshold, the management server 1 obtains, from the file list, the identifiers of the subfiles included in the file and the identifiers of the processing nodes 2 where those subfiles are located.
2123: For each subfile, the management server 1 sends a deletion task, including the subfile's identifier, to the processing node 2 where the subfile is located, and then deletes the entries that include the file's identifier from the file list.
The processing node 2 receives the deletion task and deletes, from its own cache 21, the subfile corresponding to the subfile identifier in the task.
Optionally, after deleting the subfile, the processing node 2 may also obtain the remaining free cache space size in its cache 21 and send it to the management server 1.
Optionally, the management server 1 also receives the remaining free cache space size from the processing node 2 and updates that node's free cache space size in the correspondence between processing node identifiers and free cache space sizes to the received value.
Because the processing nodes 2 of the data access system hold the files whose access frequency within the most recent period of the preset duration exceeds the first preset frequency threshold, the hit rate of each file cached in the processing nodes 2 is improved when files are accessed.
The foregoing is merely one implementation example, listed in this application, of caching files in the processing nodes 2 of the data access system and of evicting files from the processing nodes 2. Other implementations of caching files in, or evicting files from, the processing nodes 2 of the data access system may also be applied to this application; they are not enumerated here one by one.
In this step, the management server 1 may query the file list according to the identifier of the file to be accessed. If the query returns no identifiers of the subfiles included in the file to be accessed and no identifiers of the processing nodes 2 where those subfiles are located, the management server determines that the file to be accessed is not cached in the processing nodes 2 of the data access system. If the query returns the subfile identifiers and the corresponding processing node 2 identifiers, the management server determines that the file to be accessed is cached in the processing nodes 2.
Optionally, after receiving the file access request, the management server 1 may also take the current time as the access time of the file to be accessed, and save the correspondence between the identifier of the file to be accessed and this access time in the historical access record.
步骤203:在数据访问系统的处理节点2中没有缓存待访问文件时,管理服务器1生成至少一个第一读取任务,每个第一读取任务包括待访问文件所在的存储服务器2的地址和待访问文件中的一个子文件的标识。Step 203: When the file to be accessed is not cached in the processing node 2 of the data access system, the management server 1 generates at least one first reading task, and each first reading task includes the address of the storage server 2 where the file to be accessed is located and The identifier of a sub-file in the file to be accessed.
每个第一读取任务包括的子文件的标识不同。The identifiers of the subfiles included in each first reading task are different.
在本步骤中,管理服务器1可以根据待访问文件的标识从文件的标识与存储服务器3的标识的对应关系中获取待访问文件所在的存储服务器3的标识。In this step, the management server 1 may obtain the identification of the storage server 3 where the file to be accessed is located from the correspondence between the identification of the file and the identification of the storage server 3 according to the identification of the file to be accessed.
在管理服务器1保存有文件的标识与子文件的标识的对应关系的情况,管理服务器1根据待访问文件的标识,从文件的标识与子文件的标识的对应关系中获取待访问文件包括的至少一个子文件的标识,生成至少一个第一读取任务,每个第一读取任务包括该存储服务器3的标识和待访问文件中的一个子文件的标识。In the case that the management server 1 saves the corresponding relationship between the identification of the file and the identification of the sub-file, the management server 1 obtains at least the file to be accessed from the corresponding relationship between the identification of the file and the identification of the sub-file according to the identification of the file to be accessed. An identification of a subfile generates at least one first reading task, and each first reading task includes an identification of the storage server 3 and an identification of a subfile in the file to be accessed.
If the management server 1 does not store a correspondence between file identifiers and sub-file identifiers, the management server 1 obtains the identifiers of at least one sub-file included in the file to be accessed from the storage server 3 according to the identifier of the storage server 3, and generates at least one first read task, each first read task including the identifier of the storage server 3 and the identifier of one sub-file of the file to be accessed.
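The task-generation logic of step 203 can be sketched as follows. This is a non-normative illustration; all names such as `FirstReadTask` and `fetch_subfile_ids` are hypothetical and not part of the embodiment. It covers both cases above: the file-to-sub-file correspondence is kept locally on the management server 1, or the sub-file identifiers must be fetched from the storage server 3.

```python
from dataclasses import dataclass

@dataclass
class FirstReadTask:
    """One first read task: which storage server 3 holds the sub-file, and which sub-file to fetch."""
    storage_server_id: str
    subfile_id: str

def generate_first_read_tasks(file_id, server_map, subfile_map=None, fetch_subfile_ids=None):
    """Generate one first read task per sub-file of the file to be accessed.

    server_map: file identifier -> identifier of the storage server 3 holding the file.
    subfile_map: optional locally stored correspondence, file identifier -> sub-file identifiers.
    fetch_subfile_ids: fallback callable(server_id, file_id) that queries the storage
    server 3 when no local correspondence is kept (the second case described above).
    """
    server_id = server_map[file_id]
    if subfile_map is not None and file_id in subfile_map:
        subfile_ids = subfile_map[file_id]                    # local correspondence kept
    else:
        subfile_ids = fetch_subfile_ids(server_id, file_id)   # ask the storage server 3
    return [FirstReadTask(server_id, sid) for sid in subfile_ids]
```

Each returned task carries exactly one sub-file identifier, matching the statement above that every first read task names a different sub-file.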
Optionally, the management server 1 may further count the access frequency of the file to be accessed within the most recent time period of a preset duration. When the access frequency exceeds a first preset frequency threshold, each generated first read task may further include a cache indication. The cache indication instructs the processing node 2 that receives the first read task to cache each sub-file of the file to be accessed when that sub-file is obtained from the storage server 3 where the file to be accessed is located.
Optionally, the management server 1 may obtain, according to the identifier of the file to be accessed, each access time corresponding to the file to be accessed from the historical access record, count the access times that fall within the most recent time period of the preset duration to obtain the number of times the file to be accessed has been accessed, and use that number as the access frequency of the file to be accessed.
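A minimal sketch of this frequency count follows; the names and the layout of the historical access record (a list of identifier/timestamp pairs) are assumptions for illustration only.

```python
import time

def access_frequency(history, file_id, window_seconds, now=None):
    """Count how many recorded access times for file_id fall within the most
    recent window of the preset duration; that count is the access frequency.

    history: iterable of (file_id, access_time) pairs, i.e. the historical access record.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return sum(1 for fid, t in history if fid == file_id and t >= cutoff)
```

Comparing the returned count against the first preset frequency threshold then decides whether the generated first read tasks carry the cache indication.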
Step 204: For each first read task of the at least one first read task, the management server 1 selects a processing node 2 and sends the first read task to the selected processing node 2.
In this step, the management server 1 may traverse the at least one first read task starting from the first one. Each time a first read task is traversed, the management server 1 selects a processing node 2 from the processing nodes 2 included in the data access system and sends the first read task to that processing node 2. After the first read task has been sent, the management server 1 traverses the next first read task and repeats the above process until the last first read task has been sent.
Optionally, a processing node 2 may be selected from the processing nodes 2 of the data access system in either of the following two ways:
Way one: the management server 1 may randomly select a processing node 2 from the processing nodes 2 of the data access system.
Way two: the management server 1 may select, from the processing nodes 2 of the data access system, the processing node 2 that is currently processing the fewest tasks.
In way two, the management server 1 stores a correspondence between processing node identifiers and task counts, and each record in the correspondence includes the identifier of one processing node 2 and the number of tasks that processing node 2 is currently processing.
In this way, when selecting a processing node 2, the management server 1 reads the task count of each processing node 2 in the data access system from the correspondence and selects the processing node 2 with the fewest tasks.
In way two, after the processing node 2 with the fewest tasks has been selected, its task count in the correspondence is incremented.
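Way one and way two, together with the task-count increment just described, can be sketched as follows (hypothetical names; a sketch under the assumption that the correspondence is held as a dictionary, not the embodiment's actual implementation):

```python
import random

def select_processing_node(task_counts, mode="least_loaded"):
    """Select a processing node 2 by way one (random) or way two (fewest
    in-flight tasks). Way two also increments the chosen node's task count
    in the correspondence, reflecting the newly assigned task."""
    if mode == "random":                               # way one
        return random.choice(list(task_counts))
    node = min(task_counts, key=task_counts.get)       # way two: fewest tasks
    task_counts[node] += 1                             # record the new assignment
    return node
```

The increment keeps the correspondence accurate while several read tasks are dispatched in a row; the matching decrement happens later, in step 209, once the results have been received.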
Optionally, when selecting a processing node 2 for the first of the first read tasks, the management server 1 may designate that processing node 2 as the file summary node and then, before sending each first read task, add the identifier of the file summary node to the first read task. Alternatively, before generating the at least one first read task, the management server 1 may select a processing node 2 as the file summary node through way one or way two above, so that each generated first read task includes the identifier of the file summary node.
Optionally, the management server 1 further sends a summary task to the file summary node, the summary task including the number of sub-files of the file to be accessed.
The file summary node selected through way one or way two above may differ from every processing node 2 selected by the management server 1 for the first read tasks, or may be the same as the processing node 2 selected by the management server 1 for a certain first read task.
Optionally, the management server 1 selects a processing node 2 each time it traverses a first read task. A given processing node 2 may therefore be selected by the management server 1 multiple times, that is, multiple first read tasks are sent to that processing node 2 at different times.
When way two is used, the management server 1 records the number of first read tasks assigned to each selected processing node 2, that is, it saves the correspondence between the identifier of the selected processing node 2 and the number of first read tasks.
Step 205: The processing node 2 receives the first read task, obtains, according to the first read task, the sub-file corresponding to the identifier included in the first read task from the storage server 3 where the file to be accessed is located, and sends the obtained sub-file to the management server 1; the procedure continues with step 209.
In this step, the processing node 2 receives the first read task, establishes a network connection between the processing node 2 and the storage server 3 according to the identifier of the storage server 3 included in the first read task, obtains the sub-file from the storage server 3 over that network connection according to the identifier of the sub-file included in the first read task, and sends the sub-file to the management server 1.
Optionally, because the first read task includes the identifier of the storage server 3, the processing node 2 can determine that no processing node 2 of the data access system caches the file to be accessed. The processing node 2 therefore directly establishes a network connection with the storage server 3 according to the identifier of the storage server 3, obtains the sub-file from the storage server 3, and sends the sub-file directly to the management server 1. The processing node 2 does not cache the sub-file in its cache 21 before sending it to the management server 1, so the sub-file does not pass through the cache 21 of the processing node 2. This shortens the transmission path of the sub-file and improves its transmission efficiency.
Optionally, if the first read task further includes the identifier of the file summary node and the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node. If the processing node 2 is the file summary node, it further receives the summary task as well as the sub-files sent by the other processing nodes 2; when the sum of the number of sub-files it has obtained itself and the number of sub-files it has received reaches the number of sub-files in the summary task, it assembles its own sub-files and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Optionally, if the file summary node differs from every processing node 2 selected by the management server 1 for the first read tasks, the file summary node receives the summary task as well as the sub-files sent by the other processing nodes 2; when the number of received sub-files reaches the number of sub-files included in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
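The aggregation behavior of the file summary node described above can be sketched as follows. The class and field names are hypothetical, and it is assumed for illustration that each sub-file carries an index giving its position in the file, which the embodiment does not specify.

```python
class FileSummaryNode:
    """Collects sub-files until the count given in the summary task is reached,
    then assembles them into the file to be accessed, ordered by sub-file index."""

    def __init__(self, expected_subfiles):
        self.expected = expected_subfiles   # number of sub-files, from the summary task
        self.parts = {}                     # sub-file index -> bytes

    def receive(self, index, data):
        """Accept one sub-file (obtained locally or sent by another processing node 2).
        Returns the assembled file once all sub-files have arrived, else None."""
        self.parts[index] = data
        if len(self.parts) == self.expected:
            return b"".join(self.parts[i] for i in sorted(self.parts))
        return None
```

Sub-files may arrive in any order; assembly is triggered only when the count matches the summary task, after which the whole file would be sent to the management server 1.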
Optionally, if the first read task further includes a cache indication, the processing node 2 caches the obtained sub-file in its cache 21 when the sub-file is obtained. The processing node 2 may cache the obtained sub-file in its cache 21 after sending it to the management server 1, or may cache the obtained sub-file in its cache 21 while sending it to the management server 1.
Step 206: When it is found that the file to be accessed is cached in the processing nodes 2 of the data access system, the management server 1 generates at least one second read task.
When the file to be accessed is cached in the processing nodes 2 of the data access system, the management server 1 may look up, in the file list, the identifier of each sub-file of the file to be accessed and the identifier of the processing node 2 where each sub-file is located.
Each second read task includes the identifier of one sub-file of the file to be accessed and the identifier of the processing node 2 where that sub-file is located.
Step 207: For each second read task of the at least one second read task, the management server 1 selects a processing node 2 and sends the second read task to the selected processing node 2.
In this step, the management server 1 may traverse the at least one second read task starting from the first one. Each time a second read task is traversed, the management server 1 selects a processing node 2 from the processing nodes 2 included in the data access system and sends the second read task to that processing node 2. After the second read task has been sent, the management server 1 traverses the next second read task and repeats the above process until the last second read task has been sent.
Optionally, a processing node 2 may be selected from the processing nodes 2 of the data access system through way one or way two above.
When way two is used, the management server 1 records the number of second read tasks assigned to each selected processing node 2, that is, it saves the correspondence between the identifier of the selected processing node 2 and the number of second read tasks.
In addition to way one and way two above, the following way three may also be used to select a processing node 2:
Way three: the management server 1 directly selects the processing node 2 corresponding to the identifier of the processing node 2 carried in the second read task.
Optionally, when selecting a processing node 2 for the first of the second read tasks, the management server 1 may designate that processing node 2 as the file summary node and then, before sending each second read task, add the identifier of the file summary node to the second read task. Alternatively, before generating the at least one second read task, the management server 1 may select a processing node 2 as the file summary node through way one or way two above, so that each generated second read task includes the identifier of the file summary node.
Optionally, the management server 1 further sends a summary task to the file summary node, the summary task including the number of sub-files of the file to be accessed.
The file summary node selected through way one or way two above may differ from every processing node 2 selected by the management server 1 for the second read tasks, or may be the same as the processing node 2 selected by the management server 1 for a certain second read task.
Optionally, the management server 1 selects a processing node 2 each time it traverses a second read task. A given processing node 2 may therefore be selected by the management server 1 multiple times, that is, multiple second read tasks are sent to that processing node 2 at different times.
Step 208: The processing node 2 receives the second read task, obtains the sub-file according to the identifier of the sub-file and the identifier of the processing node 2 included in the second read task, and sends the obtained sub-file to the management server 1.
In this step, the processing node 2 receives the second read task, which includes the identifier of one sub-file and the identifier of one processing node 2. If this processing node 2 is the processing node 2 corresponding to the processing node identifier in the second read task, it obtains the corresponding sub-file according to the sub-file identifier in the second read task. Otherwise, it obtains the corresponding sub-file, according to the sub-file identifier in the second read task, from the processing node 2 corresponding to the processing node identifier in the second read task.
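The local-versus-remote branch of step 208 can be sketched minimally as follows (hypothetical names; the cache lookup and the cross-node fetch are stand-ins for the embodiment's actual mechanisms):

```python
def handle_second_read_task(self_node_id, task, local_cache, fetch_remote):
    """Handle a second read task on a processing node 2.

    task: (subfile_id, owner_node_id) -- the sub-file identifier and the
    identifier of the processing node 2 whose cache 21 holds the sub-file.
    fetch_remote: callable(owner_node_id, subfile_id) that reads the sub-file
    from another processing node 2's cache over the network.
    """
    subfile_id, owner_node_id = task
    if self_node_id == owner_node_id:
        return local_cache[subfile_id]                 # sub-file is in this node's own cache 21
    return fetch_remote(owner_node_id, subfile_id)     # read from the owning node's cache 21
```

The obtained sub-file would then be sent to the management server 1 (or to the file summary node, if the task names one).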
Optionally, if the second read task further includes the identifier of the file summary node and the processing node 2 is not the file summary node, the processing node 2 sends the obtained sub-file to the file summary node according to the identifier of the file summary node. If the processing node 2 is the file summary node, it further receives the summary task as well as the sub-files sent by the other processing nodes 2; when the sum of the number of sub-files it has obtained itself and the number of sub-files it has received reaches the number of sub-files in the summary task, it assembles its own sub-files and the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Optionally, if the file summary node differs from every processing node 2 selected by the management server 1 for the second read tasks, the file summary node receives the summary task as well as the sub-files sent by the other processing nodes 2; when the number of received sub-files reaches the number of sub-files in the summary task, it assembles the received sub-files into the file to be accessed and sends the file to be accessed to the management server 1.
Step 209: The management server 1 receives the sub-files sent by the processing nodes 2, obtains the file to be accessed, and sends the file to be accessed to the client 4.
Optionally, the management server 1 assembles the received sub-files into the file to be accessed and sends the file to be accessed to the client 4.
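The assembly performed by the management server 1 can be sketched minimally (hypothetical names; it is assumed for illustration that each received sub-file is tagged with its position in the file, which the embodiment does not specify):

```python
def merge_subfiles(received):
    """Assemble sub-files received from the processing nodes 2 into the file
    to be accessed, ordering them by sub-file index.

    received: iterable of (index, data) pairs, one per sub-file.
    """
    return b"".join(data for _, data in sorted(received))
```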
When the first read task or the second read task further includes the identifier of the file summary node, the management server 1 receives the file to be accessed sent by the file summary node and sends the file to be accessed to the client 4.
Optionally, when way two is used to select the processing nodes 2, the management server 1 stores the correspondence between processing node identifiers and task counts. For each selected processing node 2, the task count of that processing node 2 stored in the correspondence is decreased by the recorded number of first read tasks or second read tasks assigned to that processing node 2.
In the embodiment of the present application, when the management server 1 determines that the file to be accessed is not cached in the processing nodes 2 of the data access system, it generates at least one first read task, each first read task including the identifier of the storage server 3 where the file to be accessed is located and the identifier of one sub-file of the file to be accessed. As a result, a processing node 2 that receives a first read task does not first access its cache 21; instead, it can obtain the sub-file directly from the storage server 3 according to the identifier of the storage server 3 included in the first read task, and then send the sub-file to the management server 1. The processing node 2 does not cache the sub-file in its cache 21 before sending it to the management server 1, so the sub-file need not pass through the cache 21 of the processing node 2, which reduces the transmission delay of the file to be accessed.
When the processing nodes 2 of the data access system cache the file to be accessed, each generated second read task includes the identifier of the processing node 2 where a sub-file of the file to be accessed is located, which allows the processing node 2 that receives the second read task to obtain the sub-file based on that processing node identifier and improves the efficiency of accessing the file. In addition, when the file to be accessed is not stored in the processing nodes 2 of the data access system, the access frequency of the file to be accessed within the most recent time period of the preset duration is obtained, and when that access frequency exceeds the first preset frequency threshold, the processing nodes 2 are instructed to cache the file to be accessed. An access frequency above the first preset frequency threshold indicates that the file to be accessed has been accessed frequently in the recent past; saving the file to be accessed in the caches 21 of the processing nodes 2 of the data access system therefore improves both the utilization of the caches 21 and the file hit rate.
Referring to FIG. 6, an embodiment of the present application provides a data access apparatus 300. The apparatus 300 is deployed in the above-described management server 1 and is connected to a plurality of processing nodes 2, and the plurality of processing nodes 2 are connected to a storage server 3. The apparatus 300 includes:
a receiving unit 301, configured to receive a file access request, the file access request carrying the identifier of a file to be accessed.
The processing unit 302 is configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; the apparatus 300 stores the identifiers of the files cached in the caches 21 of the plurality of processing nodes 2.
The processing unit 302 is further configured to, when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, instruct at least one processing node 2 of the plurality of processing nodes 2 to obtain the file to be accessed from the storage server 3.
Optionally, for the detailed implementation process in which the processing unit 302 determines whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, reference may be made to the relevant content of step 202 of the embodiment shown in FIG. 3, which is not described in detail here.
Referring to FIG. 6, optionally, the apparatus 300 further includes a first sending unit 303.
The processing unit 302 is configured to obtain, from the storage server 3, the identifier of at least one sub-file included in the file to be accessed and the identifier of the storage server 3 where each sub-file is located, and to generate one read task for each sub-file included in the file to be accessed, each read task including the identifier of one sub-file and the identifier of the storage server 3 where that sub-file is located.
The first sending unit 303 is configured to send each read task to a respective processing node 2, instructing the processing node 2 that receives the read task to read the corresponding sub-file from the storage server 3 that stores that sub-file.
The receiving unit 301 is configured to receive the sub-files read by the processing nodes 2 that received the read tasks.
The processing unit 302 is configured to merge the received sub-files into the file to be accessed.
Optionally, for the detailed implementation process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content of step 203 of the embodiment shown in FIG. 3. For the detailed implementation process in which the first sending unit 303 sends the read tasks, reference may be made to the relevant content of step 204 of the embodiment shown in FIG. 3, which is not described in detail here.
Referring to FIG. 6, optionally, the apparatus 300 further includes a second sending unit 304.
The second sending unit 304 is configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node 2 of the plurality of processing nodes 2 to instruct the at least one processing node 2 to cache the sub-files included in the file to be accessed.
The processing unit 302 is further configured to record the identifiers of the sub-files included in the file to be accessed and the identifier of the processing node 2 that caches each sub-file, and, when the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2, to generate at least one read task, each read task including the identifier of one sub-file and the identifier of the processing node 2 where that sub-file is located.
The second sending unit 304 is further configured to send the at least one read task to the plurality of processing nodes 2, instructing the plurality of processing nodes 2 to read each sub-file from the cache 21 of the processing node 2 that stores that sub-file.
The processing unit 302 is further configured to assemble the read sub-files into the file to be accessed.
Optionally, for the detailed implementation process in which the second sending unit 304 sends the cache task, reference may be made to the relevant content of steps 2023 and 2024 of the embodiment shown in FIG. 4. For the detailed implementation process in which the processing unit 302 generates the read tasks, reference may be made to the relevant content of step 206 of the embodiment shown in FIG. 3. For the detailed implementation process in which the second sending unit 304 sends the read tasks, reference may be made to the relevant content of step 207 of the embodiment shown in FIG. 3, which is not described in detail here.
Optionally, the second sending unit 304 is further configured to, when the access frequency of the file to be accessed falls below the preset frequency, send a deletion task to the processing node 2 where a sub-file included in the file to be accessed is located, the deletion task including the identifier of that sub-file, to instruct the processing node 2 to delete the sub-file.
The processing unit 302 is further configured to delete the identifier of that sub-file and the identifier of that processing node 2 recorded in the apparatus 300.
Optionally, for the detailed implementation process in which the second sending unit 304 sends the deletion task, reference may be made to the relevant content of steps 2122 and 2123 of the embodiment shown in FIG. 5, which is not described in detail here.
In the embodiment of the present application, the processing unit 302 determines, according to the identifier of the file to be accessed, whether the file to be accessed is cached in the caches 21 of the plurality of processing nodes 2; when the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, at least one processing node 2 of the plurality of processing nodes 2 is instructed to obtain the file to be accessed from the storage server 3. The at least one processing node 2 thus reads the file to be accessed directly from the storage server 3 as instructed by the processing unit 302 and, after reading it, returns it directly to the apparatus 300 without first caching the file to be accessed in its cache 21. In this way, the file to be accessed need not pass through the cache 21 of the at least one processing node 2, which shortens the transmission path of the file to be accessed, reduces the data read path, and improves data access performance.
Referring to FIG. 7, an embodiment of the present application provides a data access apparatus 400. The apparatus 400 is deployed in the above-described processing node 2 and is one of a plurality of processing nodes 2 connected to the management server 1, the plurality of processing nodes 2 being connected to a storage server 3. The apparatus 400 includes:
a receiving unit 401, configured to receive a read task, the read task being a task sent by the management server 1 when it determines that the file to be accessed is not cached in the caches 21 of the plurality of processing nodes 2, the read task including the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where that sub-file is located;
a processing unit 402, configured to read the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to the identifier of the storage server 3; and
发送单元403,用于向管理服务器1发送读取的子文件。The sending unit 403 is configured to send the read sub-file to the management server 1.
Optionally, for the detailed implementation of the processing unit 402 reading the sub-file, refer to the related content of step 205 in the embodiment shown in FIG. 3, which is not described in detail here.
Optionally, the receiving unit 401 is further configured to receive a cache task, where the cache task is sent by the management server 1 when the access frequency of the file to be accessed exceeds a preset frequency, and the cache task includes the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where that sub-file is located.
The processing unit 402 is further configured to read the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to the identifier of the storage server 3, and to store the sub-file in the cache 21 of the apparatus 400.
Optionally, for the detailed implementation of the processing unit 402 caching the sub-file, refer to the related content of step 2025 in the embodiment shown in FIG. 4, which is not described in detail here.
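The frequency threshold that triggers the cache task above (and, per the later claims, the delete task when a file cools down) amounts to a simple policy decision on the management side. A minimal sketch, with invented names and assuming a single access counter per file:

```python
# Hypothetical helper for the frequency-driven policy: issue a cache task
# once a file's access frequency exceeds the preset frequency, and a delete
# task once it falls below it again. Purely illustrative.

def plan_cache_action(access_frequency, preset_frequency, is_cached):
    """Return the task the management server would issue for one file."""
    if access_frequency > preset_frequency and not is_cached:
        return "cache"    # tell nodes to pull the sub-files into cache 21
    if access_frequency < preset_frequency and is_cached:
        return "delete"   # tell nodes to drop the cached sub-files
    return "none"         # cache state already matches the policy
```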
In this embodiment of the present application, the read task received by the receiving unit 401 includes the identifier of one sub-file of the file to be accessed and the identifier of the storage server 3 where the sub-file is located; the processing unit 402 reads the sub-file, according to the identifier of the sub-file, from the storage server 3 corresponding to that identifier; and the sending unit 403 sends the read sub-file to the management server 1. The processing unit 402 can thus read the sub-file directly from the storage server 3 according to the identifier of the storage server 3, and the sending unit 403 returns it directly to the management server 1; before the sub-file is returned to the management server 1, the processing unit 402 does not cache it in the cache 21 of the apparatus 400. The sub-file returned to the management server 1 therefore does not pass through the cache 21 of the apparatus 400, which shortens its transmission path, reduces the data-read path, and improves the performance of data access.
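The two task handlers a processing node exposes in this embodiment can be sketched as below. The function names and the task dictionary layout are assumptions for illustration; the patent does not specify a wire format.

```python
# Illustrative sketch of the node-side handlers: a read task returns the
# sub-file without touching the local cache, while a cache task (for hot
# files) also stores a copy in the node's cache 21.

def handle_read_task(task, storage_servers):
    """Read task: fetch the sub-file straight from the named storage server
    and return it, without writing it into the local cache."""
    return storage_servers[task["server_id"]][task["subfile_id"]]

def handle_cache_task(task, storage_servers, local_cache):
    """Cache task: fetch the sub-file and keep a copy in cache 21, so later
    reads of this hot file can be served from the node's cache."""
    data = storage_servers[task["server_id"]][task["subfile_id"]]
    local_cache[task["subfile_id"]] = data
    return data
```

The split mirrors the text above: the read path stays cache-free to keep the transmission path short, and only the management server's explicit cache task populates cache 21.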
Referring to FIG. 8, FIG. 8 is a schematic diagram of an apparatus 500 for data access provided by an embodiment of the present application. The apparatus 500 includes at least one processor 501, a bus system 502, a memory 503, and a transceiver 504.
The apparatus 500 is an apparatus with a hardware structure and can be used to implement the functional units of the apparatus described in FIG. 6. For example, those skilled in the art will appreciate that the processing unit 302 of the apparatus 300 shown in FIG. 6 can be implemented by the at least one processor 501 invoking application program code in the memory 503, and that the receiving unit 301, the first sending unit 303, and the second sending unit 304 of the apparatus 300 shown in FIG. 6 can be implemented by the transceiver 504.
Optionally, the apparatus 500 can also be used to implement the functions of the management server 1 in the embodiment described in FIG. 1 or FIG. 3.
Optionally, the processor 501 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 502 may include a path for transferring information between the above components.
The memory 503 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or may be integrated with the processor.
The memory 503 is configured to store the application program code for executing the solutions of the present application, and the processor 501 controls the execution. The processor 501 is configured to execute the application program code stored in the memory 503, thereby implementing the functions of the method of this patent.
In a specific implementation, as an embodiment, the processor 501 may include one or more CPUs, for example, CPU 0 and CPU 1 in FIG. 8.
In a specific implementation, as an embodiment, the apparatus 500 may include multiple processors, for example, the processor 501 and the processor 508 in FIG. 8. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
Referring to FIG. 9, FIG. 9 is a schematic diagram of an apparatus 600 for data access provided by an embodiment of the present application. The apparatus 600 includes at least one processor 601, a bus system 602, a memory 603, and a transceiver 604. The memory 603 further includes the cache 21, which is used to store the sub-files of files whose access frequency exceeds a preset frequency.
The apparatus 600 is an apparatus with a hardware structure and can be used to implement the functional units of the apparatus described in FIG. 7. For example, those skilled in the art will appreciate that the processing unit 402 of the apparatus 400 shown in FIG. 7 can be implemented by the at least one processor 601 invoking code in the memory 603, and that the sending unit 403 and the receiving unit 401 of the apparatus 400 shown in FIG. 7 can be implemented by the transceiver 604.
Optionally, the apparatus 600 can also be used to implement the functions of the processing node 2 in the embodiment described in FIG. 1 or FIG. 3.
Optionally, the processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of the present application.
The bus system 602 may include a path for transferring information between the above components.
The memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or may be integrated with the processor.
The memory 603 is configured to store the application program code for executing the solutions of the present application, and the processor 601 controls the execution. The processor 601 is configured to execute the application program code stored in the memory 603, thereby implementing the functions of the method of this patent.
In a specific implementation, as an embodiment, the processor 601 may include one or more CPUs, for example, CPU 0 and CPU 1 in FIG. 9.
In a specific implementation, as an embodiment, the apparatus 600 may include multiple processors, for example, the processor 601 and the processor 608 in FIG. 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
The above are merely optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

  1. A method for data access, wherein the method is executed by a management server, the management server is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the method comprises:
    receiving a file access request, where the file access request carries an identifier of a file to be accessed;
    determining, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the management server stores identifiers of cached files;
    when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, instructing at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server.
  2. The method according to claim 1, wherein the method further comprises:
    obtaining, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located;
    wherein the instructing at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server comprises:
    generating a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located;
    sending each read task to a respective processing node, instructing the processing node that receives the read task to read the sub-file from the storage server storing the sub-file;
    receiving the sub-files read by the processing nodes that received the read tasks;
    merging the sub-files into the file to be accessed.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    when the access frequency of the file to be accessed exceeds a preset frequency, sending a cache task to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed at the at least one processing node;
    recording the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file;
    when the file to be accessed is cached in the caches of the multiple processing nodes, generating at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located;
    sending the at least one read task to the multiple processing nodes, instructing the multiple processing nodes to read the sub-files from the caches of the processing nodes storing the sub-files;
    combining the read sub-files into the file to be accessed.
  4. The method according to claim 3, wherein the method further comprises:
    when the access frequency of the file to be accessed is lower than the preset frequency, sending a delete task to the processing node where a sub-file included in the file to be accessed is located, where the delete task includes the identifier of the sub-file, to instruct the processing node to delete the sub-file;
    deleting the identifier of the sub-file and the identifier of the processing node recorded in the management server.
  5. An apparatus for data access, wherein the apparatus is connected to multiple processing nodes, the multiple processing nodes are connected to a storage server, and the apparatus comprises:
    a receiving unit, configured to receive a file access request, where the file access request carries an identifier of a file to be accessed;
    a processing unit, configured to determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the apparatus stores identifiers of cached files;
    wherein the processing unit is further configured to, when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, instruct at least one processing node of the multiple processing nodes to obtain the file to be accessed from the storage server.
  6. The apparatus according to claim 5, wherein the apparatus further comprises a first sending unit, wherein:
    the processing unit is configured to obtain, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located, and to generate a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located;
    the first sending unit is configured to send each read task to a respective processing node, instructing the processing node that receives the read task to read the sub-file from the storage server storing the sub-file;
    the receiving unit is configured to receive the sub-files read by the processing nodes that received the read tasks;
    the processing unit is configured to merge the sub-files into the file to be accessed.
  7. The apparatus according to claim 5 or 6, wherein the apparatus further comprises a second sending unit, wherein:
    the second sending unit is configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node of the multiple processing nodes to instruct the at least one processing node to cache the sub-files included in the file to be accessed at the at least one processing node;
    the processing unit is further configured to record the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file, and, when the file to be accessed is cached in the caches of the multiple processing nodes, to generate at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located;
    the second sending unit is further configured to send the at least one read task to the multiple processing nodes, instructing the multiple processing nodes to read the sub-files from the caches of the processing nodes storing the sub-files;
    the processing unit is further configured to combine the read sub-files into the file to be accessed.
  8. The apparatus according to claim 7, wherein:
    the second sending unit is further configured to, when the access frequency of the file to be accessed is lower than the preset frequency, send a delete task to the processing node where a sub-file included in the file to be accessed is located, where the delete task includes the identifier of the sub-file, to instruct the processing node to delete the sub-file;
    the processing unit is further configured to delete the identifier of the sub-file and the identifier of the processing node recorded in the apparatus.
  9. A system for data access, wherein the system comprises a management server, a storage server, and multiple processing nodes;
    the management server is configured to: receive a file access request, where the file access request carries an identifier of a file to be accessed; determine, according to the identifier of the file to be accessed, whether the file to be accessed is cached in a cache of at least one processing node of the multiple processing nodes, wherein the management server stores identifiers of cached files; when the file to be accessed is not cached in the cache of at least one processing node of the multiple processing nodes, obtain, from the storage server, an identifier of at least one sub-file included in the file to be accessed and an identifier of the storage server where each sub-file is located, generate a read task for each sub-file included in the file to be accessed, where each read task includes the identifier of one sub-file and the identifier of the storage server where the sub-file is located, and send each read task to a respective processing node;
    the processing node that receives a read task is configured to read the corresponding sub-file from the storage server corresponding to the identifier of the storage server according to the identifier of the sub-file in the received read task, and to send the read sub-file to the management server;
    the management server is further configured to receive the sub-files read by the processing nodes that received the read tasks.
  10. The system according to claim 9, wherein:
    the management server is further configured to, when the access frequency of the file to be accessed exceeds a preset frequency, send a cache task to at least one processing node of the multiple processing nodes;
    the processing node that receives the cache task is configured to cache the sub-files included in the file to be accessed;
    the management server is further configured to record the sub-file identifiers included in the identifier of the file to be accessed and the identifier of the processing node that caches each sub-file;
    the management server is further configured to, when the file to be accessed is cached in the caches of the multiple processing nodes, generate at least one read task, where each read task includes the identifier of a sub-file and the identifier of the processing node where the sub-file is located, and send the at least one read task to the multiple processing nodes;
    the processing node that receives a read task is configured to read the sub-file according to the identifier of the sub-file in the received read task and the identifier of the processing node where the sub-file is located, and to send the read sub-file to the management server;
    the management server is further configured to receive the sub-files read by the processing nodes that received the read tasks.
PCT/CN2020/110819 2019-08-23 2020-08-24 Method, apparatus and system for data access WO2021036989A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910786485.4A CN112416871B (en) 2019-08-23 2019-08-23 Data access method, device and system
CN201910786485.4 2019-08-23

Publications (1)

Publication Number Publication Date
WO2021036989A1 true WO2021036989A1 (en) 2021-03-04

Family

ID=74683263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110819 WO2021036989A1 (en) 2019-08-23 2020-08-24 Method, apparatus and system for data access

Country Status (2)

Country Link
CN (1) CN112416871B (en)
WO (1) WO2021036989A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150277802A1 (en) * 2014-03-31 2015-10-01 Amazon Technologies, Inc. File storage using variable stripe sizes
CN107026876A (en) * 2016-01-29 2017-08-08 杭州海康威视数字技术股份有限公司 A kind of file data accesses system and method
CN107562757A (en) * 2016-07-01 2018-01-09 阿里巴巴集团控股有限公司 Inquiry, access method based on distributed file system, apparatus and system
CN107920101A (en) * 2016-10-10 2018-04-17 阿里巴巴集团控股有限公司 A kind of file access method, device, system and electronic equipment
CN109002260A (en) * 2018-07-02 2018-12-14 深圳市茁壮网络股份有限公司 A kind of data cached processing method and processing system

Also Published As

Publication number Publication date
CN112416871A (en) 2021-02-26
CN112416871B (en) 2023-10-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20858975

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20858975

Country of ref document: EP

Kind code of ref document: A1