CN111258956B - Method and device for prereading far-end mass data files - Google Patents

Publication number
CN111258956B
Authority
CN
China
Prior art keywords
file
target
data
information
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910222715.4A
Other languages
Chinese (zh)
Other versions
CN111258956A (en)
Inventor
高磊
高正
邹海锋
陈子光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vispractice Technology Co ltd
Original Assignee
Shenzhen Vispractice Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is applicable to the technical field of computers, and provides a method and equipment for prereading far-end mass data files, wherein the method comprises the following steps: acquiring a reading request sent by a terminal; searching target index information of the target pre-read file based on the offset; analyzing the target index information to obtain the data address information of the target pre-read file; and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server. In the embodiment of the invention, for the disk I/O request which does not contain information such as file path, file type and the like, the index information can be determined according to the offset in the request, the index information is analyzed to obtain the data address information of the pre-read file, the file data of the pre-read file is read according to the data address information, and the file data is cached to the server. The embodiment of the invention can determine the pre-read file address without depending on file path information, and extract the pre-read file data to the buffer area.

Description

Method and device for prereading far-end mass data files
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method and equipment for prereading far-end mass data files.
Background
File pre-reading refers to reading data into a buffer in advance while files are being read sequentially, so that the next read can be served directly from the buffer, which improves file read performance.
A structured I/O request is a data input/output request that includes information such as the file path, file type, and I/O request type; a disk I/O request is a data input/output request that does not include such information.
Existing file pre-reading methods either obtain list information from the directory of the file currently being read and then read all files in that directory smaller than a preset file size into a cache, or pre-read the data into a file cache. However, these methods must know the file path in order to locate the pre-read file accurately. They can therefore respond only to structured I/O requests; for a disk I/O request they cannot determine the pre-read file, and the required file cannot be extracted to the buffer.
Disclosure of Invention
In view of this, embodiments of the invention provide a method and a device for prereading remote mass data files, to solve the problem in the prior art that existing pre-reading methods cannot determine the pre-read file for a disk I/O request and therefore cannot extract the required pre-read file to the cache.
A first aspect of an embodiment of the present invention provides a method for prereading a remote-oriented mass data file, including:
acquiring a reading request sent by a terminal; the read request comprises an offset of a target read-ahead file requested to be acquired;
searching target index information of the target pre-read file based on the offset; the target index information is obtained by parsing the file system of the terminal; the target index information comprises an index node of the target pre-read file;
analyzing the target index information to obtain the data address information of the target pre-read file;
and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server.
A second aspect of an embodiment of the present invention provides a device for prereading a remote-oriented mass data file, the device including:
the acquisition unit is used for acquiring a reading request sent by the terminal; the read request comprises an offset of a target read-ahead file requested to be acquired;
the searching unit is used for searching the target index information of the target pre-read file based on the offset; the target index information is obtained by parsing the file system of the terminal; the target index information comprises an index node of the target pre-read file;
the analysis unit is used for analyzing the target index information to obtain the data address information of the target pre-read file;
and the reading unit is used for reading the file data of the target pre-read file at the cloud based on the data address information and caching the file data to a server.
A third aspect of an embodiment of the present invention provides another device for remote mass data file prereading, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, and where the memory is configured to store a computer program supporting the device to perform the above method, where the computer program includes program instructions, and where the processor is configured to invoke the program instructions to perform the following steps:
acquiring a reading request sent by a terminal; the read request comprises an offset of a target read-ahead file requested to be acquired;
searching target index information of the target pre-read file based on the offset; the target index information is obtained by parsing the file system of the terminal; the target index information comprises an index node of the target pre-read file;
analyzing the target index information to obtain the data address information of the target pre-read file;
and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server.
A fourth aspect of embodiments of the present invention provides a computer readable storage medium storing a computer program which when executed by a processor performs the steps of:
acquiring a reading request sent by a terminal; the read request comprises an offset of a target read-ahead file requested to be acquired;
searching target index information of the target pre-read file based on the offset; the target index information is obtained by parsing the file system of the terminal; the target index information comprises an index node of the target pre-read file;
analyzing the target index information to obtain the data address information of the target pre-read file;
and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server.
The method and the device for prereading the far-end mass data files provided by the embodiment of the invention have the following beneficial effects:
according to the embodiment of the invention, the reading request sent by the terminal is obtained; searching target index information of the target pre-read file based on the offset; analyzing the target index information to obtain the data address information of the target pre-read file; and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server. In the embodiment of the invention, for the disk I/O request which does not contain information such as file path, file type and the like, the index information can be determined according to the offset in the request, the index information is analyzed to obtain the data address information of the pre-read file, the file data of the pre-read file is read according to the data address information, and the file data is cached to the server. The embodiment of the invention can determine the pre-read file address without depending on file path information, and extract the pre-read file data to the buffer area.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for prereading a remote mass data file according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for prereading a remote mass data file according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a device for remote mass data file prereading according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an apparatus for remote mass data file prereading according to another embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for prereading a remote mass data file according to an embodiment of the present invention. The execution body of the file pre-reading method in this embodiment is a device for pre-reading a remote mass data file, including but not limited to a terminal or a server for file pre-reading. The file pre-reading method as shown in fig. 1 may include:
S101: acquiring a reading request sent by a terminal; the read request includes an offset of the target pre-read file requested to be acquired.
The server acquires the read request sent by the terminal. Specifically, the server either receives the read request directly, or checks whether a read request from the terminal is present and extracts the request information once one is detected. The read request is a disk I/O request, i.e., a data input/output request that does not include information such as a file path, file type, or I/O request type. The read request may include the offset of the target pre-read file requested to be acquired, the size of the target pre-read file, the length occupied by the stored data of the target pre-read file, and the like. There may be one read request or several, and a read request may ask for one target pre-read file or several. The number and content of the read requests depend on what the server actually receives and are not limited here.
S102: searching target index information of the target pre-read file based on the offset.
The index information is obtained by the server parsing the file system of the terminal, and the target index information includes the index node of the target pre-read file. A file system is the method and data structure that an operating system uses to keep track of files on a storage device or partition; an index node stores basic information about files and directories, including time, file name, user, group, and the like. Specifically, the server runs a file system parsing thread, which parses the file system to obtain the index information of all files in the file system, together with the first-address offset of the file data of the file identified by each piece of index information. The index information includes the index nodes of all the files.
To search for the target index information based on the offset, the server takes the offset of the target pre-read file from the read request and uses it to look up the target index information. For example, the server traverses the first-address offsets obtained by parsing the file system, finds the one equal to the offset of the target pre-read file, identifies the file associated with that first-address offset, and obtains the index information of that file. Alternatively, the server searches a database for a target first-address offset equal to the offset of the target pre-read file; when one is found, the server identifies the target pre-read file associated with it and obtains the target index information of that file. The database stores the index information of all files obtained when the server parsed the file system, together with the first-address offset of the file data of the file identified by each piece of index information.
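The lookup described above can be sketched as a keyed search over the file information set. This is only an illustration under stated assumptions: the `IndexInfo` record, its fields, and the sample offsets are hypothetical and do not come from the patent, and a dictionary stands in for either the in-memory traversal or the database query.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical index record produced by parsing the terminal's file system."""
    inode_number: int
    name: str


# Hypothetical file information set: first-address offset -> index information.
FILE_INFO_SET: Dict[int, IndexInfo] = {
    4096: IndexInfo(inode_number=131, name="a.dat"),
    1048576: IndexInfo(inode_number=132, name="b.dat"),
}


def find_target_index(offset: int,
                      file_info_set: Dict[int, IndexInfo]) -> Optional[IndexInfo]:
    """Return the index information whose first-address offset equals the
    offset carried by the read request, or None when no file starts there."""
    return file_info_set.get(offset)
```

A request whose offset matches no first-address offset returns `None`; how such a request is handled is not specified in the text above.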
S103: analyzing the target index information to obtain the data address information of the target pre-read file.
The target index information includes the index node of the target pre-read file. The server takes the index node from the target index information and parses it to obtain the data address information of the target pre-read file. For example, the server parses the data structure of the index node and reads the data address information of the target pre-read file from that structure; or the server parses the index node to obtain the metadata information of the corresponding file, and derives the data address information of the target pre-read file from the metadata. The metadata information may include a file offset address and a file data amount, or the start address and end address of the stored file data, or the start position and length of the stored file data, and the like.
Further, S103 may include S1031-S1032, specifically as follows:
S1031: analyzing the target index information to obtain the index node in the target index information.
The server parses the target index information and extracts the index node it contains.
S1032: acquiring metadata information of the file corresponding to the index node based on the index node; the metadata information includes a file offset address and a file data amount.
The server parses the index node and acquires the metadata information of the corresponding file. Specifically, the server obtains the metadata information from the basic information about files, archives, directories, and the like that is stored in the index node. The metadata information may include a file offset address and a file data amount, or the start address and end address of the stored file data, or the start position and length of the stored file data, and the like.
Further, to make the file data cached on the server more complete, when the server obtains the metadata information of the file corresponding to the index node, it may also obtain the metadata information of the other files in the parent directory of that file. Specifically, the server obtains the index node of the parent directory by parsing the file's index node, and then parses the parent directory's index node to obtain the metadata information of every file in the parent directory other than the target pre-read file. For example, the server parses the index node to obtain the metadata information of the corresponding file, determines the specific data address information of the target pre-read file from the file offset address and file data amount in the metadata, and looks up the level above that data address information, namely the parent directory information of the target pre-read file. The server then obtains the index node of the parent directory and parses it to obtain the index nodes of all files in the parent directory. Because the metadata information of the target pre-read file has already been obtained, its index node can be excluded at this point. The server parses the index nodes of all remaining files in the parent directory to obtain their metadata information.
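A minimal sketch of the parent-directory step, assuming an in-memory table of index nodes keyed by number. The `Inode` layout (a `parent` field, a `children` list, `offset` and `size`) is a hypothetical simplification chosen for illustration; real on-disk index nodes are laid out differently for each file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Inode:
    """Hypothetical, simplified index node."""
    number: int
    parent: int                      # index node number of the parent directory
    is_dir: bool = False
    offset: int = 0                  # file offset address (files only)
    size: int = 0                    # file data amount (files only)
    children: List[int] = field(default_factory=list)  # directories only


def sibling_metadata(target: int, table: Dict[int, Inode]) -> List[Tuple[int, int]]:
    """Return (offset, size) for every file in the target's parent directory
    except the target itself, whose metadata is already known."""
    parent = table[table[target].parent]
    return [
        (table[child].offset, table[child].size)
        for child in parent.children
        if child != target and not table[child].is_dir
    ]


# Hypothetical directory 1 holding files 10 and 11 and subdirectory 12.
TABLE = {
    1: Inode(1, parent=1, is_dir=True, children=[10, 11, 12]),
    10: Inode(10, parent=1, offset=4096, size=100),
    11: Inode(11, parent=1, offset=8192, size=200),
    12: Inode(12, parent=1, is_dir=True),
}
```

Subdirectories are skipped here because the paragraph speaks only of the other files in the parent directory; that reading is an assumption.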
The metadata information may include a file offset address and a file data amount, or the start address and end address of the stored file data, or the start position and length of the stored file data, and the like. The specific address of the file can be determined from whichever of these pairs is available.
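The alternative forms of metadata can be normalized to a single address range. The sketch below assumes byte addresses and an exclusive end address; neither convention is stated in the source, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def data_address_range(start: int,
                       length: Optional[int] = None,
                       end: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end_exclusive) from either start + length
    (offset address and data amount) or start + end (start and end address)."""
    if (length is None) == (end is None):
        raise ValueError("provide exactly one of length or end")
    if length is not None:
        if length < 0:
            raise ValueError("length must be non-negative")
        return start, start + length
    if end < start:
        raise ValueError("end must not precede start")
    return start, end
```

For instance, an offset address of 4096 with a data amount of 1024 and a start address of 4096 with an end address of 5120 describe the same range under these assumptions.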
S104: reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server.
The server reads the file data of the target pre-read file from the cloud according to the data address information and caches it on the server. Specifically, the server determines the specific position of the target pre-read file from the file offset address and file data amount in the metadata information, locates the target pre-read file in the cloud based on that position, acquires its file data, and caches the file data on the server; the file data may also be cached in a local storage system.
Further, when the server has also obtained the metadata information of the other files in the parent directory of the target pre-read file, it can determine their specific positions from that metadata, locate those files in the cloud, acquire their file data, and cache the file data on the server or in the local storage system.
Further, S104 may include S1041-S1043, specifically as follows:
s1041: an address identification is created for the data information and stored in a data transmission queue.
The server acquires the data information of the target pre-read file according to the corresponding relation preset by the user, creates an address identifier for the data information of the target pre-read file, and stores the address identifier in a data transmission queue. The preset corresponding relation is used for the server to acquire the data information of the target read-ahead file from the cloud according to the address information. The address identifier is used for identifying the specific storage position of the data information of the target pre-read file, the form of the address identifier can be letters, characters, numbers, custom names and the like, and the user can set the address identifier according to actual conditions, so that the address identifier is not limited. The data transmission queue is a queue created by the server when the server performs data interaction with the cloud, and can be used for transmitting data or storing address identifiers. To speed up the transfer data rate, the data transfer queue may also be created in advance, such as when the server parses the file system.
The server can also segment the acquired data information evenly or randomly according to the data quantity, create segment address identifiers corresponding to each segment of the data information according to the sequence of the segments, and store the segment address identifiers of each segment of the data information in a data transmission queue. The segment address identifier is used for identifying the specific storage position of each piece of data information of the target pre-read file, the form of the segment address identifier can be letters, characters, numbers, custom names and the like, and the user can set the segment address identifier according to actual conditions, so that the segment address identifier is not limited.
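Even segmentation can be sketched as follows. The tuple form of the segment address identifier (sequence number, start offset, length) is an assumption made for illustration; the text above leaves the form of the identifier open.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical segment address identifier: (sequence number, start offset, length).
Segment = Tuple[int, int, int]


def split_evenly(start: int, size: int, parts: int) -> List[Segment]:
    """Split `size` bytes beginning at `start` into `parts` contiguous segments
    whose lengths differ by at most one byte."""
    if parts <= 0 or size < 0:
        raise ValueError("parts must be positive and size non-negative")
    base, extra = divmod(size, parts)
    segments: List[Segment] = []
    cursor = start
    for seq in range(parts):
        length = base + (1 if seq < extra else 0)
        segments.append((seq, cursor, length))
        cursor += length
    return segments


# The data transmission queue holds the identifiers in segment order.
transfer_queue: Deque[Segment] = deque(split_evenly(4096, 10, 3))
```

Carrying the sequence number inside the identifier lets a reader reassemble the segments even when they complete out of order.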
S1042: when the address identifier is detected in the data transmission queue, reading the file data corresponding to the address identifier from the cloud through a pre-reading thread.
When the server detects an address identifier in the data transmission queue, it reads the corresponding file data from the cloud through a pre-reading thread on the server. Specifically, the server locates the file data at the cloud position that the address identifier represents and starts a pre-reading thread to read it. When the server detects no address identifier in the data transmission queue, it issues a sleep instruction that puts the pre-reading thread into a sleep state.
Further, to improve the efficiency of reading file data, the user may set how often the server checks the data transmission queue for address identifiers, for example once every second or once every two seconds, which is not limited here.
The pre-reading thread reads file data from the cloud. It can be created when the server interacts with the cloud or when the server parses the file system, and there may be one pre-reading thread or several. For example, when the server has split the acquired data information into segments by data amount, either evenly or randomly, created a segment address identifier for each segment in segment order, and stored the segment address identifiers in the data transmission queue, several pre-reading threads can be created. When the server detects segment address identifiers in the data transmission queue, the pre-reading threads read the segments corresponding to the segment address identifiers in turn, and the segments are combined in the order in which the pre-reading threads read them to obtain the file data of the target pre-read file.
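A sketch of several pre-reading threads draining the data transmission queue. Several things here are assumptions rather than the patent's method: `CLOUD` is an in-memory stand-in for remote storage, `read_from_cloud` is a hypothetical helper, the segment identifier is a (sequence number, start, length) tuple, and the segments are merged by sequence number so that the result does not depend on thread timing.

```python
import queue
import threading
from typing import Dict, List, Tuple

CLOUD = bytes(range(256)) * 4  # stand-in for remote storage (assumption)


def read_from_cloud(start: int, length: int) -> bytes:
    """Hypothetical remote read; a real system would issue a network request."""
    return CLOUD[start:start + length]


def preread(segments: List[Tuple[int, int, int]], workers: int = 3) -> bytes:
    """Read every (sequence number, start, length) segment with a pool of
    pre-reading threads and merge the pieces by sequence number."""
    pending: "queue.Queue[Tuple[int, int, int]]" = queue.Queue()
    for segment in segments:
        pending.put(segment)
    parts: Dict[int, bytes] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                seq, start, length = pending.get_nowait()
            except queue.Empty:
                return  # queue drained; a long-lived thread would sleep instead
            data = read_from_cloud(start, length)
            with lock:
                parts[seq] = data

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return b"".join(parts[seq] for seq in sorted(parts))
```

In this sketch a worker exits when the queue is empty; the sleep-and-poll behavior described above would replace that `return` with a timed wait.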
S1043: caching the file data to a server.
The server caches the file data that the pre-reading thread read from the cloud, either on the server itself or in a local database.
According to the embodiment of the invention, the reading request sent by the terminal is obtained; searching target index information of the target pre-read file based on the offset; analyzing the target index information to obtain the data address information of the target pre-read file; and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server. In the embodiment of the invention, for the disk I/O request which does not contain information such as file path, file type and the like, the index information can be determined according to the offset in the request, the index information is analyzed to obtain the data address information of the pre-read file, the file data of the pre-read file is read according to the data address information, and the file data is cached to the server. The embodiment of the invention can determine the pre-read file address without depending on file path information, and extract the pre-read file data to the buffer area.
Referring to fig. 2, fig. 2 is a schematic flow chart of a method for prereading a remote mass data file according to another embodiment of the present invention. The execution body of the file pre-reading method in this embodiment is a device for pre-reading a remote mass data file, including but not limited to a terminal or a server for file pre-reading.
This embodiment differs from the previous one in that, in order to accelerate the processing of the read request and increase the speed of reading the pre-read file, S202-S203 may further be included before S204. In this embodiment, S201 and S204-S206 are identical to S101 and S102-S104 in the embodiment corresponding to fig. 1; refer to the description of S101 and S102-S104 in that embodiment, which is not repeated here. It should be noted that S202-S203 need only be performed before S204 (searching the target index information of the target pre-read file based on the offset); they may also be performed before S201, which is not limited here.
S202-S203 are specifically as follows:
S202: when it is detected that the number of data write requests reaches a preset threshold, analyzing the file system of the terminal to obtain a file information set; the file information set includes the index information of all files in the file system and the first-address offset of the file data of the file identified by each piece of index information.
Specifically, when the file system of the terminal receives a data write request, it sends the request to the server, and the server records the number of data write requests received. When that number reaches the preset threshold, the server starts a file system parsing thread to parse the file system of the terminal and obtain the file information set. The preset threshold is a number of received data write requests that the user sets as the situation requires; it may be one, two, five, and so on, which is not limited here.
Further, to prevent the server from parsing the file system of the terminal too frequently, a time period for parsing may be set. When the parsing time period of the server meets a first preset threshold and the number of data write requests detected meets a second preset threshold, the server parses the file system of the terminal to obtain the file information set. The first preset threshold is the parsing period that the user sets for the server as the situation requires, for example once every 120 seconds or once every 60 seconds, which is not limited here. The second preset threshold is the number of received data write requests that the user sets as the situation requires; it may be one, two, five, and so on, which is not limited here.
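The two-threshold trigger can be sketched as a small counter with a clock. The class and parameter names are hypothetical, and the sketch assumes one particular reading of the first threshold: a minimum interval since the previous parse.

```python
import time
from typing import Callable


class ParseTrigger:
    """Decide when to re-parse the terminal's file system (illustrative sketch)."""

    def __init__(self, min_interval_s: float, write_threshold: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s    # first preset threshold (assumed meaning)
        self.write_threshold = write_threshold  # second preset threshold
        self._clock = clock
        self._writes = 0
        self._last_parse = clock()

    def record_write(self) -> bool:
        """Count one data write request; return True when both thresholds are
        met, and reset the counter and timer in that case."""
        self._writes += 1
        now = self._clock()
        if (self._writes >= self.write_threshold
                and now - self._last_parse >= self.min_interval_s):
            self._writes = 0
            self._last_parse = now
            return True
        return False
```

Injecting the clock keeps the policy testable without waiting for real time to pass.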
Further, the file system includes a target type identifier, which identifies the type of the file system. When the server parses the file system, it can determine the type of the file system by acquiring the target type identifier of the file system.
Further, when the file system includes the target type identifier, S202 may include S2021-S2022, specifically as follows:
S2021: determining a target parsing method corresponding to the target type identifier based on a preset correspondence between type identifiers and parsing methods.
The correspondence between type identifiers and parsing methods may be preset by the user. Each type identifier has exactly one corresponding parsing method, so the target parsing method can be determined from the target type identifier and the preset correspondence. For example, the type identifier may indicate the journaling file system XFS, the Windows NT file system (New Technology File System, NTFS), the compact disc file system (Compact Disc File System, CDFS), the file allocation table (File Allocation Table, FAT), and so on; the corresponding parsing method is then the XFS parsing method, the NTFS parsing method, the CDFS parsing method, the FAT parsing method, and so on.
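The one-to-one correspondence between type identifiers and parsing methods maps naturally onto a dispatch table. In the sketch below the parser functions are empty placeholders and the string identifiers are assumptions; an actual parser would decode the on-disk structures of the named file system.

```python
from typing import Callable, Dict, List

Parser = Callable[[bytes], List[str]]


def parse_xfs(image: bytes) -> List[str]:
    return []  # placeholder: a real XFS parser would walk the on-disk structures


def parse_ntfs(image: bytes) -> List[str]:
    return []  # placeholder


def parse_cdfs(image: bytes) -> List[str]:
    return []  # placeholder


def parse_fat(image: bytes) -> List[str]:
    return []  # placeholder


# Preset correspondence: each type identifier maps to exactly one parsing method.
PARSERS: Dict[str, Parser] = {
    "XFS": parse_xfs,
    "NTFS": parse_ntfs,
    "CDFS": parse_cdfs,
    "FAT": parse_fat,
}


def select_parser(type_id: str) -> Parser:
    """Return the parsing method registered for the given type identifier."""
    try:
        return PARSERS[type_id]
    except KeyError:
        raise ValueError(f"no parsing method for type identifier {type_id!r}") from None
```

Rejecting an unknown identifier explicitly is a design choice of this sketch; the text above does not say what happens for unlisted file system types.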
S2022: and analyzing the file system by adopting the target analysis method to obtain a file information set.
And analyzing the file system of the terminal according to the obtained target analysis method corresponding to the target type identifier, and obtaining index information of all files in the file system and the first address offset of the file data corresponding to the files identified by each index information, namely the file information set.
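As an illustration of S2021 and S2022, the preset correspondence can be sketched as a lookup table from type identifier to parsing function. The function names, the `FileInfo` shape, and the empty parser bodies are hypothetical placeholders; real parsers would read the on-disk structures of each file system.

```python
from typing import Callable, Dict, List, Tuple

# One (index information, first address offset) pair per file (assumed shape).
FileInfo = Tuple[dict, int]


def parse_xfs(image: bytes) -> List[FileInfo]:
    # Placeholder: a real XFS parser would walk the on-disk structures.
    return []


def parse_ntfs(image: bytes) -> List[FileInfo]:
    # Placeholder for an NTFS parser.
    return []


# Preset correspondence: each type identifier maps to exactly one method.
PARSERS: Dict[str, Callable[[bytes], List[FileInfo]]] = {
    "XFS": parse_xfs,
    "NTFS": parse_ntfs,
}


def select_parser(type_id: str) -> Callable[[bytes], List[FileInfo]]:
    """S2021: pick the parsing method for the given target type identifier."""
    try:
        return PARSERS[type_id]
    except KeyError:
        raise ValueError(f"no parsing method registered for {type_id!r}") from None
```

S2022 then amounts to calling the selected function, for example `select_parser("XFS")(image)`.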
For example, when the target type identifier is XFS, the corresponding parsing method is the XFS parsing method, and the file system is parsed with it. Specifically, the block of the root directory of the XFS file system is obtained by parsing the data at the beginning of the hard disk partition that holds the file system; the index node of the root directory is obtained by parsing the data structure corresponding to that block; and the index information of all first-level subdirectories, files, and links under the root directory is obtained by parsing the index node of the root directory. Here, a block is the minimum unit of storage and processing in the database and contains its own header information. If a piece of index information belongs to a directory, all second-level subdirectories, files, and links under that first-level subdirectory are parsed according to its index information to obtain their index information; the third-level subdirectories are then parsed according to the index information of the second-level subdirectories, and so on, until no subdirectories remain to be parsed. If a piece of index information belongs to a file, it is parsed to obtain the first address offset of the file. If a piece of index information belongs to a link, or to any other entry that is neither a file nor a directory, it is skipped and not processed.
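The level-by-level traversal described above can be sketched over an in-memory directory tree. This is only a simplified model: the dictionary keys `kind`, `inode`, `first_offset`, and `children` are assumptions for illustration, and an actual XFS parser would decode these values from on-disk inodes rather than from Python dictionaries.

```python
from collections import deque


def collect_file_info(root):
    """Return [(index_info, first_address_offset)] for every regular file."""
    results = []
    pending = deque([root])  # directories still to be parsed
    while pending:
        directory = pending.popleft()
        for entry in directory["children"]:
            kind = entry["kind"]
            if kind == "dir":
                # Subdirectory: queue it so its own entries are parsed later.
                pending.append(entry)
            elif kind == "file":
                # File: record its index information and first address offset.
                results.append((entry["inode"], entry["first_offset"]))
            # Links and any other entry types are skipped.
    return results
```

The loop ends when the queue is empty, which corresponds to the condition that no subdirectories remain to be parsed.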
S203: Store the file information set in a local database.
The server stores the obtained index information of all files in the file system, together with the first address offset of the file data corresponding to the file identified by each piece of index information, in a local database.
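A minimal sketch of S203 follows, assuming SQLite as the local database and a hypothetical table `file_info` keyed by first address offset; the embodiment does not specify the database or its schema. Index information is serialized as JSON purely for illustration.

```python
import json
import sqlite3


def store_file_info(conn, file_info_set):
    """Persist (index_info, first_address_offset) pairs, keyed by offset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_info ("
        " first_offset INTEGER PRIMARY KEY,"
        " index_info TEXT NOT NULL)"
    )
    # INSERT OR REPLACE lets a later parse overwrite stale entries.
    conn.executemany(
        "INSERT OR REPLACE INTO file_info (first_offset, index_info) VALUES (?, ?)",
        [(offset, json.dumps(info)) for info, offset in file_info_set],
    )
    conn.commit()
```

Keying the table by first address offset is a design choice of this sketch: it makes the later search for an offset equal to the one in the read request a primary-key lookup.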
S204: Look up the target index information of the target pre-read file based on the offset. The target index information is obtained by parsing the file system of the terminal and includes the index node of the target pre-read file.
In this embodiment, S204 is the same as S102 in the previous embodiment; refer to the description of S102 in the previous embodiment, which is not repeated here.
Further, S204 may include S2041 to S2042, as follows:
S2041: Search the local database for a target first address offset that is the same as the offset.
The local database stores the index information of all files obtained when the server parsed the file system, together with the first address offset of the file data corresponding to the file identified by each piece of index information. Based on the offset of the target pre-read file, the server searches the local database for a target first address offset that is the same as that offset.
S2042: Determine the target index information based on the target first address offset.
When a target first address offset that is the same as the offset of the target pre-read file is found in the local database, the server looks up the target pre-read file associated with that target first address offset and obtains its target index information. When no such target first address offset is found in the local database, the server may leave the read request unprocessed, or may send the terminal a response message indicating that the target pre-read file or the target index information was not found.
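S2041 and S2042 can be sketched as an exact-match query on the first address offset. SQLite, the table name `file_info`, and the JSON encoding of index information are assumptions for illustration; a return value of `None` models the case in which no matching first address offset exists and the server either ignores the request or reports that nothing was found.

```python
import json
import sqlite3
from typing import Optional


def find_target_index_info(conn: sqlite3.Connection, offset: int) -> Optional[dict]:
    """Return the index information whose first address offset equals offset."""
    row = conn.execute(
        "SELECT index_info FROM file_info WHERE first_offset = ?", (offset,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Tiny in-memory database standing in for the server's local database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_info (first_offset INTEGER PRIMARY KEY, index_info TEXT)"
)
conn.execute(
    "INSERT INTO file_info VALUES (?, ?)", (4096, json.dumps({"inode": 2}))
)
```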
In this embodiment of the invention, a read request sent by the terminal is obtained; target index information of the target pre-read file is looked up based on the offset; the target index information is parsed to obtain the data address information of the target pre-read file; and the file data of the target pre-read file is read from the cloud based on the data address information and cached on the server. For a disk I/O request that carries no information such as file path or file type, the index information can thus be determined from the offset in the request and parsed to obtain the data address information of the pre-read file, and the file data of the pre-read file can then be read according to that data address information and cached on the server. The embodiment can therefore determine the address of the pre-read file without relying on file path information and fetch the pre-read file data into the cache.
Referring to fig. 3, fig. 3 is a schematic diagram of a device for pre-reading remote mass data files according to an embodiment of the present invention. The device includes units for executing the steps in the embodiments corresponding to fig. 1 and fig. 2; for details, refer to the related descriptions of those embodiments. For convenience of explanation, only the portions related to this embodiment are shown. Referring to fig. 3, the device 3 for pre-reading remote mass data files includes:
an acquiring unit 310, configured to acquire a read request sent by a terminal; the read request includes an offset of a target pre-read file requested to be acquired;
a searching unit 320, configured to look up target index information of the target pre-read file based on the offset; the target index information is obtained by parsing the file system of the terminal; the target index information includes the index node of the target pre-read file;
a parsing unit 330, configured to parse the target index information to obtain data address information of the target pre-read file;
a reading unit 340, configured to read file data of the target pre-read file from the cloud based on the data address information, and cache the file data on a server.
Further, the device for pre-reading remote mass data files further includes:
a file system parsing unit, configured to parse the file system of the terminal to obtain a file information set when the detected number of data write requests reaches a preset threshold; the file information set includes index information of all files in the file system and the first address offset of the file data corresponding to the file identified by each piece of index information;
a storage unit, configured to store the file information set in a local database.
Further, the file system includes a target type identifier; the file system parsing unit is specifically configured to:
determine a target parsing method corresponding to the target type identifier based on a preset correspondence between type identifiers and parsing methods;
parse the file system using the target parsing method to obtain the file information set.
Further, the searching unit is specifically configured to:
search the local database for a target first address offset that is the same as the offset;
determine the target index information based on the target first address offset.
Further, the parsing unit is specifically configured to:
analyzing the target index information to obtain the index nodes in the target index information;
acquiring metadata information of a file corresponding to the index node based on the index node; the metadata information includes a file offset address and a file data amount.
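The two operations of the parsing unit can be illustrated with a small data-structure sketch. The `FileMetadata` type, the in-memory `INODE_TABLE`, and the dictionary shape of the target index information are hypothetical; in practice the metadata would be decoded from the index node itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    file_offset: int  # file offset address
    data_size: int    # file data amount, in bytes


# Hypothetical in-memory inode table: inode number -> metadata.
INODE_TABLE = {2: FileMetadata(file_offset=4096, data_size=1024)}


def data_address_info(target_index_info: dict) -> FileMetadata:
    """Extract the index node, then fetch the metadata it refers to."""
    inode = target_index_info["inode"]
    return INODE_TABLE[inode]
```

The file offset address and the file data amount together tell the reading unit where the file data starts and how much of it to read.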
Further, the reading unit is specifically configured to:
creating an address identifier for the data information and storing the address identifier in a data transmission queue;
when the address identification is detected in the data transmission queue, reading file data corresponding to the address identification from the cloud through a pre-reading thread;
and caching the file data to a server.
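The queue-driven behaviour of the reading unit can be sketched with a standard producer/consumer pattern. The function names, the `None` stop sentinel, the string form of the address identifier, and the stubbed `fetch_from_cloud` are assumptions for illustration; the embodiment does not specify how the cloud is accessed or how the cache is organized.

```python
import queue
import threading


def fetch_from_cloud(address_id):
    # Placeholder for a real remote read of the identified file data.
    return f"data@{address_id}".encode()


def start_preread_worker(transfer_queue, cache):
    """Start a pre-read thread that drains the data transmission queue."""

    def worker():
        while True:
            address_id = transfer_queue.get()  # blocks until an identifier arrives
            try:
                if address_id is None:         # sentinel: stop the thread
                    return
                # Read the file data from the cloud and cache it on the server.
                cache[address_id] = fetch_from_cloud(address_id)
            finally:
                transfer_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A caller would put an address identifier on the queue after creating it; `transfer_queue.join()` can be used to wait until all queued identifiers have been fetched and cached.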
Referring to fig. 4, fig. 4 is a schematic diagram of a device for pre-reading remote mass data files according to another embodiment of the present invention. As shown in fig. 4, the device 4 for pre-reading remote mass data files of this embodiment includes: a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. When the processor 40 executes the computer program 42, the steps of the method embodiments described above are implemented, for example, S101 to S104 shown in fig. 1. Alternatively, when the processor 40 executes the computer program 42, the functions of the units in the device embodiments described above are implemented, for example, the functions of the units 310 to 340 shown in fig. 3.
Illustratively, the computer program 42 may be divided into one or more units, which are stored in the memory 41 and executed by the processor 40 to carry out the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program 42 in the device 4. For example, the computer program 42 may be divided into an acquiring unit, a searching unit, a parsing unit, and a reading unit, each functioning as described above.
The device 4 may include, but is not limited to, the processor 40 and the memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the device 4 and does not constitute a limitation of it; the device may include more or fewer components than shown, may combine certain components, or may use different components. For example, the device may further include an input/output device, a network access device, a bus, and the like.
The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 41 may be an internal storage unit of the device 4, such as a hard disk or a memory of the device 4. The memory 41 may also be an external storage device of the device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the device 4. Further, the memory 41 may also comprise both an internal storage unit and an external storage device of the device 4. The memory 41 is used for storing the computer program as well as other programs and data required by the device. The memory 41 may also be used for temporarily storing data that has been output or is to be output.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A method for pre-reading remote mass data files, characterized by comprising the following steps:
acquiring a read request sent by a terminal; the read request is a disk I/O request, and the disk I/O request does not comprise a file path, a file type and an I/O request type; the disk I/O request comprises an offset of a target pre-read file requested to be acquired;
looking up target index information of the target pre-read file based on the offset; the looking up the target index information of the target pre-read file based on the offset comprises: the server runs a file system parsing thread, parses the file system through the file system parsing thread, and obtains index information of all files in the file system and the first address offset of the file data corresponding to the file identified by each piece of index information, wherein the index information of each file comprises the index node of that file; the server traverses and analyzes all the first address offsets, searches them for the first address offset that is the same as the offset of the target pre-read file, finds the file associated with that first address offset, and acquires the index information of the file to obtain the target index information;
parsing the target index information to obtain the data address information of the target pre-read file;
and reading file data of the target pre-read file at the cloud based on the data address information, and caching the file data to a server.
2. The method of claim 1, wherein before the looking up the target index information of the target pre-read file based on the offset, the method further comprises:
when it is detected that the number of data write requests reaches a preset threshold, parsing a file system of the terminal to obtain a file information set; the file information set comprises index information of all files in the file system and the first address offset of the file data corresponding to the file identified by each piece of index information;
and storing the file information set in a local database.
3. The method of claim 2, wherein the looking up the target index information of the target pre-read file based on the offset comprises:
searching the local database for a target first address offset that is the same as the offset;
and determining the target index information based on the target first address offset.
4. The method of claim 2, wherein the file system includes a target type identifier; the parsing a file system of the terminal to obtain a file information set when it is detected that the number of data write requests reaches a preset threshold comprises:
determining a target parsing method corresponding to the target type identifier based on a preset correspondence between type identifiers and parsing methods;
and parsing the file system using the target parsing method to obtain the file information set.
5. The method of claim 1, wherein the parsing the target index information to obtain the data address information of the target pre-read file comprises:
analyzing the target index information to obtain the index nodes in the target index information;
acquiring metadata information of a file corresponding to the index node based on the index node; the metadata information includes a file offset address and a file data amount.
6. The method of claim 1, wherein the reading file data of the target pre-read file at the cloud based on the data address information and caching the file data to a server comprises:
creating an address identifier for the data information and storing the address identifier in a data transmission queue;
when the address identification is detected in the data transmission queue, reading file data corresponding to the address identification from the cloud through a pre-reading thread;
and caching the file data to a server.
7. A device for remote mass data file prereading, comprising:
the acquisition unit is used for acquiring a read request sent by the terminal; the read request is a disk I/O request, and the disk I/O request does not comprise a file path, a file type and an I/O request type; the disk I/O request comprises an offset of a target pre-read file requested to be acquired;
the searching unit is used for searching the target index information of the target pre-read file based on the offset; the searching the target index information of the target pre-read file based on the offset comprises: the server runs a file system analysis thread, analyzes the file system through the file system analysis thread, and obtains index information of all files in the file system and the first address offset of file data corresponding to the files identified by each index information, wherein the index information of each file comprises index nodes of each file; the server traverses and analyzes all the first address offsets, searches the first address offset which is the same as the offset of the target pre-read file in all the first address offsets, searches the file which is associated with the first address offset which is the same as the offset of the target pre-read file, and acquires the index information of the file to obtain the target index information;
the analysis unit is used for analyzing the target index information to obtain the data address information of the target pre-read file;
and the reading unit is used for reading the file data of the target pre-read file at the cloud based on the data address information and caching the file data to a server.
8. The device of claim 7, wherein the lookup unit is specifically configured to:
searching a target first address offset which is the same as the offset in a local database;
and determining the target index information based on the target first address offset.
9. A device for remote mass data file pre-reading comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN201910222715.4A 2019-03-22 2019-03-22 Method and device for prereading far-end mass data files Active CN111258956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910222715.4A CN111258956B (en) 2019-03-22 2019-03-22 Method and device for prereading far-end mass data files

Publications (2)

Publication Number Publication Date
CN111258956A CN111258956A (en) 2020-06-09
CN111258956B true CN111258956B (en) 2023-11-24

Family

ID=70952042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910222715.4A Active CN111258956B (en) 2019-03-22 2019-03-22 Method and device for prereading far-end mass data files

Country Status (1)

Country Link
CN (1) CN111258956B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant