CN115269522A - Distributed file caching method, system, equipment and storage medium - Google Patents

Distributed file caching method, system, equipment and storage medium

Info

Publication number
CN115269522A
CN115269522A (Application No. CN202210887021.4A)
Authority
CN
China
Prior art keywords
data file
file
cache
hard disk
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210887021.4A
Other languages
Chinese (zh)
Inventor
赵铖皓
王红宾
胡方炜
陈飞
谭伟华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weride Technology Co Ltd
Original Assignee
Guangzhou Weride Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weride Technology Co Ltd filed Critical Guangzhou Weride Technology Co Ltd
Priority to CN202210887021.4A priority Critical patent/CN115269522A/en
Publication of CN115269522A publication Critical patent/CN115269522A/en
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application discloses a distributed file caching method, system, device, and storage medium. When a simulation task is received, the method judges whether the data file required by the task is cached locally; if so, the local copy is used for the task, and if not, the file is downloaded from the data center and the task is run locally. The method then updates the target attributes of the data file, which include file attributes, network attributes, graphics card (GPU) cluster attributes, and simulation-task attributes, and a preset multi-task prediction model predicts from these attributes whether the file should be cached locally, which hard disk it should be cached on, and its cache lifetime, yielding a caching-policy result for the file. The file is then processed according to that result. This solves the prior-art problem that, during large-scale simulation tasks, large amounts of training data are transmitted repeatedly from the data center to the GPU cluster, slowing the simulation task and even causing it to fail due to data-transmission problems.

Description

Distributed file caching method, system, equipment and storage medium
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a distributed file caching method, system, device, and storage medium.
Background
In the field of autonomous driving, simulation is an important link, and the data a simulation task depends on grows with vehicle mileage, fleet size, and the geometric growth of vehicle-mounted hardware iterations. Most enterprises build or use an Internet data center to store this data, but simulation is usually backed by a single graphics card (GPU) cluster, so large amounts of training data are transmitted repeatedly from the data center to the GPU cluster. This slows the simulation task and can even cause it to fail due to data-transmission problems. For large-scale simulation tasks, how to quickly realize the transmission, exchange, and use of file data so as to speed up the simulation task is therefore a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The application provides a distributed file caching method, system, device, and storage medium to solve the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
In view of this, a first aspect of the present application provides a distributed file caching method, including:
when a simulation task is received, judging whether the data file required by the task is cached locally; if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from the data center and performing the task locally;
updating the target attributes of the data file, and predicting, by a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, to obtain a caching-policy result for the data file, wherein the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and processing the data file according to its caching-policy result.
Optionally, the preset multi-task prediction model consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive (SSD) or a mechanical hard disk drive (HDD);
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
Optionally, processing the data file according to its caching-policy result includes:
if the cache discrimination result is that the data file should not be cached locally, deleting the local copy after the simulation task finishes;
if the cache discrimination result is that the data file should be cached locally, caching it on the hard disk indicated by the cache hard-disk location result and setting its file lifetime according to the cache lifetime result.
Optionally, the preset multi-task prediction model is configured as follows:
constructing a multi-task learning network formed by three convolutional sub-networks arranged in parallel;
acquiring training samples, each comprising the target attributes of a file and a corresponding caching-policy label, where the caching-policy label consists of three sub-labels: a cache label, a cache hard-disk location label, and a cache lifetime label;
inputting the training samples into the multi-task learning network for multi-task learning to obtain the sub-prediction result output by each convolutional sub-network, where network parameters are shared among the sub-networks;
and adjusting the network parameters of the multi-task learning network according to each sub-network's sub-prediction result and the corresponding sub-label until the network converges, yielding the trained preset multi-task prediction model.
Optionally, the file attributes include the file size, file owner, file access frequency, and/or file creation time;
the network attributes include the network speed, network packet-loss rate, and/or network delay;
the GPU-cluster attributes include the GPU-cluster address, hard-disk read/write speed, total hard-disk capacity, used hard-disk capacity, and/or GPU-cluster health;
the simulation-task attributes include the simulation-task type and/or simulation-task priority.
A second aspect of the present application provides a distributed file caching system, including:
a judging module, configured to judge, when a simulation task is received, whether the data file required by the task is cached locally; if so, the locally cached data file is used for the simulation task, and if not, the file is downloaded from the data center and the task is performed locally;
a caching-policy prediction module, configured to update the target attributes of the data file and predict, through a preset multi-task prediction model and according to the target attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file, where the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and a processing module, configured to process the data file according to its caching-policy result.
Optionally, the preset multi-task prediction model consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive;
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
Optionally, the processing module is specifically configured to:
delete the local copy of the data file after the simulation task finishes, if the cache discrimination result is that the file should not be cached locally;
and, if the cache discrimination result is that the file should be cached locally, cache it on the hard disk indicated by the cache hard-disk location result and set its file lifetime according to the cache lifetime result.
A third aspect of the present application provides a distributed file caching apparatus, the apparatus comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to the instructions in the program code, the distributed file caching method of any one of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code, which when executed by a processor, implements the distributed file caching method of any one of the first aspects.
According to the technical scheme, the method has the following advantages:
the application provides a distributed file caching method, which comprises the following steps: when a simulation task is received, judging whether a data file required by the simulation task is cached locally or not, if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from a data center to locally perform the simulation task; updating the target attribute of the data file, predicting whether the data file should be cached locally, a hard disk for caching the data file and the cache life according to the target attribute by a preset multi-task prediction model, and obtaining a cache strategy result of the data file, wherein the target attribute comprises a file attribute, a network attribute, a display card cluster attribute and a simulation task attribute; and processing the data file according to the cache strategy result of the data file.
According to the method, if the data file required by a simulation task is cached locally, the data is obtained directly from the local cache for the task; if not, the file is downloaded from the data center and the task is run locally. During simulation, the file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes of the data file are updated, and a preset multi-task prediction model predicts from these target attributes whether the file should be cached locally, on which hard disk, and for how long, yielding the caching-policy result by which the file is then processed. Because the caching-policy result takes multi-dimensional information into account (file, network, GPU-cluster, and simulation-task attributes), the hit rate of locally cached files is improved, repeated transmission of data files is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
Drawings
To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a distributed file caching method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a network structure of a preset multitask prediction model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a distributed file cache system according to an embodiment of the present application.
Detailed Description
The application provides a distributed file caching method, system, device, and storage medium to solve the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
To make the technical solutions of the present application better understood, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present application.
The present application recognizes that the data a simulation task depends on grows with vehicle mileage, fleet size, and the geometric growth of vehicle-mounted hardware iterations. Most enterprises build or use an Internet data center to store this data, but simulation is usually backed by a single GPU cluster, so large amounts of training data are transmitted repeatedly from the data center to the GPU cluster. Network speed, network fluctuation, and the I/O bottlenecks of the data center's and the GPU cluster's hard disks slow the simulation task, and corrupted transfers of training data can make it fail outright. For large-scale simulation tasks, quickly and intelligently realizing the transmission, exchange, and use of file data is therefore important for improving the efficiency and effectiveness of model training and for reducing task fluctuation and failure caused by network or disk I/O.
In order to solve the above problem, please refer to fig. 1, an embodiment of the present application provides a distributed file caching method, including:
step 101, when receiving a simulation task, judging whether a data file required by the simulation task is cached locally, if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from a data center to the local to perform the simulation task.
When a simulation task is received, whether a data file required by the simulation task is cached locally is judged, if so, the data file cached locally is directly used for the simulation task, if not, the data file is downloaded from a data center to the local, and the data file downloaded from the data center for the first time can be cached in a solid state disk with a higher speed for the current simulation task.
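Step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two-tier cache-root layout, the function names, and the injected `download_from_data_center` callable are all assumptions.

```python
from pathlib import Path

def locate_local_copy(filename, cache_roots):
    """Return the cached path of the file, or None if it is not cached locally."""
    for root in cache_roots:
        candidate = Path(root) / filename
        if candidate.exists():
            return candidate
    return None

def fetch_for_simulation(filename, ssd_root, hdd_root, download_from_data_center):
    """Use the local copy if present; otherwise download it to the faster SSD cache."""
    local = locate_local_copy(filename, (ssd_root, hdd_root))
    if local is not None:
        return local                          # cache hit: use the local file directly
    target = Path(ssd_root) / filename        # first download is staged on the SSD
    target.parent.mkdir(parents=True, exist_ok=True)
    download_from_data_center(filename, target)
    return target
```

A second call for the same file then hits the local cache and triggers no further transfer from the data center, which is the behavior the step describes.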
Step 102: update the target attributes of the data file, and use a preset multi-task prediction model to predict, from those attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file; the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes.
While a simulation task runs on a data file, the file's target attributes are updated asynchronously. The file attributes may include the file size, file owner, file access frequency, and/or file creation time; the network attributes may include attributes of the network condition such as the network speed, packet-loss rate, and/or network delay; the GPU-cluster attributes include inherent properties of the GPU cluster such as its address, hard-disk read/write speed, total hard-disk capacity, used hard-disk capacity, and/or cluster health; the simulation-task attributes may include the task type, the size of the data set the task requires, and/or the task priority. The updated file, network, GPU-cluster, and simulation-task attributes of the data file are input into the preset multi-task prediction model, which predicts whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, yielding the caching-policy result for the file.
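The four attribute groups above might be gathered into a single feature vector along these lines. The concrete field names, units, and numeric encodings are assumptions for illustration; the patent lists the attribute groups but prescribes no encoding.

```python
from dataclasses import dataclass, astuple

@dataclass
class TargetAttributes:
    """The four attribute groups the patent names, with assumed numeric encodings."""
    # file attributes
    file_size_mb: float
    file_access_frequency: float      # e.g. accesses per day
    file_age_hours: float             # derived from file creation time
    # network attributes
    network_speed_mbps: float
    packet_loss_rate: float
    network_delay_ms: float
    # GPU-cluster attributes
    disk_rw_speed_mbps: float
    disk_total_gb: float
    disk_used_gb: float
    cluster_health: float             # e.g. 0.0 (failing) to 1.0 (healthy)
    # simulation-task attributes
    task_type: float                  # an integer task-type code
    task_priority: float

    def to_vector(self):
        """Flatten into the feature vector fed to the multi-task prediction model."""
        return list(astuple(self))
```

In a real system these values would be refreshed asynchronously during the task, as the text describes, and normalized before being fed to the model.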
The preset multi-task prediction model in this embodiment consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel; a concrete network structure is shown in fig. 2. The discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result. The file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive. The file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
After the target attributes are fed into the preset multi-task prediction model, the discrimination model judges whether the data file should be cached locally, yielding the cache discrimination result; the file-importance model, on the premise that the file is to be cached locally, predicts whether it should be cached on the slower mechanical hard disk drive or the faster solid-state drive, yielding the cache hard-disk location result; and the file-lifetime model predicts how long the file should remain in the local cache, i.e. when it should be deleted from the cache, yielding the cache lifetime result. The caching-policy result thus comprises the cache discrimination result, the cache hard-disk location result, and the cache lifetime result.
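The shared-input, three-head structure can be sketched as a tiny stdlib-only network. The layer sizes, the tanh/sigmoid choices, and the plain fully connected layers are illustrative assumptions: the patent specifies three parallel convolutional sub-networks with shared parameters but no concrete architecture, and an untrained sketch like this only shows the data flow.

```python
import math
import random

def _matvec(w, x):
    """Multiply vector x by weight matrix w (rows = inputs, columns = outputs)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

class MultiTaskPredictor:
    """Shared feature extractor feeding three parallel heads, mirroring the
    discrimination, file-importance, and file-lifetime models."""

    def __init__(self, n_features=12, n_hidden=8, seed=0):
        rnd = random.Random(seed)
        def mk(rows, cols):
            return [[rnd.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.w_shared = mk(n_features, n_hidden)   # parameters shared by all tasks
        self.w_cache = mk(n_hidden, 1)             # head 1: cache locally or not
        self.w_disk = mk(n_hidden, 1)              # head 2: SSD vs mechanical HDD
        self.w_life = mk(n_hidden, 1)              # head 3: cache lifetime

    def predict(self, features):
        hidden = [math.tanh(v) for v in _matvec(self.w_shared, features)]
        sig = lambda v: 1.0 / (1.0 + math.exp(-v))
        return {
            "cache_locally": sig(_matvec(self.w_cache, hidden)[0]) > 0.5,
            "use_ssd": sig(_matvec(self.w_disk, hidden)[0]) > 0.5,
            "lifetime_hours": max(_matvec(self.w_life, hidden)[0], 0.0),
        }
```

The three outputs map directly onto the cache discrimination result, the cache hard-disk location result, and the cache lifetime result described above.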
Further, the preset multi-task prediction model in this embodiment is configured as follows:
constructing a multi-task learning network formed by three convolutional sub-networks arranged in parallel;
acquiring training samples, each comprising the target attributes of a file and a corresponding caching-policy label, where the caching-policy label consists of three sub-labels: a cache label, a cache hard-disk location label, and a cache lifetime label;
inputting the training samples into the multi-task learning network for multi-task learning to obtain the sub-prediction result output by each convolutional sub-network, where network parameters are shared among the sub-networks;
and adjusting the network parameters of the multi-task learning network according to each sub-network's sub-prediction result and the corresponding sub-label until the network converges, yielding the trained preset multi-task prediction model.
Local caching of data files involves three tasks: deciding whether to cache a file locally, selecting the hard disk to cache it on based on file importance, and predicting the file's lifetime. These three tasks are processed in parallel using the multi-task learning method from deep learning: the input data and low-level features are shared so that the different tasks correlate with and influence one another, producing a better file-caching policy that helps decide whether a data file should be cached locally, on which hard disk, and for how long. The multi-task learning network constructed in this embodiment contains three parallel convolutional sub-networks, each of which may use an existing network structure such as a residual network or a lightweight network. A training sample is input into the network for multi-task learning: the first sub-network extracts features from the target attributes to judge whether the sample should be cached locally, giving the first sub-prediction result; the second sub-network extracts features to judge, when the first sub-network decides the sample should be cached locally, whether it should be cached on a mechanical hard disk drive or a solid-state drive, giving the second sub-prediction result; and the third sub-network extracts features to predict the sample's local cache lifetime, giving the third sub-prediction result. A loss value is then computed from each sub-prediction result and its corresponding sub-label, and the network parameters are updated by back-propagating the loss until the multi-task learning network converges (for example, when the iteration count reaches the maximum number of iterations or the training error falls below a preset threshold), at which point the converged network is the trained preset multi-task prediction model.
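One way the three per-head losses might be combined into a single training objective is sketched below. The choice of cross-entropy for the two binary heads, squared error for the lifetime head, and equal weights are all assumptions; the patent only states that a loss value is computed from each sub-prediction result and its sub-label.

```python
import math

def multitask_loss(pred, label, w_cache=1.0, w_disk=1.0, w_life=1.0):
    """Combine per-head losses into one multi-task training objective.

    pred carries the heads' raw outputs: cache probability in [0, 1],
    SSD probability in [0, 1], and a lifetime value in hours; label carries
    the corresponding sub-labels. Binary heads use cross-entropy, the
    lifetime head uses squared error; the weights are illustrative.
    """
    eps = 1e-9
    def bce(p, y):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
        return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    loss_cache = bce(pred["cache_prob"], label["cache"])
    loss_disk = bce(pred["ssd_prob"], label["ssd"])
    loss_life = (pred["lifetime_hours"] - label["lifetime_hours"]) ** 2
    return w_cache * loss_cache + w_disk * loss_disk + w_life * loss_life
```

Gradients of this combined loss flow back through the shared bottom layers, which is how the shared parameters let the three tasks influence one another during training.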
In this embodiment, when the caching policy for a data file is obtained, the file, network, GPU-cluster, and simulation-task attributes are all considered and input jointly into the preset multi-task prediction model: the same input passes through three different parallel deep-learning models to produce three different outputs, so the caching policy is generated intelligently, in multiple dimensions, and more flexibly, improving the efficiency and success rate of the simulation task. Compared with traditional caching methods such as the least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO) algorithms, the information considered here is multi-dimensional and more comprehensive, and the resulting caching policy is correspondingly more comprehensive and reliable.
Step 103: process the data file according to its caching-policy result.
If the cache discrimination result is that the data file should not be cached locally, the local copy is deleted after the simulation task finishes. If the result is that the file should be cached locally, it is cached on the hard disk indicated by the cache hard-disk location result, and its file lifetime is set according to the cache lifetime result.
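Step 103 might be applied as follows. The policy dictionary shape, the expiry-file mechanism, and the directory layout are assumptions for illustration; a real system would hand the lifetime to its own eviction process rather than record it in a sidecar file.

```python
import shutil
import time
from pathlib import Path

def apply_cache_policy(file_path, policy, ssd_root, hdd_root, task_finished):
    """Apply a caching-policy result to a locally downloaded data file.

    policy holds the three predicted results: cache_locally (bool),
    use_ssd (bool), and lifetime_hours (float).
    """
    path = Path(file_path)
    if not policy["cache_locally"]:
        if task_finished:
            path.unlink(missing_ok=True)      # do not keep: delete after the task
        return None
    dest_root = Path(ssd_root if policy["use_ssd"] else hdd_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    dest = dest_root / path.name
    if path.resolve() != dest.resolve():
        shutil.move(str(path), str(dest))     # place on the predicted hard disk
    # Record the cache lifetime as an expiry timestamp beside the file
    # (a stand-in for handing the lifetime to an eviction daemon).
    expiry = time.time() + policy["lifetime_hours"] * 3600.0
    Path(str(dest) + ".expiry").write_text(str(expiry))
    return dest
```

An eviction sweep would then compare each recorded expiry timestamp against the current time and delete files whose cache lifetime has elapsed.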
In this embodiment, if the data file required by a simulation task is cached locally, the data is obtained directly from the local cache for the task; if not, the file is downloaded from the data center and the task is run locally. During simulation, the file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes of the data file are updated, and a preset multi-task prediction model predicts from these target attributes whether the file should be cached locally, on which hard disk, and for how long, yielding the caching-policy result by which the file is then processed. Because the caching-policy result takes multi-dimensional information (file, network, GPU-cluster, and simulation-task attributes) into account, the hit rate of locally cached files is improved, repeated transmission of data files is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
The foregoing is an embodiment of a distributed file caching method provided by the present application, and the following is an embodiment of a distributed file caching system provided by the present application.
Referring to fig. 3, an embodiment of the present application provides a distributed file caching system, including:
the judging module is used for judging, when a simulation task is received, whether the data file required by the task is cached locally; if so, the locally cached data file is used for the simulation task, and if not, the file is downloaded from the data center and the task is performed locally;
the caching-policy prediction module is used for updating the target attributes of the data file and predicting, through a preset multi-task prediction model and according to the target attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file, where the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and the processing module is used for processing the data file according to its caching-policy result.
The preset multi-task prediction model consists of a parallel discrimination model, file-importance model, and file-lifetime model, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive;
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
As a further improvement, the processing module is specifically configured to:
if the cache discrimination result is that the data file should not be cached locally, delete the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, cache the data file to the corresponding hard disk according to the cache hard disk position result, and set the file life of the data file according to the cache life result.
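The processing module's two branches can be sketched as follows. The strategy result is passed as a plain dict, and the directory layout and expiry table are hypothetical stand-ins for however a real deployment tracks cache placement and lifetime.

```python
import os
import shutil
import time

def process_file(local_path, result, ssd_dir, hdd_dir, expiry_table):
    """Apply a cache strategy result to a file downloaded for a finished task."""
    if not result["should_cache"]:
        os.remove(local_path)       # not worth caching: delete after the task ends
        return None
    target_dir = ssd_dir if result["disk"] == "ssd" else hdd_dir
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, os.path.basename(local_path))
    shutil.move(local_path, dest)   # cache on the predicted hard disk
    expiry_table[dest] = time.time() + result["lifetime_hours"] * 3600  # set file life
    return dest
```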
As a further refinement, the file attributes include file size, file owner, file access frequency and/or file creation time;
the network attribute comprises network speed, network packet loss rate and/or network delay;
the video card cluster attributes comprise a video card cluster address, a hard disk read-write speed, a total hard disk capacity, a used hard disk capacity and/or a video card cluster health degree;
the simulation task attributes include a simulation task type and/or a simulation task priority.
In the embodiment of the application, if a data file required by a simulation task is cached locally, the data is acquired directly from the local cache to perform the simulation task; if not, the data file is downloaded from the data center to the local machine to perform the simulation task. During simulation, the file attributes, network attributes, video card cluster attributes and simulation task attributes of the data file are updated, and a preset multi-task prediction model predicts, according to these target attributes, whether the data file should be cached locally, the hard disk on which to cache it, and its cache life, to obtain the cache strategy result of the data file. Because the cache strategy result takes multi-dimensional information into account, including the file attributes, network attributes, video card cluster attributes and simulation task attributes of the data file, the hit rate of locally cached files is improved, repeated transmission of the same data file is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that, when simulation tasks are performed, a large amount of data may be transmitted repeatedly from the data center to the video card cluster, slowing the simulation task or even causing it to fail.
The embodiment of the application also provides a distributed file caching device, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is configured to execute the distributed file caching method in the foregoing method embodiments according to instructions in the program code.
The embodiment of the present application further provides a computer-readable storage medium for storing program code, wherein when the program code is executed by a processor, the distributed file caching method in the foregoing method embodiments is implemented.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b and c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A distributed file caching method is characterized by comprising the following steps:
when a simulation task is received, judging whether a data file required by the simulation task is cached locally; if so, using the locally cached data file to perform the simulation task; and if not, downloading the data file from a data center to the local machine to perform the simulation task;
updating the target attributes of the data file, and predicting, through a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which to cache the data file, and the cache life, to obtain a cache strategy result of the data file, wherein the target attributes comprise a file attribute, a network attribute, a video card cluster attribute and a simulation task attribute;
and processing the data file according to the cache strategy result of the data file.
2. The distributed file caching method according to claim 1, wherein the preset multitask prediction model is composed of a discrimination model, a file importance model and a file life model which are arranged in parallel, and the caching strategy result comprises a caching discrimination result, a caching hard disk position result and a caching life result;
the discrimination model is used for predicting whether the data file is cached locally according to the target attribute of the data file to obtain a cache discrimination result;
the file importance model is used for predicting the hard disk cached by the data file according to the target attribute of the data file to obtain a cached hard disk position result, and the hard disk comprises a solid state hard disk and a mechanical hard disk;
the file life model is used for predicting the local cache life of the data file according to the target attribute of the data file to obtain a cache life result.
3. The distributed file caching method according to claim 2, wherein the processing the data file according to the caching policy result of the data file includes:
if the cache discrimination result is that the data file should not be cached locally, deleting the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, caching the data file to the corresponding hard disk according to the cache hard disk position result, and setting the file life of the data file according to the cache life result.
4. The distributed file caching method according to claim 2, wherein the preset multitask prediction model is obtained by:
constructing a multi-task learning network, wherein the multi-task learning network is composed of three sub convolutional neural networks arranged in parallel;
acquiring a training sample, wherein the training sample comprises target attributes of a plurality of files and corresponding cache strategy tags, and the cache strategy tags comprise three sub-tags of cache tags, cache hard disk position tags and cache life tags;
inputting the training samples into the multi-task learning network for multi-task learning to obtain sub-prediction results output by each sub-convolution neural network, wherein network parameters are shared among the sub-convolution neural networks;
and adjusting the network parameters of the multi-task learning network according to the sub-prediction results of each sub-convolution neural network and the corresponding sub-labels until the multi-task learning network converges to obtain a trained preset multi-task prediction model.
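A toy version of this hard-parameter-sharing setup, using one shared layer plus three linear heads trained jointly with plain SGD. The dimensions, learning rate and squared-error loss are all invented for illustration (the claimed sub-networks are convolutional); what the sketch preserves is that the error from every sub-task's head flows back into the shared parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 11, 8                                   # input and shared hidden sizes (invented)
W_shared = rng.normal(size=(D, H)) * 0.1       # parameters shared by all three branches
heads = [rng.normal(size=(H, 1)) * 0.1 for _ in range(3)]  # cache / disk / life heads

def forward(x):
    h = np.tanh(x @ W_shared)                  # shared representation
    return [h @ w for w in heads]              # one sub-prediction per branch

def total_loss(x, labels):
    return sum(float(np.mean((p - y) ** 2)) for p, y in zip(forward(x), labels))

def train_step(x, labels, lr=0.01):
    """One joint SGD step: every head's error also updates the shared parameters."""
    global W_shared
    h = np.tanh(x @ W_shared)
    grad_shared = np.zeros_like(W_shared)
    for i, w in enumerate(heads):
        err = h @ w - labels[i]                            # per-branch squared error
        grad_shared += x.T @ ((err @ w.T) * (1.0 - h * h)) # backprop through tanh
        heads[i] = w - lr * (h.T @ err)                    # update this branch's head
    W_shared -= lr * grad_shared
```

Training stops once the joint loss converges, which corresponds to the claim's "until the multi-task learning network converges".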
5. The distributed file caching method according to claim 1, wherein the file attributes comprise file size, file owner, file access frequency and/or file creation time;
the network attribute comprises network speed, network packet loss rate and/or network delay;
the video card cluster attributes comprise video card cluster addresses, hard disk read-write speed, total hard disk capacity, used hard disk capacity and/or video card cluster health degree;
the simulation task attributes include a simulation task type and/or a simulation task priority.
6. A distributed file caching system, comprising:
the judging module is used for judging, when a simulation task is received, whether a data file required by the simulation task is cached locally; if so, using the locally cached data file to perform the simulation task; and if not, downloading the data file from a data center to the local machine to perform the simulation task;
the cache strategy prediction module is used for updating the target attributes of the data file, and predicting, through a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which to cache the data file, and the cache life, to obtain a cache strategy result of the data file, wherein the target attributes comprise a file attribute, a network attribute, a video card cluster attribute and a simulation task attribute;
and the processing module is used for processing the data file according to the cache strategy result of the data file.
7. The distributed file caching system according to claim 6, wherein the preset multitask prediction model is composed of a discrimination model, a file importance model and a file life model which are arranged in parallel, and the caching strategy result comprises a caching discrimination result, a caching hard disk position result and a caching life result;
the discrimination model is used for predicting whether the data file is cached locally according to the target attribute of the data file to obtain a cache discrimination result;
the file importance model is used for predicting a hard disk cached by the data file according to the target attribute of the data file to obtain a cached hard disk position result, and the hard disk comprises a solid state hard disk and a mechanical hard disk;
the file life model is used for predicting the local cache life of the data file according to the target attribute of the data file to obtain a cache life result.
8. The distributed file caching system of claim 7, wherein the processing module is specifically configured to:
if the cache discrimination result is that the data file should not be cached locally, deleting the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, caching the data file to the corresponding hard disk according to the cache hard disk position result, and setting the file life of the data file according to the cache life result.
9. A distributed file caching apparatus, comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the distributed file caching method according to any one of claims 1 to 5 according to instructions in the program code.
10. A computer-readable storage medium for storing program code, which when executed by a processor implements the distributed file caching method of any one of claims 1 to 5.
CN202210887021.4A 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium Pending CN115269522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210887021.4A CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210887021.4A CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115269522A true CN115269522A (en) 2022-11-01

Family

ID=83768930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210887021.4A Pending CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269522A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737606A (en) * 2023-08-15 2023-09-12 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator
CN116737606B (en) * 2023-08-15 2023-12-05 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator

Similar Documents

Publication Publication Date Title
CN108134691B (en) Model building method, Internet resources preload method, apparatus, medium and terminal
CN103197899B (en) Life and performance enhancement of storage based on flash memory
CN110598802A (en) Memory detection model training method, memory detection method and device
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
US20120246411A1 (en) Cache eviction using memory entry value
EP4203437A1 (en) Data set and node cache-based scheduling method and device
CN103856516B (en) Data storage, read method and data storage, reading device
CN104133783B (en) Method and device for processing distributed cache data
CN115269522A (en) Distributed file caching method, system, equipment and storage medium
CN108694188A (en) A kind of newer method of index data and relevant apparatus
CN116578593A (en) Data caching method, system, device, computer equipment and storage medium
CN109542612A (en) A kind of hot spot keyword acquisition methods, device and server
CN117194502B (en) Database content cache replacement method based on long-term and short-term memory network
CN116862580A (en) Short message reaching time prediction method and device, computer equipment and storage medium
CN117370058A (en) Service processing method, device, electronic equipment and computer readable medium
CN109189696B (en) SSD (solid State disk) caching system and caching method
CN116089477A (en) Distributed training method and system
US9268809B2 (en) Method and system for document update
CN116391177A (en) Prioritized inactive memory device updates
CN114025017A (en) Network edge caching method, device and equipment based on deep cycle reinforcement learning
CN113885801A (en) Memory data processing method and device
CN110362769A (en) A kind of data processing method and device
US20240005146A1 (en) Extraction of high-value sequential patterns using reinforcement learning techniques
CN111813711B (en) Method and device for reading training sample data, storage medium and electronic equipment
CN113296710B (en) Cloud storage data reading method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination