CN115269522A - Distributed file caching method, system, equipment and storage medium - Google Patents

Distributed file caching method, system, equipment and storage medium

Info

Publication number
CN115269522A
CN115269522A (Application No. CN202210887021.4A)
Authority
CN
China
Prior art keywords
data file
file
cache
hard disk
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210887021.4A
Other languages
Chinese (zh)
Inventor
赵铖皓
王红宾
胡方炜
陈飞
谭伟华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weride Technology Co Ltd
Original Assignee
Guangzhou Weride Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weride Technology Co Ltd filed Critical Guangzhou Weride Technology Co Ltd
Priority to CN202210887021.4A priority Critical patent/CN115269522A/en
Publication of CN115269522A publication Critical patent/CN115269522A/en
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application discloses a distributed file caching method, system, device, and storage medium. When a simulation task is received, the method judges whether the data file required by the task is cached locally; if so, the local copy is used for the task, and if not, the file is downloaded from the data center and the task is run locally. The method then updates the target attributes of the data file, which include file attributes, network attributes, graphics card (GPU) cluster attributes, and simulation-task attributes, and a preset multi-task prediction model predicts from these attributes whether the file should be cached locally, which hard disk it should be cached on, and its cache lifetime, yielding a caching-policy result for the file. The file is then processed according to that result. This solves the prior-art problem that, during large-scale simulation tasks, large amounts of training data are transmitted repeatedly from the data center to the GPU cluster, slowing the simulation task and even causing it to fail due to data-transmission problems.

Description

Distributed file caching method, system, equipment and storage medium
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a distributed file caching method, system, device, and storage medium.
Background
In the field of autonomous driving, simulation is an important link, and the data a simulation task depends on grows with vehicle mileage, fleet size, and the geometric growth of vehicle-mounted hardware iterations. Most enterprises build or use an Internet data center to store this data, but simulation is usually backed by a single graphics card (GPU) cluster, so large amounts of training data are transmitted repeatedly from the data center to the GPU cluster. This slows the simulation task and can even cause it to fail due to data-transmission problems. For large-scale simulation tasks, how to quickly realize the transmission, exchange, and use of file data so as to speed up the simulation task is therefore a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The application provides a distributed file caching method, system, device, and storage medium to solve the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
In view of this, a first aspect of the present application provides a distributed file caching method, including:
when a simulation task is received, judging whether the data file required by the task is cached locally; if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from the data center and performing the task locally;
updating the target attributes of the data file, and predicting, by a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, to obtain a caching-policy result for the data file, wherein the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and processing the data file according to its caching-policy result.
Optionally, the preset multi-task prediction model consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive (SSD) or a mechanical hard disk drive (HDD);
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
Optionally, processing the data file according to its caching-policy result includes:
if the cache discrimination result is that the data file should not be cached locally, deleting the local copy after the simulation task finishes;
if the cache discrimination result is that the data file should be cached locally, caching it on the hard disk indicated by the cache hard-disk location result and setting its file lifetime according to the cache lifetime result.
Optionally, the preset multi-task prediction model is configured as follows:
constructing a multi-task learning network formed by three convolutional sub-networks arranged in parallel;
acquiring training samples, each comprising the target attributes of a file and a corresponding caching-policy label, where the caching-policy label consists of three sub-labels: a cache label, a cache hard-disk location label, and a cache lifetime label;
inputting the training samples into the multi-task learning network for multi-task learning to obtain the sub-prediction result output by each convolutional sub-network, where network parameters are shared among the sub-networks;
and adjusting the network parameters of the multi-task learning network according to each sub-network's sub-prediction result and the corresponding sub-label until the network converges, yielding the trained preset multi-task prediction model.
Optionally, the file attributes include the file size, file owner, file access frequency, and/or file creation time;
the network attributes include the network speed, network packet-loss rate, and/or network delay;
the GPU-cluster attributes include the GPU-cluster address, hard-disk read/write speed, total hard-disk capacity, used hard-disk capacity, and/or GPU-cluster health;
the simulation-task attributes include the simulation-task type and/or simulation-task priority.
A second aspect of the present application provides a distributed file caching system, including:
a judging module, configured to judge, when a simulation task is received, whether the data file required by the task is cached locally; if so, the locally cached data file is used for the simulation task, and if not, the file is downloaded from the data center and the task is performed locally;
a caching-policy prediction module, configured to update the target attributes of the data file and predict, through a preset multi-task prediction model and according to the target attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file, where the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and a processing module, configured to process the data file according to its caching-policy result.
Optionally, the preset multi-task prediction model consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive;
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
Optionally, the processing module is specifically configured to:
delete the local copy of the data file after the simulation task finishes, if the cache discrimination result is that the file should not be cached locally;
and, if the cache discrimination result is that the file should be cached locally, cache it on the hard disk indicated by the cache hard-disk location result and set its file lifetime according to the cache lifetime result.
A third aspect of the present application provides a distributed file caching apparatus, the apparatus comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to the instructions in the program code, the distributed file caching method of any one of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code, which when executed by a processor, implements the distributed file caching method of any one of the first aspects.
According to the technical scheme, the method has the following advantages:
the application provides a distributed file caching method, which comprises the following steps: when a simulation task is received, judging whether a data file required by the simulation task is cached locally or not, if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from a data center to locally perform the simulation task; updating the target attribute of the data file, predicting whether the data file should be cached locally, a hard disk for caching the data file and the cache life according to the target attribute by a preset multi-task prediction model, and obtaining a cache strategy result of the data file, wherein the target attribute comprises a file attribute, a network attribute, a display card cluster attribute and a simulation task attribute; and processing the data file according to the cache strategy result of the data file.
According to the method, if the data file required by a simulation task is cached locally, the data is obtained directly from the local cache for the task; if not, the file is downloaded from the data center and the task is run locally. During simulation, the file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes of the data file are updated, and a preset multi-task prediction model predicts from these target attributes whether the file should be cached locally, on which hard disk, and for how long, yielding the caching-policy result by which the file is then processed. Because the caching-policy result takes multi-dimensional information into account (file, network, GPU-cluster, and simulation-task attributes), the hit rate of locally cached files is improved, repeated transmission of data files is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
Drawings
To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a distributed file caching method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a network structure of a preset multitask prediction model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a distributed file cache system according to an embodiment of the present application.
Detailed Description
The application provides a distributed file caching method, system, device, and storage medium to solve the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
To make the technical solutions of the present application better understood, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present application.
The present application recognizes that the data a simulation task depends on grows with vehicle mileage, fleet size, and the geometric growth of vehicle-mounted hardware iterations. Most enterprises build or use an Internet data center to store this data, but simulation is usually backed by a single GPU cluster, so large amounts of training data are transmitted repeatedly from the data center to the GPU cluster. Network speed, network fluctuation, and the I/O bottlenecks of the data center's and the GPU cluster's hard disks slow the simulation task, and corrupted transfers of training data can make it fail outright. For large-scale simulation tasks, quickly and intelligently realizing the transmission, exchange, and use of file data is therefore important for improving the efficiency and effectiveness of model training and for reducing task fluctuation and failure caused by network or disk I/O.
In order to solve the above problem, please refer to fig. 1, an embodiment of the present application provides a distributed file caching method, including:
step 101, when receiving a simulation task, judging whether a data file required by the simulation task is cached locally, if so, using the locally cached data file to perform the simulation task, and if not, downloading the data file from a data center to the local to perform the simulation task.
When a simulation task is received, whether a data file required by the simulation task is cached locally is judged, if so, the data file cached locally is directly used for the simulation task, if not, the data file is downloaded from a data center to the local, and the data file downloaded from the data center for the first time can be cached in a solid state disk with a higher speed for the current simulation task.
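Step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two-tier cache-root layout, the function names, and the injected `download_from_data_center` callable are all assumptions.

```python
from pathlib import Path

def locate_local_copy(filename, cache_roots):
    """Return the cached path of the file, or None if it is not cached locally."""
    for root in cache_roots:
        candidate = Path(root) / filename
        if candidate.exists():
            return candidate
    return None

def fetch_for_simulation(filename, ssd_root, hdd_root, download_from_data_center):
    """Use the local copy if present; otherwise download it to the faster SSD cache."""
    local = locate_local_copy(filename, (ssd_root, hdd_root))
    if local is not None:
        return local                          # cache hit: use the local file directly
    target = Path(ssd_root) / filename        # first download is staged on the SSD
    target.parent.mkdir(parents=True, exist_ok=True)
    download_from_data_center(filename, target)
    return target
```

A second call for the same file then hits the local cache and triggers no further transfer from the data center, which is the behavior the step describes.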
Step 102: update the target attributes of the data file, and use a preset multi-task prediction model to predict, from those attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file; the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes.
While a simulation task runs on a data file, the file's target attributes are updated asynchronously. The file attributes may include the file size, file owner, file access frequency, and/or file creation time; the network attributes may include attributes of the network condition such as the network speed, packet-loss rate, and/or network delay; the GPU-cluster attributes include inherent properties of the GPU cluster such as its address, hard-disk read/write speed, total hard-disk capacity, used hard-disk capacity, and/or cluster health; the simulation-task attributes may include the task type, the size of the data set the task requires, and/or the task priority. The updated file, network, GPU-cluster, and simulation-task attributes of the data file are input into the preset multi-task prediction model, which predicts whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, yielding the caching-policy result for the file.
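The four attribute groups above might be gathered into a single feature vector along these lines. The concrete field names, units, and numeric encodings are assumptions for illustration; the patent lists the attribute groups but prescribes no encoding.

```python
from dataclasses import dataclass, astuple

@dataclass
class TargetAttributes:
    """The four attribute groups the patent names, with assumed numeric encodings."""
    # file attributes
    file_size_mb: float
    file_access_frequency: float      # e.g. accesses per day
    file_age_hours: float             # derived from file creation time
    # network attributes
    network_speed_mbps: float
    packet_loss_rate: float
    network_delay_ms: float
    # GPU-cluster attributes
    disk_rw_speed_mbps: float
    disk_total_gb: float
    disk_used_gb: float
    cluster_health: float             # e.g. 0.0 (failing) to 1.0 (healthy)
    # simulation-task attributes
    task_type: float                  # an integer task-type code
    task_priority: float

    def to_vector(self):
        """Flatten into the feature vector fed to the multi-task prediction model."""
        return list(astuple(self))
```

In a real system these values would be refreshed asynchronously during the task, as the text describes, and normalized before being fed to the model.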
The preset multi-task prediction model in this embodiment consists of a discrimination model, a file-importance model, and a file-lifetime model arranged in parallel; a concrete network structure is shown in fig. 2. The discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result. The file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive. The file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
After the target attributes are fed into the preset multi-task prediction model, the discrimination model judges whether the data file should be cached locally, yielding the cache discrimination result; the file-importance model, on the premise that the file is to be cached locally, predicts whether it should be cached on the slower mechanical hard disk drive or the faster solid-state drive, yielding the cache hard-disk location result; and the file-lifetime model predicts how long the file should remain in the local cache, i.e. when it should be deleted from the cache, yielding the cache lifetime result. The caching-policy result thus comprises the cache discrimination result, the cache hard-disk location result, and the cache lifetime result.
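The shared-input, three-head structure can be sketched as a tiny stdlib-only network. The layer sizes, the tanh/sigmoid choices, and the plain fully connected layers are illustrative assumptions: the patent specifies three parallel convolutional sub-networks with shared parameters but no concrete architecture, and an untrained sketch like this only shows the data flow.

```python
import math
import random

def _matvec(w, x):
    """Multiply vector x by weight matrix w (rows = inputs, columns = outputs)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

class MultiTaskPredictor:
    """Shared feature extractor feeding three parallel heads, mirroring the
    discrimination, file-importance, and file-lifetime models."""

    def __init__(self, n_features=12, n_hidden=8, seed=0):
        rnd = random.Random(seed)
        def mk(rows, cols):
            return [[rnd.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.w_shared = mk(n_features, n_hidden)   # parameters shared by all tasks
        self.w_cache = mk(n_hidden, 1)             # head 1: cache locally or not
        self.w_disk = mk(n_hidden, 1)              # head 2: SSD vs mechanical HDD
        self.w_life = mk(n_hidden, 1)              # head 3: cache lifetime

    def predict(self, features):
        hidden = [math.tanh(v) for v in _matvec(self.w_shared, features)]
        sig = lambda v: 1.0 / (1.0 + math.exp(-v))
        return {
            "cache_locally": sig(_matvec(self.w_cache, hidden)[0]) > 0.5,
            "use_ssd": sig(_matvec(self.w_disk, hidden)[0]) > 0.5,
            "lifetime_hours": max(_matvec(self.w_life, hidden)[0], 0.0),
        }
```

The three outputs map directly onto the cache discrimination result, the cache hard-disk location result, and the cache lifetime result described above.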
Further, the preset multi-task prediction model in this embodiment is configured as follows:
constructing a multi-task learning network formed by three convolutional sub-networks arranged in parallel;
acquiring training samples, each comprising the target attributes of a file and a corresponding caching-policy label, where the caching-policy label consists of three sub-labels: a cache label, a cache hard-disk location label, and a cache lifetime label;
inputting the training samples into the multi-task learning network for multi-task learning to obtain the sub-prediction result output by each convolutional sub-network, where network parameters are shared among the sub-networks;
and adjusting the network parameters of the multi-task learning network according to each sub-network's sub-prediction result and the corresponding sub-label until the network converges, yielding the trained preset multi-task prediction model.
Local caching of data files involves three tasks: deciding whether to cache a file locally, selecting the hard disk to cache it on based on file importance, and predicting the file's lifetime. These three tasks are processed in parallel using the multi-task learning method from deep learning: the input data and low-level features are shared so that the different tasks correlate with and influence one another, producing a better file-caching policy that helps decide whether a data file should be cached locally, on which hard disk, and for how long. The multi-task learning network constructed in this embodiment contains three parallel convolutional sub-networks, each of which may use an existing network structure such as a residual network or a lightweight network. A training sample is input into the network for multi-task learning: the first sub-network extracts features from the target attributes to judge whether the sample should be cached locally, giving the first sub-prediction result; the second sub-network extracts features to judge, when the first sub-network decides the sample should be cached locally, whether it should be cached on a mechanical hard disk drive or a solid-state drive, giving the second sub-prediction result; and the third sub-network extracts features to predict the sample's local cache lifetime, giving the third sub-prediction result. A loss value is then computed from each sub-prediction result and its corresponding sub-label, and the network parameters are updated by back-propagating the loss until the multi-task learning network converges (for example, when the iteration count reaches the maximum number of iterations or the training error falls below a preset threshold), at which point the converged network is the trained preset multi-task prediction model.
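One way the three per-head losses might be combined into a single training objective is sketched below. The choice of cross-entropy for the two binary heads, squared error for the lifetime head, and equal weights are all assumptions; the patent only states that a loss value is computed from each sub-prediction result and its sub-label.

```python
import math

def multitask_loss(pred, label, w_cache=1.0, w_disk=1.0, w_life=1.0):
    """Combine per-head losses into one multi-task training objective.

    pred carries the heads' raw outputs: cache probability in [0, 1],
    SSD probability in [0, 1], and a lifetime value in hours; label carries
    the corresponding sub-labels. Binary heads use cross-entropy, the
    lifetime head uses squared error; the weights are illustrative.
    """
    eps = 1e-9
    def bce(p, y):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
        return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    loss_cache = bce(pred["cache_prob"], label["cache"])
    loss_disk = bce(pred["ssd_prob"], label["ssd"])
    loss_life = (pred["lifetime_hours"] - label["lifetime_hours"]) ** 2
    return w_cache * loss_cache + w_disk * loss_disk + w_life * loss_life
```

Gradients of this combined loss flow back through the shared bottom layers, which is how the shared parameters let the three tasks influence one another during training.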
In this embodiment, when the caching policy for a data file is obtained, the file, network, GPU-cluster, and simulation-task attributes are all considered and input jointly into the preset multi-task prediction model: the same input passes through three different parallel deep-learning models to produce three different outputs, so the caching policy is generated intelligently, in multiple dimensions, and more flexibly, improving the efficiency and success rate of the simulation task. Compared with traditional caching methods such as the least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO) algorithms, the information considered here is multi-dimensional and more comprehensive, and the resulting caching policy is correspondingly more comprehensive and reliable.
Step 103: process the data file according to its caching-policy result.
If the cache discrimination result is that the data file should not be cached locally, the local copy is deleted after the simulation task finishes. If the result is that the file should be cached locally, it is cached on the hard disk indicated by the cache hard-disk location result, and its file lifetime is set according to the cache lifetime result.
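Step 103 might be applied as follows. The policy dictionary shape, the expiry-file mechanism, and the directory layout are assumptions for illustration; a real system would hand the lifetime to its own eviction process rather than record it in a sidecar file.

```python
import shutil
import time
from pathlib import Path

def apply_cache_policy(file_path, policy, ssd_root, hdd_root, task_finished):
    """Apply a caching-policy result to a locally downloaded data file.

    policy holds the three predicted results: cache_locally (bool),
    use_ssd (bool), and lifetime_hours (float).
    """
    path = Path(file_path)
    if not policy["cache_locally"]:
        if task_finished:
            path.unlink(missing_ok=True)      # do not keep: delete after the task
        return None
    dest_root = Path(ssd_root if policy["use_ssd"] else hdd_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    dest = dest_root / path.name
    if path.resolve() != dest.resolve():
        shutil.move(str(path), str(dest))     # place on the predicted hard disk
    # Record the cache lifetime as an expiry timestamp beside the file
    # (a stand-in for handing the lifetime to an eviction daemon).
    expiry = time.time() + policy["lifetime_hours"] * 3600.0
    Path(str(dest) + ".expiry").write_text(str(expiry))
    return dest
```

An eviction sweep would then compare each recorded expiry timestamp against the current time and delete files whose cache lifetime has elapsed.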
In this embodiment, if the data file required by a simulation task is cached locally, the data is obtained directly from the local cache for the task; if not, the file is downloaded from the data center and the task is run locally. During simulation, the file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes of the data file are updated, and a preset multi-task prediction model predicts from these target attributes whether the file should be cached locally, on which hard disk, and for how long, yielding the caching-policy result by which the file is then processed. Because the caching-policy result takes multi-dimensional information (file, network, GPU-cluster, and simulation-task attributes) into account, the hit rate of locally cached files is improved, repeated transmission of data files is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that large amounts of training data are transmitted repeatedly from the data center to the GPU cluster during large-scale simulation tasks, slowing the simulation task and even causing it to fail due to data-transmission problems.
The foregoing is an embodiment of a distributed file caching method provided by the present application, and the following is an embodiment of a distributed file caching system provided by the present application.
Referring to fig. 3, an embodiment of the present application provides a distributed file caching system, including:
the judging module is used for judging, when a simulation task is received, whether the data file required by the task is cached locally; if so, the locally cached data file is used for the simulation task, and if not, the file is downloaded from the data center and the task is performed locally;
the caching-policy prediction module is used for updating the target attributes of the data file and predicting, through a preset multi-task prediction model and according to the target attributes, whether the file should be cached locally, the hard disk on which it should be cached, and its cache lifetime, obtaining the caching-policy result for the file, where the target attributes include file attributes, network attributes, GPU-cluster attributes, and simulation-task attributes;
and the processing module is used for processing the data file according to its caching-policy result.
The preset multi-task prediction model consists of a parallel discrimination model, file-importance model, and file-lifetime model, and the caching-policy result includes a cache discrimination result, a cache hard-disk location result, and a cache lifetime result;
the discrimination model predicts, from the target attributes of the data file, whether the file should be cached locally, yielding the cache discrimination result;
the file-importance model predicts, from the target attributes, the hard disk on which the file should be cached, yielding the cache hard-disk location result, where the hard disk is either a solid-state drive or a mechanical hard disk drive;
the file-lifetime model predicts, from the target attributes, the file's local cache lifetime, yielding the cache lifetime result.
As a further improvement, the processing module is specifically configured to:
if the cache discrimination result is that the data file should not be cached locally, delete the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, cache the data file to the corresponding hard disk according to the cache hard disk position result, and set the file life of the data file according to the cache life result.
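The processing module's two branches can be sketched as follows. The strategy result is passed as a plain dict, and the directory layout and expiry table are hypothetical stand-ins for however a real deployment tracks cache placement and lifetime.

```python
import os
import shutil
import time

def process_file(local_path, result, ssd_dir, hdd_dir, expiry_table):
    """Apply a cache strategy result to a file downloaded for a finished task."""
    if not result["should_cache"]:
        os.remove(local_path)       # not worth caching: delete after the task ends
        return None
    target_dir = ssd_dir if result["disk"] == "ssd" else hdd_dir
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, os.path.basename(local_path))
    shutil.move(local_path, dest)   # cache on the predicted hard disk
    expiry_table[dest] = time.time() + result["lifetime_hours"] * 3600  # set file life
    return dest
```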
As a further refinement, the file attributes include file size, file owner, file access frequency and/or file creation time;
the network attribute comprises network speed, network packet loss rate and/or network delay;
the video card cluster attributes comprise a video card cluster address, a hard disk read-write speed, a total hard disk capacity, a used hard disk capacity and/or a video card cluster health degree;
the simulation task attributes include a simulation task type and/or a simulation task priority.
In the embodiment of the application, if a data file required by a simulation task is cached locally, the data is acquired directly from the local cache to perform the simulation task; if not, the data file is downloaded from the data center to the local machine to perform the simulation task. During simulation, the file attributes, network attributes, video card cluster attributes and simulation task attributes of the data file are updated, and a preset multi-task prediction model predicts, according to these target attributes, whether the data file should be cached locally, the hard disk on which to cache it, and its cache life, to obtain the cache strategy result of the data file. Because the cache strategy result takes multi-dimensional information into account, including the file attributes, network attributes, video card cluster attributes and simulation task attributes of the data file, the hit rate of locally cached files is improved, repeated transmission of the same data file is avoided, wear on the network and hard disks is reduced, and the efficiency and completion rate of simulation tasks are improved. This solves the prior-art problem that, when simulation tasks are performed, a large amount of data may be transmitted repeatedly from the data center to the video card cluster, slowing the simulation task or even causing it to fail.
The embodiment of the application also provides a distributed file caching device, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is configured to execute the distributed file caching method in the foregoing method embodiments according to instructions in the program code.
The embodiment of the present application further provides a computer-readable storage medium for storing program code, wherein when the program code is executed by a processor, the distributed file caching method in the foregoing method embodiments is implemented.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b and c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A distributed file caching method is characterized by comprising the following steps:
when a simulation task is received, judging whether a data file required by the simulation task is cached locally; if so, using the locally cached data file to perform the simulation task; and if not, downloading the data file from a data center to the local machine to perform the simulation task;
updating the target attributes of the data file, and predicting, through a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which to cache the data file, and the cache life, to obtain a cache strategy result of the data file, wherein the target attributes comprise a file attribute, a network attribute, a video card cluster attribute and a simulation task attribute;
and processing the data file according to the cache strategy result of the data file.
2. The distributed file caching method according to claim 1, wherein the preset multitask prediction model is composed of a discrimination model, a file importance model and a file life model which are arranged in parallel, and the caching strategy result comprises a caching discrimination result, a caching hard disk position result and a caching life result;
the discrimination model is used for predicting whether the data file is cached locally according to the target attribute of the data file to obtain a cache discrimination result;
the file importance model is used for predicting the hard disk cached by the data file according to the target attribute of the data file to obtain a cached hard disk position result, and the hard disk comprises a solid state hard disk and a mechanical hard disk;
the file life model is used for predicting the local cache life of the data file according to the target attribute of the data file to obtain a cache life result.
3. The distributed file caching method according to claim 2, wherein the processing the data file according to the caching policy result of the data file includes:
if the cache discrimination result is that the data file should not be cached locally, deleting the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, caching the data file to the corresponding hard disk according to the cache hard disk position result, and setting the file life of the data file according to the cache life result.
4. The distributed file caching method according to claim 2, wherein the preset multitask prediction model is obtained by:
constructing a multi-task learning network, wherein the multi-task learning network is composed of three sub convolutional neural networks arranged in parallel;
acquiring a training sample, wherein the training sample comprises target attributes of a plurality of files and corresponding cache strategy tags, and the cache strategy tags comprise three sub-tags of cache tags, cache hard disk position tags and cache life tags;
inputting the training samples into the multi-task learning network for multi-task learning to obtain sub-prediction results output by each sub-convolution neural network, wherein network parameters are shared among the sub-convolution neural networks;
and adjusting the network parameters of the multi-task learning network according to the sub-prediction results of each sub-convolution neural network and the corresponding sub-labels until the multi-task learning network converges to obtain a trained preset multi-task prediction model.
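A toy version of this hard-parameter-sharing setup, using one shared layer plus three linear heads trained jointly with plain SGD. The dimensions, learning rate and squared-error loss are all invented for illustration (the claimed sub-networks are convolutional); what the sketch preserves is that the error from every sub-task's head flows back into the shared parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 11, 8                                   # input and shared hidden sizes (invented)
W_shared = rng.normal(size=(D, H)) * 0.1       # parameters shared by all three branches
heads = [rng.normal(size=(H, 1)) * 0.1 for _ in range(3)]  # cache / disk / life heads

def forward(x):
    h = np.tanh(x @ W_shared)                  # shared representation
    return [h @ w for w in heads]              # one sub-prediction per branch

def total_loss(x, labels):
    return sum(float(np.mean((p - y) ** 2)) for p, y in zip(forward(x), labels))

def train_step(x, labels, lr=0.01):
    """One joint SGD step: every head's error also updates the shared parameters."""
    global W_shared
    h = np.tanh(x @ W_shared)
    grad_shared = np.zeros_like(W_shared)
    for i, w in enumerate(heads):
        err = h @ w - labels[i]                            # per-branch squared error
        grad_shared += x.T @ ((err @ w.T) * (1.0 - h * h)) # backprop through tanh
        heads[i] = w - lr * (h.T @ err)                    # update this branch's head
    W_shared -= lr * grad_shared
```

Training stops once the joint loss converges, which corresponds to the claim's "until the multi-task learning network converges".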
5. The distributed file caching method according to claim 1, wherein the file attributes comprise file size, file owner, file access frequency and/or file creation time;
the network attribute comprises network speed, network packet loss rate and/or network delay;
the video card cluster attributes comprise video card cluster addresses, hard disk read-write speed, total hard disk capacity, used hard disk capacity and/or video card cluster health degree;
the simulation task attributes include a simulation task type and/or a simulation task priority.
6. A distributed file caching system, comprising:
the judging module is used for judging, when a simulation task is received, whether a data file required by the simulation task is cached locally; if so, using the locally cached data file to perform the simulation task; and if not, downloading the data file from a data center to the local machine to perform the simulation task;
the cache strategy prediction module is used for updating the target attributes of the data file, and predicting, through a preset multi-task prediction model and according to the target attributes, whether the data file should be cached locally, the hard disk on which to cache the data file, and the cache life, to obtain a cache strategy result of the data file, wherein the target attributes comprise a file attribute, a network attribute, a video card cluster attribute and a simulation task attribute;
and the processing module is used for processing the data file according to the cache strategy result of the data file.
7. The distributed file caching system according to claim 6, wherein the preset multitask prediction model is composed of a discrimination model, a file importance model and a file life model which are arranged in parallel, and the caching strategy result comprises a caching discrimination result, a caching hard disk position result and a caching life result;
the discrimination model is used for predicting whether the data file is cached locally according to the target attribute of the data file to obtain a cache discrimination result;
the file importance model is used for predicting a hard disk cached by the data file according to the target attribute of the data file to obtain a cached hard disk position result, and the hard disk comprises a solid state hard disk and a mechanical hard disk;
the file life model is used for predicting the local cache life of the data file according to the target attribute of the data file to obtain a cache life result.
8. The distributed file caching system of claim 7, wherein the processing module is specifically configured to:
if the cache discrimination result is that the data file should not be cached locally, deleting the local data file after the simulation task is finished;
if the cache discrimination result is that the data file should be cached locally, caching the data file to the corresponding hard disk according to the cache hard disk position result, and setting the file life of the data file according to the cache life result.
9. A distributed file caching apparatus, comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the distributed file caching method according to any one of claims 1 to 5 according to instructions in the program code.
10. A computer-readable storage medium for storing program code, which when executed by a processor implements the distributed file caching method of any one of claims 1 to 5.
CN202210887021.4A 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium Pending CN115269522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210887021.4A CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210887021.4A CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115269522A true CN115269522A (en) 2022-11-01

Family

ID=83768930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210887021.4A Pending CN115269522A (en) 2022-07-26 2022-07-26 Distributed file caching method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269522A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737606A (en) * 2023-08-15 2023-09-12 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator
CN116737606B (en) * 2023-08-15 2023-12-05 英诺达(成都)电子科技有限公司 Data caching method, device, equipment and medium based on hardware simulation accelerator

Similar Documents

Publication Publication Date Title
CN108134691B (en) Model building method, Internet resources preload method, apparatus, medium and terminal
CN103197899B (en) Life and performance enhancement of storage based on flash memory
CN110598802A (en) Memory detection model training method, memory detection method and device
CN114219097B (en) Federal learning training and predicting method and system based on heterogeneous resources
US20120246411A1 (en) Cache eviction using memory entry value
EP4203437A1 (en) Data set and node cache-based scheduling method and device
CN103856516B (en) Data storage, read method and data storage, reading device
CN104133783B (en) Method and device for processing distributed cache data
CN115269522A (en) Distributed file caching method, system, equipment and storage medium
CN108694188A (en) A kind of newer method of index data and relevant apparatus
CN116578593A (en) Data caching method, system, device, computer equipment and storage medium
CN109542612A (en) A kind of hot spot keyword acquisition methods, device and server
CN117194502B (en) Database content cache replacement method based on long-term and short-term memory network
CN116862580A (en) Short message reaching time prediction method and device, computer equipment and storage medium
CN117370058A (en) Service processing method, device, electronic equipment and computer readable medium
CN109189696B (en) SSD (solid State disk) caching system and caching method
CN116089477A (en) Distributed training method and system
US9268809B2 (en) Method and system for document update
CN116391177A (en) Prioritized inactive memory device updates
CN114025017A (en) Network edge caching method, device and equipment based on deep cycle reinforcement learning
CN113885801A (en) Memory data processing method and device
CN110362769A (en) A kind of data processing method and device
US20240005146A1 (en) Extraction of high-value sequential patterns using reinforcement learning techniques
CN111813711B (en) Method and device for reading training sample data, storage medium and electronic equipment
CN113296710B (en) Cloud storage data reading method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination