CN117370272A - File management method, device, equipment and storage medium based on file heat - Google Patents

Info

Publication number
CN117370272A
Authority
CN
China
Prior art keywords
archive
access
heat
file
files
Legal status
Pending
Application number
CN202311389337.1A
Other languages
Chinese (zh)
Inventor
梁尔真
袁学群
夏磊
陈平刚
郑望献
蔡利华
周蕾
曹军
Current Assignee
Zhejiang Xinghan Information Technology Ltd By Share Ltd
Original Assignee
Zhejiang Xinghan Information Technology Ltd By Share Ltd
Application filed by Zhejiang Xinghan Information Technology Ltd By Share Ltd
Priority to CN202311389337.1A
Publication of CN117370272A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file management method, device, equipment and storage medium based on file heat. The method comprises the steps of obtaining access record data of files to be managed in a preset past time period; inputting the access record data into a pre-trained LSTM model to perform access frequency prediction, and obtaining an access frequency prediction result; determining a predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level; based on the predicted heat level, the files to be managed are moved to the corresponding solid state disk, mechanical hard disk or magnetic tape, so that reasonable distribution of storage resources is realized, the access efficiency of files with high access frequency is improved, and the overall storage cost of the files is reduced.

Description

File management method, device, equipment and storage medium based on file heat
Technical Field
Embodiments of the invention relate to data processing technology, and in particular to a file management method, device, equipment and storage medium based on file heat.
Background
In the information age, rapid data growth has become the norm. Enterprises, organizations and individuals all face the challenge of handling large volumes of electronic files. These archives may include text documents, images, audio, video and other data in a variety of formats. When handling such large amounts of data, efficient archive management becomes critical.
In most cases, archives are not accessed uniformly: some files are accessed frequently, while others are rarely or never accessed. Traditional storage methods are usually static and readily cause the following problems. (1) Wasted resources: storing all files in the same location means that high-heat and low-heat files occupy the same storage resources, wasting valuable storage space. (2) Inefficient access: when high-heat files are stored alongside low-heat files, access to the high-heat files may be slower because they compete with a large number of low-heat files for access resources. (3) Complex data management: backup, migration or deletion may require manual intervention, adding to the complexity and cost of management.
Disclosure of Invention
The invention provides a file management method, device, equipment and storage medium based on file heat, so as to manage files dynamically and give them higher access efficiency and resource utilization.
In a first aspect, an embodiment of the present invention provides an archive management method based on archive heat, including:
acquiring access record data of files to be managed in a preset past time period;
inputting the access record data into a pre-trained LSTM model to conduct access frequency prediction, and obtaining an access frequency prediction result;
determining the predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level;
and moving the files to be managed to corresponding solid state disks, mechanical hard disks or magnetic tapes based on the predicted heat level.
Optionally, after the access record data of the files to be managed in the preset past time period is acquired, the method includes:
structuring the access record data based on a preset data structure to obtain intermediate access record data with a unified data structure;
and quantizing the intermediate access record data using one-hot encoding to obtain target access record data.
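The structuring and one-hot steps can be sketched as follows. This is a minimal illustration, assuming each raw log entry is a dict with hypothetical keys `op`, `file` and `ts`, and that the operation vocabulary is the four types named later in the description (open, close, read, write); the field names of `AccessRecord` are likewise illustrative, not taken from the source.

```python
from dataclasses import dataclass

# Assumed operation vocabulary; the description names open, close, read and write.
OP_TYPES = ("open", "close", "read", "write")

@dataclass(frozen=True)
class AccessRecord:
    """Unified (structured) access record; field names are hypothetical."""
    op: str
    file_name: str
    timestamp: int  # seconds since the Unix epoch (assumed unit)

def structure(raw: dict) -> AccessRecord:
    """Map a raw log entry (hypothetical keys) onto the unified structure."""
    return AccessRecord(op=raw["op"].lower(), file_name=raw["file"], timestamp=int(raw["ts"]))

def one_hot_op(record: AccessRecord) -> list[int]:
    """Quantize the operation type as a one-hot vector over OP_TYPES.

    Raises ValueError for an operation outside the assumed vocabulary.
    """
    vec = [0] * len(OP_TYPES)
    vec[OP_TYPES.index(record.op)] = 1
    return vec

rec = structure({"op": "READ", "file": "a.pdf", "ts": "1700000000"})
print(one_hot_op(rec))  # [0, 0, 1, 0]
```

Normalizing the raw entry into one fixed record type first means the encoder only ever sees a known set of fields, which is the point of the "unified data structure" step.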
Optionally, the pre-trained LSTM model is obtained by:
processing sample files for training an LSTM model that predicts access frequency, to obtain a sample set;
initializing the weights and biases of a preset LSTM model based on a random seed;
and training and testing the LSTM model with the sample set to obtain a target LSTM model that meets the consistency requirement and takes the file access frequency as its output target.
Optionally, processing the sample files for training the access-frequency LSTM model to obtain a training set and a test set includes:
taking historical access record data of files within a first preset duration as sample data, and taking the file access frequency within a second preset duration following the first preset duration as the sample label of that sample data, to obtain a training set and a test set composed of the sample data and the sample labels.
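A minimal sketch of this sample/label construction, assuming the access history has already been aggregated into per-day access counts for each file. `first_len` and `second_len` stand for the first and second preset durations (the 30-day example in the detailed description uses 27 and 3 days); the function name and the choice to skip files with too little history are assumptions.

```python
def make_samples(daily_counts: dict[str, list[int]], first_len: int, second_len: int):
    """Split each file's daily access counts into (sample, label) pairs.

    The first `first_len` days become the sample data; the total number of
    accesses in the following `second_len` days becomes the label.
    """
    samples, labels = [], []
    for name, counts in daily_counts.items():
        if len(counts) < first_len + second_len:
            continue  # not enough history for this file; skip it
        samples.append(counts[:first_len])
        labels.append(sum(counts[first_len:first_len + second_len]))
    return samples, labels

history = {"a.pdf": [1] * 27 + [2, 0, 5], "b.doc": [0] * 30, "c.txt": [3] * 10}
X, y = make_samples(history, first_len=27, second_len=3)
print(len(X), y)  # 2 [7, 0]
```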
Optionally, processing the sample files for training the access-frequency LSTM model to obtain a sample set further includes:
applying standard z-score normalization and randomly partitioning the sample set into a training set and a test set.
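The normalization and random split can be sketched as below, assuming samples are equal-length numeric sequences. The mean and standard deviation are computed per feature position on the training portion only and reused for the test portion; this is a common convention, but the source does not say which portion supplies the statistics. The 80/20 ratio and the seed are illustrative.

```python
import random
import statistics

def split_and_standardize(samples, labels, test_ratio=0.2, seed=42):
    """Randomly split into train/test sets, then z-score each feature position.

    Statistics come from the training set only, so no test information
    leaks into training.
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # seeded, so the split is reproducible
    n_test = int(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    columns = list(zip(*(samples[i] for i in train_idx)))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard against /0

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    train = ([z(samples[i]) for i in train_idx], [labels[i] for i in train_idx])
    test = ([z(samples[i]) for i in test_idx], [labels[i] for i in test_idx])
    return train, test

samples = [[float(i), float(2 * i)] for i in range(10)]
labels = list(range(10))
(train_X, train_y), (test_X, test_y) = split_and_standardize(samples, labels)
print(len(train_X), len(test_X))  # 8 2
```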
Optionally, a cross-entropy loss function is selected as the loss function for training the LSTM model.
Optionally, after the LSTM model is trained and tested with the sample set to obtain a target LSTM model that takes the archive access frequency as its output target, the method further includes:
calculating the Kappa coefficient and model accuracy of the target LSTM model;
and updating the target LSTM model based on a preset Kappa threshold and a preset accuracy threshold.
In a second aspect, an embodiment of the present invention further provides an archive management device based on archive heat, including:
the acquisition module is used for acquiring access record data of the files to be managed in a preset past time period;
the prediction module is used for inputting the access record data into a pre-trained LSTM model to perform access frequency prediction, and obtaining an access frequency prediction result;
the determining module is used for determining the predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level;
and the execution module is used for moving the files to be managed to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level.
In a third aspect, an embodiment of the present invention further provides archive management equipment based on archive heat, the equipment including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the archive management method based on archive heat as described in the first aspect.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the archive management method based on archive heat described in the first aspect.
According to the invention, access record data of the files to be managed in a preset past time period are obtained, access frequency prediction is carried out by utilizing a pre-trained LSTM model, an access frequency prediction result and a predicted heat level of the files to be managed are obtained, and the files to be managed are moved to corresponding solid state disks, mechanical hard disks or magnetic tapes based on the predicted heat level, so that reasonable allocation of storage resources is realized, the access efficiency of the files with high access frequency is improved, and the overall storage cost of the files is reduced.
Drawings
FIG. 1 is a flowchart of a file management method based on file heat according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a file management device based on file heat according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of file management equipment based on file heat according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a flowchart of a file management method based on file heat according to an embodiment of the present invention. This embodiment is applicable to the dynamic management of files; the method may be executed by a file management device based on file heat and specifically includes the following steps:
Step 110: acquire access record data of the files to be managed within a preset past time period.
With the development of the information age, businesses, organizations and individuals create, access and process vast numbers of electronic files every day, which may include text documents, images, audio, video and other data in a variety of formats. When handling such large amounts of data, efficient archive management becomes critical.
The heat (or access heat) of an archive is a key concept: it refers to how frequently a file is accessed or used. In most cases, archives are not accessed uniformly; some files are accessed frequently, while others are rarely or never accessed. Conventional storage methods are generally static: they ignore differences in file heat and store all files in the same location or device, so that low-heat files seriously degrade users' access efficiency for high-heat files. High-heat files may even end up on slow devices while low-heat files occupy fast ones; users accessing the high-heat files are then held back by the slow device, which severely reduces how efficiently they can obtain and access the target files.
In a specific implementation, every access operation on an archive generates corresponding access record data, which can record the archive information, user information, archive creation, opening and closing operations, and every file pointer movement and data read and write.
Step 120: input the access record data into a pre-trained LSTM model for access frequency prediction to obtain an access frequency prediction result.
In the embodiment of the invention, the access frequency is predicted based on the access record data of the file by adopting a pre-trained LSTM model, so that an access frequency prediction result is obtained.
In the embodiment of the invention, an LSTM model is used for prediction because it better matches the temporal characteristics of file access; this improves the prediction of file heat and makes file migration and tiered storage more principled and practical.
Step 130: determine the predicted heat level of the file to be managed based on the access frequency prediction result and the preset access heat levels.
In the embodiment of the invention, different access frequency prediction results are mapped to different access heat levels, and files at different heat levels are stored under different storage strategies that match users' access needs, so that files with higher access frequency can be accessed more efficiently.
Step 140: move the files to be managed to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level.
Illustratively, the predicted heat levels are cold, warm and hot archives, and cold archives are periodically migrated to tape. To further distinguish warm files from hot files, an access frequency threshold γ is defined: files with a frequency less than or equal to γ are warm files, which the migration system periodically migrates to a mechanical hard disk; files with a frequency greater than γ are hot files, which the migration system periodically migrates to a solid state disk.
According to the technical solution of this embodiment, access record data of the files to be managed in a preset past time period are acquired, access frequency is predicted with a pre-trained LSTM model to obtain an access frequency prediction result and a predicted heat level of the files to be managed, and the files are moved to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level. This achieves a reasonable allocation of storage resources, improves the access efficiency of frequently accessed files, and reduces the overall storage cost of the files.
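The three-level rule can be sketched as a small function, assuming the predicted frequency is a non-negative number and that a predicted frequency of 0 marks a cold archive (as the 30-day example in this description states). The value of γ is deployment-specific; 10 below is purely illustrative, as are the `Tier` names.

```python
from enum import Enum

class Tier(Enum):
    TAPE = "magnetic tape"
    HDD = "mechanical hard disk"
    SSD = "solid state disk"

def heat_level(predicted_freq: float, gamma: float) -> str:
    """Cold if no accesses are predicted, warm if <= gamma, hot otherwise."""
    if predicted_freq <= 0:
        return "cold"
    return "warm" if predicted_freq <= gamma else "hot"

TIER_FOR_LEVEL = {"cold": Tier.TAPE, "warm": Tier.HDD, "hot": Tier.SSD}

def target_tier(predicted_freq: float, gamma: float = 10) -> Tier:
    """Storage tier that the periodic migration job would move the file to."""
    return TIER_FOR_LEVEL[heat_level(predicted_freq, gamma)]

for q in (0, 4, 25):
    print(q, heat_level(q, 10), target_tier(q).value)
# 0 cold magnetic tape
# 4 warm mechanical hard disk
# 25 hot solid state disk
```

Note that a file exactly at γ is classified as warm, matching the "less than or equal to γ" wording.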
By way of example, in an embodiment of the present invention, n archive storage categories may be defined, each with different access performance and resource allocation, corresponding to n access heat levels 0, 1, ..., n-1. The heat label of each archive is converted by one-hot encoding into a sparse vector $y = (0, \dots, 1, \dots, 0)$.
The archive access records of the archive storage server for the past 30 days are taken, and the access features extracted from the file access log of the first 27 days are used as input to the prediction model. The access frequency Q of each file in the following 3 days is divided into several intervals according to the access heat levels defined above. A file with Q = 0 is defined as a cold file, and the archive migration system periodically moves such files to tape. Warm and hot files are then separated by the threshold γ: files with a frequency less than or equal to γ are warm files, which the migration system periodically migrates to a mechanical hard disk, and files with a frequency greater than γ are hot files, which the migration system periodically migrates to a solid state disk.
The archive storage system maintains a historical access log for each archive, keyed by archive name, and stores it persistently. The log records file creation, opening and closing operations, every file pointer movement, data reads and writes, and so on. The mean and variance of each type of file operation are calculated to measure how their dispersion changes along the time axis, the temporal characteristics of file access are mined, and these are organized into a time-series access feature sequence for each file using an appropriate time window.
The archive I/O access record is defined as a 24-byte string. Byte 0 is the file operation type field (e.g., open, close, read, write); bytes 1 to 16 are the file-name hash field, since hashed file names have a uniform length, which improves query efficiency; bytes 17 to 20 are the file operation time field; bytes 21 to 23 are the extension field, which records the user name, file operation permissions and the like. When model training data is prepared, the start and end times of the collected file-access I/O records are denoted $t_s$ and $t_e$ respectively, and the time span is $\Delta t = t_e - t_s$.
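A sketch of mapping an access frequency to one of n heat levels and then to its one-hot label, assuming the preset levels are expressed as ascending, inclusive upper bounds of frequency intervals. The boundaries below are hypothetical; with bounds `[0, gamma]` they reproduce the cold/warm/hot split.

```python
import bisect

def heat_label(freq: float, upper_bounds: list[float]) -> int:
    """Return the heat level 0..n-1 whose interval contains freq.

    upper_bounds holds n-1 ascending inclusive upper bounds; any frequency
    above the last bound falls into the hottest level n-1.
    """
    return bisect.bisect_left(upper_bounds, freq)

def one_hot(level: int, n: int) -> list[int]:
    """Sparse vector y = (0, ..., 1, ..., 0) with the 1 at position `level`."""
    y = [0] * n
    y[level] = 1
    return y

bounds = [0, 10]  # level 0: freq <= 0; level 1: 0 < freq <= 10; level 2: freq > 10
print(heat_label(0, bounds), heat_label(7, bounds), heat_label(11, bounds))  # 0 1 2
print(one_hot(heat_label(7, bounds), n=3))  # [0, 1, 0]
```

`bisect_left` returns the first position whose bound is not less than `freq`, which is exactly the index of the interval with an inclusive upper bound.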
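The windowed feature step can be sketched as follows, assuming one file's log has been reduced to `(timestamp, operation)` pairs and that the time series is built by counting each operation type per fixed-length window. The window length and the use of population variance are illustrative assumptions; the source only says "a proper time window".

```python
import statistics
from collections import Counter

OPS = ("open", "close", "read", "write")  # assumed operation types

def window_counts(events, start, window, n_windows):
    """Count each operation type in consecutive windows of `window` seconds."""
    counts = [Counter() for _ in range(n_windows)]
    for ts, op in events:
        k = (ts - start) // window
        if 0 <= k < n_windows:
            counts[k][op] += 1
    return [[c[op] for op in OPS] for c in counts]  # one row per window

def op_stats(series):
    """Mean and population variance of each operation type across windows."""
    cols = list(zip(*series))
    return {op: (statistics.fmean(col), statistics.pvariance(col)) for op, col in zip(OPS, cols)}

events = [(0, "open"), (5, "read"), (90, "read"), (100, "read"), (110, "close")]
series = window_counts(events, start=0, window=60, n_windows=2)
print(series)                    # [[1, 0, 1, 0], [0, 1, 2, 0]]
print(op_stats(series)["read"])  # (1.5, 0.25)
```

The per-window rows form the time-series access feature sequence, while the mean/variance pairs summarize how bursty each operation type is over time.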
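A sketch of packing and unpacking the 24-byte record with Python's `struct` module. The source fixes only the byte layout; the operation codes, the use of MD5 for the name hash (chosen here because its digest is exactly 16 bytes), a 4-byte unsigned little-endian Unix timestamp, and a raw 3-byte extension field are all assumptions.

```python
import hashlib
import struct

# '<' = little-endian with no padding: 1 + 16 + 4 + 3 = 24 bytes.
RECORD = struct.Struct("<B16sI3s")
OP_CODES = {"open": 0, "close": 1, "read": 2, "write": 3}  # hypothetical codes

def pack_record(op: str, file_name: str, ts: int, ext: bytes = b"\x00\x00\x00") -> bytes:
    """Byte 0: op type; bytes 1-16: name hash; bytes 17-20: time; bytes 21-23: extension."""
    name_hash = hashlib.md5(file_name.encode("utf-8")).digest()  # 16 bytes
    return RECORD.pack(OP_CODES[op], name_hash, ts, ext)

def unpack_record(raw: bytes):
    """Return (op_code, name_hash, timestamp, extension)."""
    return RECORD.unpack(raw)

recs = [pack_record("open", "report.pdf", 1_700_000_000),
        pack_record("read", "report.pdf", 1_700_000_360)]
print(RECORD.size, len(recs[0]))  # 24 24

# Time span of the collected records: delta_t = t_e - t_s.
times = [unpack_record(r)[2] for r in recs]
t_s, t_e = min(times), max(times)
print(t_e - t_s)  # 360
```

A 4-byte unsigned timestamp in seconds runs out in 2106, so a deployment using this layout would need to pick its time unit and epoch deliberately.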
Minimizing the loss function is the training objective of the model, and the weights and biases of the LSTM network are randomly initialized from a given random seed. Model training uses the gradient back-propagation algorithm, and the network parameters are updated with the Adam stochastic optimization algorithm.
Define the original file access feature time series as $F_o = \{f_1, \dots, f_n\}$, where $n$ is the total number of files and $f_t$ is the time series of the $t$-th archive, $t \in [1, n]$.
The training set and the test set are divided randomly and standardized with the standard z-score method, so that the standardized training set can be expressed as:
$F'_{train} = \{f'_1, \dots, f'_n\}$
where $1 \le t \le L$, $t$ is the file sequence number, and $L$ is the model unrolling length, i.e., the hidden layer contains $L$ connected LSTM neurons. The input of the segmented model is $X = \{X_1, X_2, \dots, X_L\}$, where $X$ is the file access I/O record extracted in the second step, and the corresponding output $Y$ is the file access heat label defined in the second step.
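To make the optimizer step concrete, here is a sketch of seeded random initialization and the standard Adam update rule applied to a flat parameter list. The hyperparameters are Adam's usual defaults, not values given in the source, and the toy quadratic objective only demonstrates the update; a real implementation would use a deep-learning framework's built-in optimizer.

```python
import math
import random

def init_params(n: int, seed: int, scale: float = 0.1) -> list[float]:
    """Seeded uniform initialization of n weights/biases (reproducible)."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]

class Adam:
    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0          # step counter used for bias correction

    def step(self, params, grads):
        """Return updated params given gradients from back-propagation."""
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias-corrected moments
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

# Toy check: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = init_params(1, seed=0)
opt = Adam(1, lr=0.1)
for _ in range(500):
    w = opt.step(w, [2 * (w[0] - 3)])
print(round(w[0], 2))
```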
The model input layer passes the file access I/O records $X$ to the hidden layer, whose output is:
$O = \{O_1, O_2, \dots, O_L\}$
$O_p = \mathrm{LSTM}_{forward}(X_p, C_{p-1}, H_{p-1})$
where $C_{p-1}$ and $H_{p-1}$ are the state and the output of the previous LSTM neuron, respectively, and the function $\mathrm{LSTM}_{forward}$ denotes the forward propagation of information within an LSTM neuron. If the neuron state vector has size $S$, then $C_{p-1}$ and $H_{p-1}$ also have size $S$.
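A minimal pure-Python sketch of $\mathrm{LSTM}_{forward}$, implemented here as the standard LSTM cell (input, forget and output gates plus a tanh candidate) and unrolled over $L$ steps with state size $S$. The parameter layout, the zero initial state and the random initialization are assumptions for illustration; in practice a framework layer such as PyTorch's `nn.LSTM` would be used.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_lstm(input_dim: int, S: int, seed: int = 0):
    """Seeded random W (S x D), U (S x S) and zero bias b (S) for gates i, f, o, g."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(S, input_dim), mat(S, S), [0.0] * S) for gate in "ifog"}

def lstm_forward(params, x_p, c_prev, h_prev):
    """One step O_p = LSTM_forward(X_p, C_{p-1}, H_{p-1}); returns (C_p, H_p)."""
    def pre(gate):
        W, U, b = params[gate]
        return [a + u + bias for a, u, bias in zip(matvec(W, x_p), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return c, h

def unroll(params, X, S):
    """Run the L-step hidden layer and collect O = {O_1, ..., O_L}."""
    c, h = [0.0] * S, [0.0] * S  # initial state and output assumed to be zero
    outputs = []
    for x_p in X:
        c, h = lstm_forward(params, x_p, c, h)
        outputs.append(h)  # O_p is the neuron output H_p
    return outputs

S, L, D = 8, 5, 4  # state size, unrolling length, features per step (illustrative)
rng = random.Random(1)
X = [[rng.random() for _ in range(D)] for _ in range(L)]
O = unroll(init_lstm(D, S), X, S)
print(len(O), len(O[-1]))  # 5 8
```

Both $C_p$ and $H_p$ keep size $S$ at every step, consistent with the statement above about the state vector size.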
A softmax layer follows the LSTM hidden-layer output and outputs the probability of each access heat level. At prediction time, the class label with the largest probability is output, namely $\hat{y} = \arg\max_{k \in \{0, 1, \dots, n-1\}} p_k$, where $p_k$ is the softmax probability of access heat level $k$.
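Model training uses the cross-entropy loss function, defined as $\mathcal{L} = -\sum_{k=0}^{n-1} y_k \log p_k$, where $y_k$ is the $k$-th component of the one-hot heat label $y$ and $p_k$ is the predicted probability of heat level $k$.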
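The output layer and loss can be sketched as below: softmax over a vector of logits, argmax prediction, and cross-entropy against one-hot labels, averaged over a batch. The logits are made up for illustration, and the projection from the last hidden output to the logits is not specified in the source; averaging over the batch is also an assumption.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Heat level with the highest probability (ties go to the lowest index)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

def cross_entropy(batch_logits, batch_onehot, eps=1e-12):
    """Mean over the batch of -sum_k y_k * log(p_k)."""
    total = 0.0
    for logits, y in zip(batch_logits, batch_onehot):
        p = softmax(logits)
        total -= sum(yk * math.log(pk + eps) for yk, pk in zip(y, p))
    return total / len(batch_logits)

logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [[1, 0, 0], [0, 0, 1]]
print(predict(logits[0]))  # 0
print(round(cross_entropy(logits, labels), 4))
```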
The output of the model is the predicted access heat of the file, i.e., the interval into which its access frequency falls, and prediction accuracy is an important indicator of model performance. The invention aims to avoid frequent reclassification and migration of files as far as possible in order to reduce resource consumption. Typically, small fluctuations in a file's access frequency do not change its storage class, i.e., no migration is needed.
The Kappa coefficient is used to evaluate the consistency of the model. Kappa is at most 1, and the values of interest here lie in [0, 1] (it becomes negative only when agreement is worse than chance); the higher the value, the higher the prediction confidence for each archive category. Conversely, a value close to 0 indicates that the model's classification is close to random. The Kappa coefficient is calculated as:
$\kappa = \frac{p_o - p_e}{1 - p_e}$
where $p_o$ is the overall accuracy (observed agreement) and $p_e$ is the agreement expected by chance.
Using model accuracy and the Kappa coefficient as indicators (for example, accuracy greater than 80% and a Kappa coefficient greater than 0.75), training of the model continues until both indicators are met.
The first step is called to preprocess the files that require tiered storage, and the second step is then called to generate model inputs that satisfy the model's requirements.
The model trained in the previous step is used for heat prediction, and archives are stored and migrated according to the heat prediction results and the archive storage specification, achieving heat-based tiered storage.
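A sketch of the acceptance check, computing accuracy ($p_o$) and Cohen's kappa as $(p_o - p_e) / (1 - p_e)$ from true and predicted heat levels. The thresholds 0.80 and 0.75 are the example values given in the text; the function names and the sample labels are hypothetical.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return (p_o, kappa) for two equal-length lists of class labels."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: both label sources pick the same class independently.
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    kappa = 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa

def meets_targets(y_true, y_pred, acc_min=0.80, kappa_min=0.75):
    """Example thresholds from the text: accuracy > 80 %, Kappa > 0.75."""
    acc, kappa = accuracy_and_kappa(y_true, y_pred)
    return acc > acc_min and kappa > kappa_min

y_true = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2, 1, 0, 1, 2]
print(accuracy_and_kappa(y_true, y_pred))
print(meets_targets(y_true, y_pred))  # True
```

Here $p_o = 0.9$ and $p_e = 0.33$, so kappa is about 0.85 and both gates pass; training would continue only while `meets_targets` returns `False`.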
Fig. 2 is a schematic structural diagram of a file management device based on file heat according to an embodiment of the present invention. As shown in Fig. 2, the file management device based on file heat includes an acquisition module 21, a prediction module 22, a determining module 23 and an execution module 24, wherein:
an acquisition module 21, configured to acquire access record data of a file to be managed in a preset past period;
the prediction module 22 is configured to input the access record data into a pre-trained LSTM model to perform access frequency prediction, and obtain an access frequency prediction result;
a determining module 23, configured to determine a predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level;
the execution module 24 is configured to move the file to be managed to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level.
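The four modules can be sketched as one class, assuming each module is a plain callable and the prediction model is injected. Class and method names are hypothetical, the stub predictor stands in for the LSTM, and the execution module only records a migration plan rather than moving files.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class HeatBasedArchiveManager:
    """Mirrors modules 21-24: acquire, predict, determine, execute."""
    fetch_records: Callable[[str], List[int]]        # acquisition module 21
    predict_frequency: Callable[[List[int]], float]  # prediction module 22 (e.g. LSTM)
    gamma: float = 10.0                              # warm/hot threshold (illustrative)
    plan: Dict[str, str] = field(default_factory=dict)

    def determine_level(self, freq: float) -> str:  # determining module 23
        if freq <= 0:
            return "cold"
        return "warm" if freq <= self.gamma else "hot"

    def execute(self, name: str, level: str) -> None:  # execution module 24
        tier = {"cold": "tape", "warm": "hdd", "hot": "ssd"}[level]
        self.plan[name] = tier  # a real system would migrate the file here

    def manage(self, names: List[str]) -> Dict[str, str]:
        for name in names:
            records = self.fetch_records(name)
            level = self.determine_level(self.predict_frequency(records))
            self.execute(name, level)
        return self.plan

# Stub predictor: mean daily accesses scaled to a 3-day horizon (not an LSTM).
logs = {"a.pdf": [5, 6, 7], "b.doc": [0, 0, 0], "c.txt": [1, 1, 2]}
mgr = HeatBasedArchiveManager(fetch_records=logs.__getitem__,
                              predict_frequency=lambda r: 3 * sum(r) / len(r))
print(mgr.manage(list(logs)))  # {'a.pdf': 'ssd', 'b.doc': 'tape', 'c.txt': 'hdd'}
```

Injecting the predictor keeps the migration logic independent of how the model is trained or served.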
Optionally, after the access record data of the file to be managed in the preset past time period is acquired, the method includes:
structuring the access record data based on a preset data structure to obtain intermediate access record data with a unified data structure;
and quantizing the intermediate access record data using one-hot encoding to obtain target access record data.
Optionally, the pre-trained LSTM model is obtained by:
processing sample files for training an LSTM model that predicts access frequency, to obtain a sample set;
initializing the weights and biases of a preset LSTM model based on a random seed;
and training and testing the LSTM model with the sample set to obtain a target LSTM model that meets the consistency requirement and takes the file access frequency as its output target.
Optionally, processing the sample archives for training the access-frequency LSTM model to obtain a training set and a test set includes:
taking historical access record data of archives within a first preset duration as sample data, and taking the archive access frequency within a second preset duration following the first preset duration as the sample label of that sample data, to obtain a training set and a test set composed of the sample data and the sample labels.
Optionally, processing the sample files for training the access-frequency LSTM model to obtain a sample set further includes:
applying standard z-score normalization and randomly dividing the sample set into a training set and a test set.
Optionally, a cross-entropy loss function is selected as the loss function for training the LSTM model.
Optionally, after the LSTM model is trained and tested with the sample set to obtain a target LSTM model that takes the archive access frequency as its output target, the method further includes:
calculating the Kappa coefficient and model accuracy of the target LSTM model;
updating the target LSTM model based on a preset Kappa threshold and a preset accuracy threshold.
The file management device based on file heat provided by the embodiment of the invention can execute the file management method based on file heat provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
Fig. 3 is a schematic structural diagram of file management equipment based on file heat according to an embodiment of the present invention. As shown in Fig. 3, the equipment includes a processor 30, a memory 31, a communication module 32, an input device 33 and an output device 34. The equipment may contain one or more processors 30; one processor 30 is taken as an example in Fig. 3. The processor 30, memory 31, communication module 32, input device 33 and output device 34 may be connected by a bus or by other means; a bus connection is taken as an example in Fig. 3.
The memory 31 is a computer-readable storage medium that may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the archive management method based on archive heat in the embodiment of the present invention (for example, the acquisition module 21, prediction module 22, determining module 23 and execution module 24 of the archive management device based on archive heat). By running the software programs, instructions and modules stored in the memory 31, the processor 30 executes the various functional applications and data processing of the equipment, i.e., implements the archive management method based on archive heat described above.
The memory 31 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, the memory 31 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 31 may further include memory located remotely from processor 30, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 32 is used to establish a connection with the display screen and exchange data with it. The input device 33 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device, and the output device 34 may include a display device such as a display screen.
The file management device based on the file heat provided by the embodiment of the invention can execute the file management method based on the file heat provided by any embodiment of the invention, and particularly has corresponding functions and beneficial effects.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform an archive management method based on archive heat, the method comprising:
acquiring access record data of files to be managed in a preset past time period;
inputting the access record data into a pre-trained LSTM model to perform access frequency prediction, and obtaining an access frequency prediction result;
determining a predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level;
and moving the files to be managed to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the above-mentioned method operations, and may also perform the related operations in the file management method based on file hotness provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the above embodiment of the archive management device based on archive heat, the units and modules are divided only according to functional logic. The division is not limited to the one described, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the protection scope of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein. Various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described through the above embodiments, it is not limited to them. It may include many other equivalent embodiments without departing from the concept of the invention, and its scope is determined by the appended claims.

Claims (10)

1. An archive management method based on archive heat, comprising:
acquiring access record data of files to be managed in a preset past time period;
inputting the access record data into a pre-trained LSTM model to conduct access frequency prediction, and obtaining an access frequency prediction result;
determining the predicted heat level of the file to be managed based on the access frequency prediction result and a preset access heat level;
and moving the files to be managed to corresponding solid state disks, mechanical hard disks or magnetic tapes based on the predicted heat level.
2. The archive management method based on archive heat according to claim 1, further comprising, after acquiring the access record data of the files to be managed for the preset past time period:
structuring the access record data according to a preset data structure to obtain intermediate access record data with a unified data structure;
and quantizing the intermediate access record data by one-hot encoding to obtain target access record data.
3. The archive management method based on archive heat according to claim 1, wherein the LSTM model is pre-trained by:
processing sample files used to train the LSTM model for access frequency prediction to obtain a sample set;
initializing the weights and biases of a preset LSTM model using a random seed;
and training and testing the LSTM model with the sample set to obtain a target LSTM model that meets a consistency requirement and takes the access frequency of the files as its output target.
4. The archive management method based on archive heat according to claim 3, wherein processing the sample files to obtain the sample set comprises:
taking historical access record data of the files within a first preset duration as sample data, and taking the access frequency of the files within a second preset duration following the first preset duration as the label of that sample data, to obtain a training set and a test set composed of the sample data and sample labels.
5. The archive management method based on archive heat according to claim 3, wherein processing the sample files to obtain the sample set further comprises:
normalizing the sample set using standard z-score normalization, and randomly partitioning it into a training set and a test set.
6. The archive management method based on archive heat according to claim 3, wherein a cross-entropy loss function is used as the loss function when training the LSTM model.
7. The archive management method based on archive heat according to claim 3, further comprising the following steps. These steps are performed after training and testing the LSTM model with the sample set to obtain the target LSTM model that meets the consistency requirement and takes the access frequency of the files as its output target:
calculating the Kappa coefficient and the accuracy of the target LSTM model;
and updating the target LSTM model based on a preset Kappa threshold and a preset accuracy threshold.
8. An archive management device based on archive heat, comprising:
an acquisition module, configured to acquire access record data of files to be managed in a preset past time period;
a prediction module, configured to input the access record data into a pre-trained LSTM model for access frequency prediction to obtain an access frequency prediction result;
a determining module, configured to determine a predicted heat level of the files to be managed based on the access frequency prediction result and a preset access heat level;
and an execution module, configured to move the files to be managed to the corresponding solid state disk, mechanical hard disk or magnetic tape based on the predicted heat level.
9. Archive management equipment based on archive heat, comprising:
one or more processors;
a storage apparatus configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the archive management method based on archive heat according to any one of claims 1 to 7.
10. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the archive management method based on archive heat according to any one of claims 1 to 7.
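Claim 2 specifies neither the preset data structure nor which fields are one-hot encoded. The sketch below therefore assumes a hypothetical record with `file_id`, `action` and `department` fields; all of these names are illustrative. It first normalizes raw log entries into that structure. It then one-hot encodes the two categorical fields against vocabularies built from the records themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    # Hypothetical unified structure; the claims do not list the fields.
    file_id: str
    action: str      # e.g. "view", "download"
    department: str  # department that issued the request


def structure(raw: dict) -> AccessRecord:
    """Map one raw log entry onto the unified AccessRecord structure."""
    return AccessRecord(
        file_id=str(raw["file_id"]).strip(),
        action=str(raw.get("action", "view")).strip().lower(),
        department=str(raw.get("department", "unknown")).strip().lower(),
    )


def one_hot(records, fields=("action", "department")):
    """Concatenate one one-hot vector per categorical field for each record."""
    vocabs = {f: sorted({getattr(r, f) for r in records}) for f in fields}
    encoded = []
    for r in records:
        vec = []
        for f in fields:
            vec.extend(1 if getattr(r, f) == v else 0 for v in vocabs[f])
        encoded.append(vec)
    return vocabs, encoded
```

In a deployed system, the vocabularies would be fixed at training time and reused at prediction time. This keeps each position in the vector tied to the same category.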
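Claim 3 initializes the LSTM weights and biases from a random seed so that training runs are reproducible. The claims name no library, so the following is a framework-free illustration under that assumption. The hypothetical `TinyLSTMCell` draws each gate's weights and biases from `random.Random(seed)`. It then runs the standard LSTM forward recurrence (input, forget, candidate and output gates) over a sequence of feature vectors. Backpropagation and the training loop are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Single LSTM cell whose weights and biases come from a fixed seed."""

    def __init__(self, input_size, hidden_size, seed=42):
        rng = random.Random(seed)  # seeded generator -> reproducible initialization
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        scale = 1.0 / math.sqrt(hidden_size)
        # One weight matrix and one bias vector per gate: i, f, g (candidate), o.
        self.W = {g: [[rng.uniform(-scale, scale) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [rng.uniform(-scale, scale) for _ in range(hidden_size)]
                  for g in "ifgo"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Feed a sequence of feature vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h
```

With a deep-learning framework such as PyTorch, the same reproducibility is usually obtained by seeding its generator (for example with `torch.manual_seed`) before the model is constructed.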
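Claims 4 and 5 describe how samples are built, normalized and split. The following sketch rests on several assumptions. Each file's history is taken to be a list of daily access counts. The label is the total number of accesses in the `second_len` days that follow a `first_len`-day history window. The split ratio and seed are hypothetical defaults.

```python
import random
import statistics


def make_samples(daily_counts, first_len, second_len):
    """Slide a window over one file's daily access counts (oldest first).

    Each sample is (counts for `first_len` days, total accesses in the
    following `second_len` days), mirroring the first/second preset durations.
    """
    samples = []
    for start in range(len(daily_counts) - first_len - second_len + 1):
        history = daily_counts[start:start + first_len]
        future = daily_counts[start + first_len:start + first_len + second_len]
        samples.append((list(history), sum(future)))
    return samples


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit population std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column -> divide by 1
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_samples(samples, test_ratio=0.2, seed=0):
    """Normalize the features, shuffle with a fixed seed, return (train, test)."""
    features = zscore_columns([x for x, _ in samples])
    labelled = list(zip(features, (y for _, y in samples)))
    random.Random(seed).shuffle(labelled)
    n_test = round(len(labelled) * test_ratio)
    return labelled[n_test:], labelled[:n_test]
```

For example, `make_samples([1, 0, 2, 3, 0, 1], 3, 2)` yields `[([1, 0, 2], 3), ([0, 2, 3], 1)]`. Following the order written in claim 5, the mean and standard deviation are computed over the whole sample set, so test samples influence the scaling. A common alternative is to fit these statistics on the training portion only and reuse them for the test portion.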
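Cross-entropy (claim 6) is a classification loss. This sketch therefore assumes that the access frequency is predicted as one of several discrete classes, for instance frequency buckets or heat levels; the claims do not say how such classes are formed. The sketch also covers the accuracy and Cohen's kappa checks of claim 7. The threshold values in `needs_update` are hypothetical. The function only decides whether an update is due and does not perform the retraining itself.

```python
import math
from collections import Counter


def cross_entropy(batch_logits, targets):
    """Mean categorical cross-entropy, computed via log-sum-exp for stability."""
    total = 0.0
    for logits, t in zip(batch_logits, targets):
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[t]  # equals -log(softmax(logits)[t])
    return total / len(targets)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention: a single class everywhere, agreement is perfect
    return (p_o - p_e) / (1.0 - p_e)


def needs_update(y_true, y_pred, kappa_min=0.6, acc_min=0.8):
    """True if either metric is below its (hypothetical) preset threshold."""
    return cohen_kappa(y_true, y_pred) < kappa_min or accuracy(y_true, y_pred) < acc_min
```

For `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, the accuracy is 0.75 and kappa is 0.5, so the model would be flagged for an update under the assumed thresholds. Kappa is checked in addition to accuracy because it discounts the agreement that would be expected from the class frequencies alone.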
CN202311389337.1A 2023-10-25 2023-10-25 File management method, device, equipment and storage medium based on file heat Pending CN117370272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311389337.1A CN117370272A (en) 2023-10-25 2023-10-25 File management method, device, equipment and storage medium based on file heat

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311389337.1A CN117370272A (en) 2023-10-25 2023-10-25 File management method, device, equipment and storage medium based on file heat

Publications (1)

Publication Number Publication Date
CN117370272A 2024-01-09

Family

ID=89401942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311389337.1A Pending CN117370272A (en) 2023-10-25 2023-10-25 File management method, device, equipment and storage medium based on file heat

Country Status (1)

Country Link
CN (1) CN117370272A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108462605A (en) * 2018-02-06 2018-08-28 国家电网公司 Data prediction method and device
US20180357246A1 (en) * 2017-06-07 2018-12-13 Acronis International Gmbh System and method for file archiving using machine learning
WO2019056499A1 (en) * 2017-09-20 2019-03-28 平安科技(深圳)有限公司 Prediction model training method, data monitoring method, apparatuses, device and medium
CN110610382A (en) * 2019-09-10 2019-12-24 浙江大搜车软件技术有限公司 Vehicle sales prediction method, apparatus, computer device, and storage medium
CN111158613A (en) * 2020-04-07 2020-05-15 上海飞旗网络技术股份有限公司 Data block storage method and device based on access heat and storage equipment
CN113703688A (en) * 2021-09-20 2021-11-26 河南锦誉网络科技有限公司 Distributed storage node load adjustment method based on big data and file heat
CN114049968A (en) * 2021-10-21 2022-02-15 北京北明数科信息技术有限公司 Infectious disease development trend prediction method, system, device and storage medium
CN114239949A (en) * 2021-12-10 2022-03-25 中信银行股份有限公司 Website access amount prediction method and system based on two-stage attention mechanism
CN114912666A (en) * 2022-04-24 2022-08-16 同济大学 Short-time passenger flow volume prediction method based on CEEMDAN algorithm and attention mechanism
CN115544377A (en) * 2022-11-25 2022-12-30 浙江星汉信息技术股份有限公司 Cloud storage-based file heat evaluation and updating method
CN116245227A (en) * 2023-02-02 2023-06-09 国家气候中心 Daily weather drought prediction method, device, storage medium and equipment

Similar Documents

Publication Publication Date Title
US20200050968A1 (en) Interactive interfaces for machine learning model evaluations
US10726356B1 (en) Target variable distribution-based acceptance of machine learning test data sets
CA2953826C (en) Machine learning service
CN113610239B (en) Feature processing method and feature processing system for machine learning
RU2693324C2 (en) Method and a server for converting a categorical factor value into its numerical representation
CN111881447B (en) Intelligent evidence obtaining method and system for malicious code fragments
CN110866107A (en) Method and device for generating material corpus, computer equipment and storage medium
CN110968272A (en) Time sequence prediction-based method and system for optimizing storage performance of mass small files
Setiawan et al. Function interpolation for learned index structures
Zamzami et al. Model selection and application to high-dimensional count data clustering: via finite EDCM mixture models
CN110019017B (en) High-energy physical file storage method based on access characteristics
CN117370272A (en) File management method, device, equipment and storage medium based on file heat
Ding et al. HB-file: An efficient and effective high-dimensional big data storage structure based on US-ELM
CN112348041A (en) Log classification and log classification training method and device, equipment and storage medium
CN113032575B (en) Document blood relationship mining method and device based on topic model
Ragavan et al. A Novel Big Data Storage Reduction Model for Drill Down Search.
Soltani et al. Developing software signature search engines using paragraph vector model: a triage approach for digital forensics
CN113190662A (en) Topic segmentation method based on discourse structure diagram network
Luo et al. A comparison of som based document categorization systems
Zhang et al. The incremental knowledge acquisition based on hash algorithm
US11868329B2 (en) Multidimensional cube multivariate regression
US20240037067A1 (en) File system provisioning for workload
CN113220994B (en) User personalized information recommendation method based on target object enhanced representation
Rodrigues Big Data Machine Learning Benchmark on Spark
Shayegan et al. An extended version of sectional MinHash method for near-duplicate detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination