CN116089364B

CN116089364B - Storage file management method and device, AI platform and storage medium

Info

Publication number: CN116089364B
Application number: CN202310377465.8A
Authority: CN
Inventors: 姬贵阳
Original assignee: Shandong Yingxin Computer Technology Co Ltd
Current assignee: Shandong Yingxin Computer Technology Co Ltd
Priority date: 2023-04-11
Filing date: 2023-04-11
Publication date: 2023-07-14
Anticipated expiration: 2043-04-11
Also published as: CN116089364A

Abstract

The application relates to the field of computers and discloses a storage file management method and device, an AI platform, and a storage medium. The storage file management method comprises the following steps: when the latest modification time of a first directory taken out of a stack has changed, acquiring a subdirectory of the first directory, the first directory being stored on an AI platform; judging whether the subdirectory is empty; if the subdirectory is not empty, putting the subdirectory into the stack; if the subdirectory is empty, putting the subdirectory, and any parent directory of the subdirectory that meets a preset condition, into a queue, the preset condition being that all subdirectories of the parent directory have been put into the queue; and taking a second directory out of the queue until the stack and the queue are both empty, and performing a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed. The method and device can shorten statistical management time, improve storage operation efficiency, and reduce the consumption of AI platform resources.

Description

Storage file management method and device, AI platform and storage medium

Technical Field

The present disclosure relates to the field of computers, and in particular, to a method and apparatus for managing storage files, an AI platform, and a computer readable storage medium.

Background

An AI (Artificial Intelligence) platform is a platform that manages and schedules computing resources (e.g., GPUs (Graphics Processing Units) and CPUs (Central Processing Units)) and storage resources, and that supports business scenarios such as large-scale AI training and AI inference.

A prominent feature of AI platform storage is that the volume of stored files is massive (above the TB level), so an important basic function of the AI platform is statistical management of these files. Currently, there are several methods for statistically managing massive files on an AI platform. The first directly traverses all stored files to obtain the size of each file directory. The second partitions the storage directories and counts them concurrently. The third relies on a quota function provided by the storage itself, as in file storage systems such as NFS and BeeGFS. The first method continuously consumes storage IO (Input/Output) resources during traversal and, because it also continuously consumes resources such as the CPU (central processing unit) and memory of the service, can easily block other file operations on the storage node. Its statistical results are also unsatisfactory: with a massive number of files the computed sizes lag behind the actual state, so the results contain errors. The second method likewise consumes a large amount of resources and can easily block other file operations on the storage node. The third method heavily consumes the network, disk IO, CPU, and memory of the storage resources, increasing the pressure on the AI platform's storage.

Therefore, how to solve the above technical problems should be of great interest to those skilled in the art.

Disclosure of Invention

The object of the present application is to provide a method, an apparatus, an AI platform, and a computer readable storage medium for managing storage files, so as to reduce resource consumption and shorten statistical management time.

In order to solve the above technical problems, the present application provides a storage file management method, including:

when the latest modification time of a first directory taken out of a stack has changed, acquiring a subdirectory of the first directory; the first directory is stored on an AI platform;

judging whether the subdirectory is empty;

if the subdirectory is not empty, putting the subdirectory into the stack;

if the subdirectory is empty, putting the subdirectory and a parent directory of the subdirectory that meets a preset condition into a queue; the preset condition is that all subdirectories of the parent directory have been put into the queue;

and taking a second directory out of the queue until the stack and the queue are both empty, and performing a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed.

Optionally, performing a corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed includes:

when the latest modification time of the second directory is unchanged, determining whether the database needs to be updated according to whether the size of the second directory has changed;

and when the latest modification time of the second directory has changed, performing a corresponding operation on the database according to whether the second directory is stored in the database.

Optionally, determining whether the database needs to be updated according to whether the size of the second directory has changed includes:

judging whether the size of the second directory has changed;

if the size of the second directory has changed, updating the storage path information of the second directory in the database;

and if the size of the second directory is unchanged, not updating the database.

Optionally, performing a corresponding operation on the database according to whether the second directory is stored in the database includes:

judging whether the second directory is stored in the database;

if the second directory is stored in the database, updating the storage path information of the second directory in the database;

and if the second directory is not stored in the database, inserting the second directory into the database.

Optionally, the method further comprises:

and establishing an index for a table in the database.

Optionally, the method further comprises:

and storing the second directory in separate tables according to the storage path information.

Optionally, the method further comprises:

and judging whether the latest modification time of the second directory has changed.

Optionally, the database is one that can be embedded in a microservice of the AI platform, is migratable, and requires no installation or configuration.

Optionally, taking the second directory out of the queue includes:

and simultaneously taking out of the queue second directories that are not in a parent-child relationship with one another.

Optionally, the method further comprises:

and deleting from the database any directories that exist in the database but no longer exist in the underlying storage, the directories comprising a parent directory and all subdirectories under that parent directory.

Optionally, if the subdirectory is not empty, further comprising:

and recording the information of the parent directory of the child directory.

Optionally, if the child directory is empty, placing the child directory and a parent directory of the child directory that meets a preset condition into a queue includes:

if the subdirectory is empty, putting the subdirectory into a queue;

judging whether all the child directories of the parent directory of the child directory are all put into a queue;

and if all subdirectories of the parent directory of the subdirectory have been put into the queue, putting the parent directory into the queue, and repeating this for each higher-level parent directory until a parent directory that does not meet the preset condition is reached.

Optionally, when the latest modification time of the first directory fetched from the stack changes, before obtaining the subdirectory of the first directory, the method includes:

determining whether the latest modification time of the first directory taken out of the stack has changed.

Optionally, before determining whether the latest modification time of the first directory fetched from the stack has changed, the method further includes:

placing the first directory in a stack;

and taking the first directory out of the stack.

Optionally, obtaining the subdirectory of the first directory includes:

Traversing the first directory in a cleanable manner to obtain subdirectories of the first directory.

Optionally, traversing the first directory in a cleanup manner includes:

and opening the first directory by using an opening function, and reading the first directory by using a reading function.

Optionally, the method further comprises:

and when the latest modification time of the first directory fetched from the stack is unchanged, acquiring a subdirectory of the first directory from the database.

The application also provides a storage file management device, which comprises:

a first acquisition module, configured to acquire a subdirectory of a first directory when the latest modification time of the first directory taken out of a stack has changed; the first directory is stored on an AI platform;

a first judging module, configured to judge whether the subdirectory is empty;

a first storage module, configured to put the subdirectory into the stack if the subdirectory is not empty;

a second storage module, configured to put the subdirectory and a parent directory of the subdirectory that meets a preset condition into a queue if the subdirectory is empty; the preset condition is that all subdirectories of the parent directory have been put into the queue;

and a fetching and processing module, configured to take a second directory out of the queue until the stack and the queue are both empty, and to perform a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed.

The application also provides an AI platform comprising:

a memory for storing a computer program;

and a processor for implementing any one of the above storage file management methods when executing the computer program.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the storage file management methods described above.

The storage file management method provided by the application comprises the following steps: when the latest modification time of a first directory taken out of a stack has changed, acquiring a subdirectory of the first directory, the first directory being stored on an AI platform; judging whether the subdirectory is empty; if the subdirectory is not empty, putting the subdirectory into the stack; if the subdirectory is empty, putting the subdirectory and a parent directory of the subdirectory that meets a preset condition into a queue, the preset condition being that all subdirectories of the parent directory have been put into the queue; and taking a second directory out of the queue until the stack and the queue are both empty, and performing a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed.

Thus, in the present application the first directory is stored in the stack. The first directory is taken out of the stack and, when its latest modification time has changed, its subdirectories are obtained; whether each subdirectory is put into the queue or into the stack is determined by whether the subdirectory is empty. While directories exist in the queue, they are taken out of the queue until the stack and the queue are both empty, and a corresponding operation is performed on the database according to whether the latest modification time of each directory taken out of the queue has changed. A stack is a first-in, last-out data structure and a queue is a first-in, first-out data structure, so by repeating "pop, push, pop" on the stack and "enqueue, dequeue" on the queue until both are empty, the application performs storage statistics management of the directories from the bottom up. During statistical management, entries are continuously removed from the stack and the queue, so no memory overflow occurs in the system, and only changed directories are counted, which avoids the time and platform resources wasted by full statistics.

Therefore, on the one hand, the method occupies very few AI platform resources, reduces the consumption of resources such as CPU, memory, and IO, relieves the storage and load pressure on the AI platform, prevents other file operations on the storage node from being blocked, improves the training efficiency of artificial intelligence models, reduces the consumption and long-term occupation of service module resources, and enhances the storage performance of the AI platform. On the other hand, the method shortens statistical management time, improves the storage operation efficiency of the AI platform, shortens model training time, reduces operation and maintenance costs, and improves the market competitiveness of the AI platform.

Furthermore, the present application provides an apparatus, an AI platform, and a computer-readable storage medium having the above advantages.

Drawings

To describe the embodiments of the present application or the prior art more clearly, the drawings used in that description are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a first flowchart of a storage file management method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a stack data structure according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a queue data structure according to an embodiment of the present application;

FIG. 4 is a second flowchart of a storage file management method according to an embodiment of the present application;

FIG. 5 is a third flowchart of a storage file management method according to an embodiment of the present application;

FIG. 6 is a fourth flowchart of a storage file management method according to an embodiment of the present application;

FIG. 7 is a fifth flowchart of a storage file management method according to an embodiment of the present application;

FIG. 8 is a sixth flowchart of a storage file management method according to an embodiment of the present application;

FIG. 9 is a seventh flowchart of a storage file management method according to an embodiment of the present application;

FIG. 10 is a block diagram of a storage file management device according to an embodiment of the present application;

FIG. 11 is a block diagram of an AI platform according to an embodiment of the present application;

FIG. 12 is a framework diagram of a storage statistics management system on an AI platform according to an embodiment of the present application.

Detailed Description

To help those skilled in the art better understand the present application, the application is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of protection of the present application.

As described in the Background, a large number of files are stored on the AI platform, and the related art has the following disadvantages when statistically managing them: first, it consumes various resources of the AI platform and increases the pressure on AI platform storage, and it can easily block other file operations on the storage node; second, statistical management takes a long time.

In view of this, the present application provides a storage file management method, please refer to fig. 1, which includes:

Step S101: when the latest modification time of the first directory taken out of the stack has changed, acquiring a subdirectory of the first directory; the first directory is stored on the AI platform.

The AI platform in the present application may be an AI cluster platform.

A stack is a linear storage structure in which data can be accessed from one end only, following the "first in, last out" principle.

Because of this structure, only two operations are generally performed on a stack in practice: first, adding data to the stack, a process called "push"; second, removing data from the stack, a process called "pop". The position of the data that was pushed first is called the "stack bottom", and the position of the data that was pushed last is called the "stack top".

For example, as shown in fig. 2, when there are n directories a1, a2, …, an, they are pushed onto the stack in that order. When the directories are taken out of the stack, they are fetched in the reverse order of entry: directory an is fetched first, then directory a(n-1), and so on down to directory a2, and finally directory a1.
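The push and pop order in this example can be illustrated with a minimal Python sketch. A plain list stands in for the stack, and the three directory names are placeholders chosen for illustration, not values from the application.

```python
# Minimal LIFO illustration: a Python list used as the stack.
stack = []
for name in ["a1", "a2", "a3"]:  # placeholder directory names
    stack.append(name)           # push onto the stack top

popped = []
while stack:
    popped.append(stack.pop())   # pop from the stack top

print(popped)  # directories come out in the reverse order of entry
```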

The first directory resides in the stack; taking the first directory out of the stack means popping it.

The application is based on the characteristics of files in the Linux operating system: the file system records the size of each file, but a directory carries no information about the space its contents actually occupy. The application therefore computes statistics per directory on the AI platform's storage, which reduces the consumption of storage resources such as CPU, memory, and IO, relieves the storage and load pressure on the AI platform, and enhances the storage performance of the AI platform.

Operations that may change the latest modification time of the first directory include, but are not limited to, adding soft links, adding hard links, compressing and decompressing, renaming, adding hidden files, deleting, and modifying.

Adding, deleting, or modifying a file changes the latest modification time of its parent directory, but the change is limited to that parent directory; the latest modification time of the directory above the parent is not changed. In other words, when stored content changes, only the directory immediately above the change is affected, and other directories are not. Therefore, when counting directory sizes, the size of each directory needs to be counted according to whether its latest modification time has changed, and the counting proceeds from the bottommost directory to the outermost directory.
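This modification-time behaviour can be checked with a short Python sketch. It is an illustration under assumptions: the directory names, the file name, and the fixed timestamp `OLD` are hypothetical, and `os.utime` is used only to give both directories a known starting modification time so the effect of adding a file is visible.

```python
import os
import tempfile

OLD = 1_000_000_000  # arbitrary fixed timestamp (assumption, for a deterministic demo)

with tempfile.TemporaryDirectory() as root:
    upper_dir = os.path.join(root, "upper")      # directory one level above
    leaf_dir = os.path.join(upper_dir, "leaf")   # directory that will receive the file
    os.makedirs(leaf_dir)

    # Reset both directories to the same known, old modification time.
    os.utime(leaf_dir, (OLD, OLD))
    os.utime(upper_dir, (OLD, OLD))

    # Adding a file changes the mtime of the directory that contains it ...
    with open(os.path.join(leaf_dir, "data.bin"), "wb") as f:
        f.write(b"x")

    leaf_changed = os.stat(leaf_dir).st_mtime != OLD
    # ... but not the mtime of the directory one level further up.
    upper_changed = os.stat(upper_dir).st_mtime != OLD

print(leaf_changed, upper_changed)
```

This is why a changed directory has to be found by descending through the tree rather than by inspecting the top-level directory alone.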

It should be noted that the type of the first directory is not limited in the present application and may be chosen as needed. For example, the first directory includes, but is not limited to, a user home directory, a public directory, a dataset directory, and a model directory.

Because the first directory can be any of these types, the storage file management method can be applied to different AI business scenarios, for example to count the sizes of public directories and dataset directories.

A subdirectory of the first directory is a next-level directory of the first directory. The number of subdirectories under the first directory is not specifically limited in this application: there may be one, two, or more.
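As an illustration of obtaining the next-level directories, the following Python sketch lists them with `os.scandir`. The choice of `os.scandir` and the helper name `list_subdirectories` are assumptions made for this example; the application does not prescribe a particular API.

```python
import os

def list_subdirectories(path):
    """Return the sorted paths of the next-level directories under `path`."""
    subdirs = []
    # os.scandir opens the directory and yields its entries one by one.
    with os.scandir(path) as entries:
        for entry in entries:
            # Count only real directories; symbolic links are not followed here.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
    return sorted(subdirs)
```

Regular files are skipped, so the result contains only the directories one level below `path`.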

Step S102: it is determined whether the subdirectory is empty.

It should be noted that judging whether the subdirectory is empty means judging whether the subdirectory itself has a next-level directory.

Step S103: if the subdirectory is not empty, the subdirectory is put in the stack.

If the subdirectory of the first directory is not empty, i.e. there is a next-layer directory under the subdirectory, the subdirectory is put in the stack at this time.

In one embodiment of the present application, if the subdirectory is not empty, further comprising:

the information of the parent directory of the child directory is recorded to ensure that the database is not queried repeatedly.

Step S104: if the child directory is empty, putting the child directory and a parent directory of the child directory meeting preset conditions into a queue;

the preset condition is that all subdirectories of the parent directory are put into the queue.

If the subdirectory of the first directory is empty, that is, the subdirectory has no next-level directory and the traversal has reached the bottom level, the subdirectory is put into the queue, and then every parent directory that meets the preset condition is also put into the queue.

For example, when directory A has only one next-level directory B, and directory B has no next-level directory, directory A is put into the queue after directory B is put into the queue. When directory A has next-level directories B1 and B2, and neither B1 nor B2 has a next-level directory, directory A is put into the queue after both B1 and B2 have been put into the queue.

A queue, like a stack, is a linear storage structure with strict rules for storing and fetching data. Unlike a stack, both ends of a queue are "open": data can only enter at one end and leave at the other. Typically, the end where data enters is called the "tail" of the queue and the end where data leaves is called the "head"; adding a data element to the queue is called "enqueuing", and removing one is called "dequeuing".

Data enters and leaves the queue on a first-in, first-out basis: the data that entered the queue first leaves it first.

For example, as shown in fig. 3, when there are n directories a1, a2, …, an, they are placed in the queue in that order: directory a1 first, then directory a2, and so on. When directories are fetched from the queue, they are fetched in the same order in which they entered: directory a1 first, then directory a2, and so on.
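The first-in, first-out order can be illustrated in Python with `collections.deque`. The directory names are placeholders, and the use of `deque` is an assumption for the example.

```python
from collections import deque

# Minimal FIFO illustration: a deque used as the queue.
queue = deque()
for name in ["a1", "a2", "a3"]:        # placeholder directory names
    queue.append(name)                 # enqueue at the tail

dequeued = []
while queue:
    dequeued.append(queue.popleft())   # dequeue from the head

print(dequeued)  # directories come out in the order they entered
```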

Step S105: taking a second directory out of the queue until the stack and the queue are both empty, and performing a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed.

The process of "pop, push, pop" and "enqueue, dequeue" is repeated until the stack and the queue are both empty. Subsequent processing depends on whether the latest modification time of the second directory taken out of the queue has changed, as described in the embodiments below.
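The interplay of stack and queue in steps S101 to S105 can be sketched as follows. This is a simplified illustration, not the application's implementation: the directory tree is an in-memory dictionary rather than a file system, the modification-time check is omitted (every directory is treated as changed), the queue is drained only after the stack is empty, and the names `bottom_up_order`, `parent_of`, and `pending` are hypothetical.

```python
from collections import deque

def bottom_up_order(tree, root):
    """Return directories in an order where each directory follows all of its subdirectories.

    `tree` maps a directory name to the list of its next-level directories.
    """
    stack = [root]
    queue = deque()
    parent_of = {}  # subdirectory -> parent directory (recorded when traversed)
    pending = {}    # directory -> number of subdirectories not yet enqueued

    def enqueue(directory):
        queue.append(directory)
        parent = parent_of.get(directory)
        # Enqueue each ancestor once all of its subdirectories are in the queue.
        while parent is not None:
            pending[parent] -= 1
            if pending[parent] > 0:
                break
            queue.append(parent)
            parent = parent_of.get(parent)

    while stack:
        current = stack.pop()
        children = tree.get(current, [])
        if not children:
            enqueue(current)  # only reached when the root has no subdirectories
            continue
        pending[current] = len(children)
        for child in children:
            parent_of[child] = current
            if tree.get(child):
                stack.append(child)  # non-empty subdirectory: push onto the stack
            else:
                enqueue(child)       # empty subdirectory: put into the queue

    order = []
    while queue:
        order.append(queue.popleft())
    return order

tree = {"A": ["B1", "B2"], "B1": ["C"], "B2": []}
print(bottom_up_order(tree, "A"))
```

With this tree, directory A is enqueued only after B1 and B2, and B1 only after C, so sizes can be accumulated from the bottom level upward.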

Only directories are stored in the database; files are not. Storing the directory information in a database makes it possible to look up a directory quickly among a massive number of stored directories.

As one implementation, the database is one that can be embedded in a microservice of the AI platform, is migratable, and requires no installation or configuration. The AI platform may contain one or more microservices, and the database may be embedded in one or more of them as needed.

Preferably, the database is an SQLite3 database.

SQLite3 is a lightweight, ACID-compliant relational database management system that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Using SQLite3 does not affect the AI platform service module's use of its own database and requires no installation in the service module. The database can be migrated conveniently together with the storage and carried along through AI platform upgrades and patches without losing the stored data.

When SQLite3 is operated through multiple data sources, the use of other databases, such as MySQL, is not affected. SQLite3 needs no installation or configuration, and with an index, queries over tens of millions of rows return in milliseconds (ms). SQLite3 generates a single binary file (database name.db) on the operating system; if an AI platform machine goes down, the file only needs to be copied to another machine to remain usable. The database can support up to 128 TiB. It is therefore well suited to storing directory data in AI scenarios and allows the size of a storage directory to be obtained quickly.
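A minimal sketch of keeping directory records in SQLite3, using Python's standard `sqlite3` module, is shown below. The table name `directory_info`, its column names, and the sample row are assumptions for illustration; the application states only that path, latest modification time, size, and owner information are stored.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; passing a file path instead
# would create the single database file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE directory_info (
        path  TEXT PRIMARY KEY,   -- storage path of the directory
        mtime REAL NOT NULL,      -- latest modification time
        size  INTEGER NOT NULL,   -- accumulated size in bytes
        owner TEXT                -- directory owner
    )
    """
)
conn.execute(
    "INSERT INTO directory_info (path, mtime, size, owner) VALUES (?, ?, ?, ?)",
    ("/data/models", 1681171200.0, 4096, "alice"),  # hypothetical sample record
)
conn.commit()

row = conn.execute(
    "SELECT size, owner FROM directory_info WHERE path = ?", ("/data/models",)
).fetchone()
print(row)
```

No server process or configuration step is involved: opening the connection is enough to start reading and writing.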

Optionally, in one embodiment of the present application, before the second directory is fetched from the queue, the method further includes:

judging whether a second directory exists in the queue;

if a second directory exists in the queue, executing the step of taking the second directory out of the queue;

and if no second directory exists in the queue, waiting for the stack processing to finish.

After the stack processing finishes, dequeuing is performed, that is, the second directory is taken out of the queue, so that the directories are counted from the bottom up.

In this embodiment, the first directory is stored in the stack. The first directory is taken out of the stack and, when its latest modification time has changed, its subdirectories are obtained; whether each subdirectory is put into the queue or into the stack is determined by whether the subdirectory is empty. While directories exist in the queue, they are taken out of the queue until the stack and the queue are both empty, and a corresponding operation is performed on the database according to whether the latest modification time of each directory taken out of the queue has changed. A stack is a first-in, last-out data structure and a queue is a first-in, first-out data structure, so by repeating "pop, push, pop" on the stack and "enqueue, dequeue" on the queue until both are empty, the application performs storage statistics management of the directories from the bottom up. During statistical management, entries are continuously removed from the stack and the queue, so no memory overflow occurs in the system, and only changed directories are counted, which avoids the time and platform resources wasted by full statistics. Therefore, on the one hand, the method occupies very few AI platform resources, reduces the consumption of resources such as CPU, memory, and IO, relieves the storage and load pressure on the AI platform, prevents other file operations on the storage node from being blocked, improves the training efficiency of artificial intelligence models, reduces the consumption and long-term occupation of service module resources, and enhances the storage performance of the AI platform. On the other hand, the method shortens statistical management time, improves the storage operation efficiency of the AI platform, shortens model training time, reduces operation and maintenance costs, and improves the market competitiveness of the AI platform.

On the basis of the above embodiments, in one embodiment of the present application, please refer to fig. 4, the storage file management method includes:

Step S201: when the latest modification time of the first directory taken out of the stack has changed, acquiring a subdirectory of the first directory; the first directory is stored on the AI platform.

Step S202: it is determined whether the subdirectory is empty.

Step S203: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S204: if the child directory is empty, putting the child directory and a parent directory of the child directory meeting preset conditions into a queue; the preset condition is that all subdirectories of the parent directory are put into the queue.

Step S205: taking a second directory out of the queue until the stack and the queue are both empty.

Optionally, in an embodiment of the present application, after the second directory is taken out of the queue, the method further includes:

determining whether the latest modification time of the second directory has changed.

Step S206: when the latest modification time of the second directory is unchanged, determining whether the database needs to be updated according to whether the size of the second directory has changed.

Operations that do not change the latest modification time of the second directory include file permission modification and the like.

As an implementation, determining whether the database needs to be updated according to whether the size of the second directory has changed includes:

Step S2061: judging whether the size of the second directory has changed.

Step S2062: if the size of the second directory has changed, updating the storage path information of the second directory in the database.

The storage path information is the directory path of the second directory.

The directory information stored in the database includes the storage path information of the directory, the latest modification time of the directory, the size of the directory, directory owner information, and the like.

Step S2063: if the size of the second directory is unchanged, the database is not updated.

When the size of the second directory is unchanged, no processing of the database is required.

Step S207: when the latest modification time of the second directory has changed, perform the corresponding operation on the database according to whether the second directory is saved in the database.

Operations that change the latest modification time of the second directory include, but are not limited to, adding soft links, adding hard links, compressing and decompressing, renaming, adding hidden files, deleting, and modifying.

As an implementation, performing the corresponding operation on the database according to whether the second directory is saved in the database includes:

Step S2071: determine whether the second directory is saved in the database.

Step S2072: if the second directory is saved in the database, update the storage path information of the second directory in the database.

When the latest modification time of the second directory has changed and the second directory is already saved in the database, the storage path information of the second directory in the database needs to be updated.

Step S2073: if the second directory is not saved in the database, insert the second directory into the database.

When the latest modification time of the second directory has changed and the second directory is not saved in the database, this corresponds to the first statistical traversal, so the second directory needs to be inserted into the database.

The database is kept up to date so that each directory needs to be queried only once.
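The decision made in steps S206 and S207 for each directory taken out of the queue can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `DirRecord` type, the `sync_directory` function, and the use of a plain dictionary in place of the database are all assumptions made for the example, as is treating a directory that has never been saved as one whose modification time has changed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DirRecord:
    """Hypothetical record mirroring the directory information kept in the database."""
    path: str     # storage path information
    mtime: float  # latest modification time
    size: int     # directory size in bytes


def sync_directory(db: Dict[str, DirRecord], current: DirRecord) -> str:
    """Apply steps S206/S207 to one dequeued directory and return the action taken."""
    saved: Optional[DirRecord] = db.get(current.path)
    # Assumption: a directory that was never saved counts as "modification time changed".
    mtime_changed = saved is None or saved.mtime != current.mtime
    if mtime_changed:
        # S2071-S2073: update if already saved, insert otherwise.
        action = "update" if saved is not None else "insert"
        db[current.path] = current
        return action
    # S2061-S2063: modification time unchanged, so only a size change triggers an update.
    if saved.size != current.size:
        db[current.path] = current
        return "update"
    return "skip"
```

In this sketch the "skip" branch is the one expected to be taken most often, since most directories are unchanged between two statistics runs.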

It should be noted that, for steps S201 to S205, reference may be made to the above embodiments; detailed descriptions are omitted here.

In order to improve retrieval efficiency, obtain directory size information quickly, and reduce the consumption of resources such as CPU and IO caused by AI platform storage statistics, in one embodiment of the present application, the storage file management method further includes:

establishing an index for the tables in the database.

Further, in one embodiment of the present application, the storage file management method further includes:

storing the second directory in sub-tables according to the storage path information.

More than two tables are established in the database, and the second directory is stored in a sub-table selected according to its storage path information. This relieves the pressure that the AI platform's use of the database places on the database, makes the method better suited to retrieval over large-scale cluster files, and allows directory information to be retrieved quickly, which improves retrieval efficiency.

When tables are established in the database, updating the database means updating the tables in the database.
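The indexing and sub-table storage described above can be illustrated with Python's built-in `sqlite3` module. The sketch below is only one possible arrangement: the table names `dir_info_N`, the column layout, the constant `TABLE_COUNT`, and the choice of a CRC32 hash of the storage path to select the sub-table are all assumptions for the example, since the text does not state how the sub-table is derived from the storage path information.

```python
import sqlite3
import zlib

TABLE_COUNT = 4  # assumed number of sub-tables


def table_for(path: str) -> str:
    """Select a sub-table from the storage path using a stable CRC32 hash."""
    return f"dir_info_{zlib.crc32(path.encode('utf-8')) % TABLE_COUNT}"


def init_db(conn: sqlite3.Connection) -> None:
    """Create the sub-tables and a unique index on the path column of each."""
    for i in range(TABLE_COUNT):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS dir_info_{i} ("
            "path TEXT NOT NULL, mtime REAL, size INTEGER, owner TEXT)"
        )
        conn.execute(
            f"CREATE UNIQUE INDEX IF NOT EXISTS idx_dir_info_{i}_path "
            f"ON dir_info_{i}(path)"
        )


def save_dir(conn, path, mtime, size, owner):
    """Insert the directory, or overwrite the existing row with the same path."""
    conn.execute(
        f"INSERT OR REPLACE INTO {table_for(path)} (path, mtime, size, owner) "
        "VALUES (?, ?, ?, ?)",
        (path, mtime, size, owner),
    )


def load_size(conn, path):
    """Look up a directory size through the path index; None if not saved."""
    row = conn.execute(
        f"SELECT size FROM {table_for(path)} WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

Because the sub-table is computed from the path alone, a lookup touches exactly one table and one index, which is the property the sub-table scheme relies on.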

Referring to fig. 5, in one embodiment of the present application, the storage file management method includes:

Step S301: when the latest modification time of the first directory taken out of the stack has changed, obtain a subdirectory of the first directory; the first directory is stored on the AI platform.

Step S302: it is determined whether the subdirectory is empty.

Step S303: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S304: if the child directory is empty, putting the child directory and a parent directory of the child directory meeting preset conditions into a queue; the preset condition is that all subdirectories of the parent directory are put into the queue.

Step S305: simultaneously take out of the queue second directories that are not in a parent-child relationship with one another, until both the stack and the queue are empty.

Step S306: determine whether the latest modification time of the second directory has changed.

Step S307: when the latest modification time of the second directory has not changed, determine whether the database needs to be updated according to whether the size of the second directory has changed.

Step S308: when the latest modification time of the second directory has changed, perform the corresponding operation on the database according to whether the second directory is saved in the database.

It should be noted that, for steps S301 to S304 and steps S306 to S308, reference may be made to the above embodiments; detailed descriptions are omitted here.

In this embodiment, when second directories are taken out of the queue, several second directories that are not in a parent-child relationship are taken out at the same time. In other words, directory data without a parent-child relationship are counted asynchronously, which speeds up the upward statistics, shortens the statistical management time, and improves statistical efficiency.
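One way to take several directories out of the queue at once while keeping parents and children apart is sketched below. The helper names `is_parent_child`, `take_batch`, and `process_all`, the batch-per-round scheduling, and the use of a thread pool are assumptions for illustration; the text only requires that directories taken out together are not in a parent-child relationship.

```python
import posixpath
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def is_parent_child(a: str, b: str) -> bool:
    """True if one path is the direct parent directory of the other."""
    return posixpath.dirname(b) == a or posixpath.dirname(a) == b


def take_batch(queue: deque, limit: int) -> list:
    """Pop up to `limit` directories, no two of which are parent and child.

    Directories that conflict with the batch are put back at the front of
    the queue in their original order.
    """
    batch, deferred = [], []
    while queue and len(batch) < limit:
        path = queue.popleft()
        if any(is_parent_child(path, other) for other in batch):
            deferred.append(path)
        else:
            batch.append(path)
    queue.extendleft(reversed(deferred))
    return batch


def process_all(queue: deque, handle, workers: int = 4) -> list:
    """Drain the queue batch by batch, handling each batch on a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while queue:
            batch = take_batch(queue, workers)
            results.extend(pool.map(handle, batch))
    return results
```

A batch always contains at least the first directory in the queue, so the loop in `process_all` makes progress on every round.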

Referring to fig. 6, in one embodiment of the present application, the storage file management method includes:

Step S401: when the latest modification time of the first directory taken out of the stack has changed, obtain a subdirectory of the first directory; the first directory is stored on the AI platform.

Step S402: it is determined whether the subdirectory is empty.

Step S403: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S404: if the child directory is empty, putting the child directory and a parent directory of the child directory meeting preset conditions into a queue; the preset condition is that all subdirectories of the parent directory are put into the queue.

Step S405: take the second directory out of the queue, until both the stack and the queue are empty, and perform the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed.

Step S406: delete from the database the directories that exist in the database but no longer exist in the underlying storage; the deleted directories include a parent directory and all child directories under that parent directory.

It should be noted that, for steps S401 to S405, reference may be made to the above embodiments; detailed descriptions are omitted here.

In this embodiment, directories that no longer exist in the underlying storage but still exist in the database are deleted, so that the database contains no dirty data and does not keep growing.
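The cleanup in step S406 can be sketched with `sqlite3` as follows. The single table `dir_info`, its `path` column, and the `exists` callback (which in practice might be `os.path.isdir`) are assumptions for this example. A prefix comparison with `substr` is used instead of `LIKE` so that characters such as `%` or `_` in directory names are not interpreted as wildcards.

```python
import sqlite3


def delete_stale(conn: sqlite3.Connection, exists) -> int:
    """Delete every saved directory that no longer exists, with its whole subtree.

    `exists` is a callable taking a path and returning True if the directory
    is still present in the underlying storage. Returns the number of rows removed.
    """
    removed = 0
    paths = [row[0] for row in conn.execute("SELECT path FROM dir_info ORDER BY path")]
    for path in paths:
        if exists(path):
            continue
        # Match the directory itself and anything below "<path>/".
        prefix = path.rstrip("/") + "/"
        cur = conn.execute(
            "DELETE FROM dir_info WHERE path = ? OR substr(path, 1, ?) = ?",
            (path, len(prefix), prefix),
        )
        removed += cur.rowcount
    return removed
```

Appending the trailing slash to the prefix keeps a sibling such as `/ab` from being removed when `/a` is deleted.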

Referring to fig. 7, in one embodiment of the present application, the storage file management method includes:

Step S501: when the latest modification time of the first directory taken out of the stack has changed, obtain a subdirectory of the first directory; the first directory is stored on the AI platform.

Step S502: it is determined whether the subdirectory is empty.

Step S503: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S504: if the subdirectory is empty, the subdirectory is placed in a queue.

Step S505: it is determined whether all child directories of the parent directory of the child directory are all placed in the queue.

Step S506: if all child directories of the parent directory of the child directory have been put into the queue, put the parent directory into the queue, and repeat upward until a parent directory does not satisfy the preset condition.

Step S507: take the second directory out of the queue, until both the stack and the queue are empty, and perform the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed.

Step S508: if not all child directories of the parent directory of the child directory have been put into the queue, the parent directory is not put into the queue.

It should be noted that, for steps S501 to S505 and step S507, reference may be made to the above embodiments; detailed descriptions are omitted here.
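The queueing rule of steps S504 to S508 can be sketched with a per-parent counter. The function name `enqueue_with_parents` and the three bookkeeping dictionaries (`parent_of`, `child_count`, `queued_children`) are assumptions for the example; they stand in for the parent directory information recorded while subdirectories are pushed onto the stack.

```python
from collections import deque


def enqueue_with_parents(leaf, parent_of, child_count, queued_children, queue):
    """Queue an empty subdirectory, then every ancestor whose children are all queued.

    parent_of:       child path -> parent path (recorded during the downward pass)
    child_count:     parent path -> number of child directories it has
    queued_children: parent path -> number of its children already queued
    """
    queue.append(leaf)
    node = leaf
    while node in parent_of:
        parent = parent_of[node]
        queued_children[parent] = queued_children.get(parent, 0) + 1
        if queued_children[parent] < child_count[parent]:
            break  # S508: some child is still outstanding, so the parent waits
        queue.append(parent)  # S506: all children queued, so the parent follows
        node = parent
```

Each check is a dictionary lookup and a comparison, so the cost of one call grows only with the number of ancestor levels that become complete.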

On the basis of any one of the foregoing embodiments, in one embodiment of the present application, before the subdirectory of the first directory is obtained when the latest modification time of the first directory taken out of the stack has changed, the method includes:

when the latest modification time of the first directory taken out of the stack has changed, performing the step of obtaining a subdirectory of the first directory.

The specific operations performed when the latest modification time of the first directory taken out of the stack has not changed are described in the embodiments below.

Further, in one embodiment of the present application, before determining whether the latest modification time of the first directory taken out of the stack has changed, the method further includes:

placing the first directory in the stack;

taking the first directory out of the stack.

Referring to fig. 8, in one embodiment of the present application, the storage file management method includes:

Step S601: when the latest modification time of the first directory taken out of the stack has changed, traverse the first directory in a cleanable manner to obtain a subdirectory of the first directory; the first directory is stored on the AI platform.

In the cleanable manner, the first directory can be closed after it has been opened, which gives high performance. Optionally, the cleanable manner may be a stream manner.

As one implementation, traversing the first directory in a cleanable manner includes:

opening the first directory using an open function and reading the first directory using a read function.

The open function may be the opendir function and the read function may be the readdir function.
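As an illustration of traversing a directory and closing it afterwards, the sketch below uses Python's `os.scandir`, which on POSIX systems is built on `opendir` and `readdir`. It is an analogue chosen for the example, not the code of the application; the function name `list_subdirectories` and the decision not to follow symbolic links are assumptions.

```python
import os


def list_subdirectories(path: str) -> list:
    """Return the immediate subdirectories of `path`, sorted by path.

    The directory handle is opened by os.scandir and closed as soon as the
    `with` block exits, so no handle stays open between traversals.
    """
    subdirs = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Assumption: symbolic links to directories are not followed.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
    return sorted(subdirs)
```

An empty result corresponds to the "subdirectory is empty" case, in which the directory goes to the queue rather than back onto the stack.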

Step S602: it is determined whether the subdirectory is empty.

Step S603: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S604: if the child directory is empty, putting the child directory and a parent directory of the child directory meeting preset conditions into a queue; the preset condition is that all subdirectories of the parent directory are put into the queue.

Step S605: take the second directory out of the queue, until both the stack and the queue are empty, and perform the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed.

It should be noted that, for steps S602 to S605, reference may be made to the above embodiments; detailed descriptions are omitted here.

Referring to fig. 9, in one embodiment of the present application, the storage file management method includes:

Step S701: it is determined whether the last modification time of the first directory fetched from the stack has changed.

Step S702: when the latest modification time of the first directory fetched from the stack has not changed, a subdirectory of the first directory is obtained from the database.

If a single directory contains tens of thousands of files or more and the directory has not changed, the directories under it only need to be obtained from the database, so there is no need to traverse the underlying storage with the opendir and readdir functions. The number of directories is far smaller than the number of files: with 1 TB of storage in use on the AI platform there are about 20,000 directories, so the database only needs to store directory information.

About 99% of the directories in the AI platform do not change within a time interval (for example, 5 minutes). For unchanged directories the previously recorded size is used, and only the sizes of changed directories need to be counted.

Step S703: when the latest modification time of the first directory taken out of the stack has changed, obtain a subdirectory of the first directory; the first directory is stored on the AI platform.

Step S704: it is determined whether the subdirectory is empty.

Step S705: if the subdirectory is not empty, the subdirectory is put in the stack.

Step S706: if the subdirectory is empty, the subdirectory is placed in a queue.

Step S707: it is determined whether all child directories of the parent directory of the child directory are all placed in the queue.

Step S708: if all the child directories of the parent directory of the child directory are put into the queue, putting the parent directory into the queue until the parent directory does not meet the preset condition.

Step S709: take the second directory out of the queue, until both the stack and the queue are empty.

Step S710: determine whether the latest modification time of the second directory has changed.

Step S711: when the latest modification time of the second directory has not changed, determine whether the database needs to be updated according to whether the size of the second directory has changed.

Step S712: when the latest modification time of the second directory has changed, perform the corresponding operation on the database according to whether the second directory is saved in the database.

When the latest modification time of the first directory has not changed, its subdirectories are obtained from the database. In other words, the previously recorded sizes are reused for unchanged directories and only the sizes of changed directories are counted. This prevents each round of storage statistics from repeating the directory traversal, avoids the traditional full storage statistics approach, and improves storage statistics efficiency.
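The choice between steps S702 and S703 can be sketched as a small lookup function. Everything here is illustrative: the function name `subdirectories`, the dictionary `db` that maps a path to its saved modification time and child list, and the `scan` callback that stands in for the traversal of the underlying storage are assumptions, not part of the described system.

```python
def subdirectories(path, db, current_mtime, scan):
    """Return (children, source) for one directory taken out of the stack.

    db:            path -> {"mtime": float, "children": [paths]} (hypothetical layout)
    current_mtime: latest modification time read from the underlying storage
    scan:          callable that traverses the underlying storage for `path`
    """
    saved = db.get(path)
    if saved is not None and saved["mtime"] == current_mtime:
        # S702: unchanged directory, so reuse what the database already holds.
        return list(saved["children"]), "database"
    # S703: changed (or never seen) directory, so traverse the underlying storage.
    return scan(path), "storage"
```

For a directory holding tens of thousands of files, the first branch replaces a full directory read with a single database lookup.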

The storage file management method of the present application is explained below with a specific case.

The first step: define a stack data structure and a queue data structure.

The second step: directories are continuously put into the stack and taken out of it. If a directory taken out of the stack has no subdirectories (that is, it is necessarily a bottom-level directory), it is put into the queue.

The third step: after the second step, determine whether all child directories of the parent directory of that directory have been put into the queue. If so, put the parent directory into the queue as well, and continue upward until a parent directory does not satisfy the condition. This step ensures that every directory can be put into the queue. The time complexity of each check is O(1) (that is, a constant cost), and the overall efficiency depends on the depth of the directory hierarchy.

The fourth step: directories are taken out of the queue, and each falls into one of three scenarios:

1) the latest modification time has changed and the directory does not exist in the table of the SQLite3 database (the case of the first statistical traversal);

2) the latest modification time has changed and the directory exists in the table of the SQLite3 database, in which case the table fields are updated;

3) the latest modification time has not changed, in which case the table fields of the SQLite3 database are updated if the directory size has changed and are not updated otherwise (this is the scenario encountered in most statistics runs);

When updating, the storage path information is updated.

Dequeuing may be performed by multiple threads at the same time to speed up the upward statistics, provided that the directories dequeued together are not in a parent-child relationship.

The fifth step: directories are continuously pushed onto and popped from the stack, and enqueued into and dequeued from the queue, until both the stack and the queue are empty. Because entries are continuously removed from the stack and the queue, the system memory does not overflow.

The sixth step: delete the directories that no longer exist in the underlying storage but still exist in the SQLite3 database, together with all subdirectories under them, so that the database contains no dirty data and the data table does not keep growing.

The storage file management method in the application has the following advantages:

First, the size of every directory is counted and stored quickly in full; directories that have not changed are not re-counted, while the sizes of changed directories are counted, updated, and stored. The system is designed around the lightweight SQLite3 database, so that the sizes of the storage directories of each service function of the AI platform can be used to obtain the platform's storage usage quickly, which makes it convenient for the AI platform to manage, display, and limit storage space. The present application reduces the storage and load pressure on the storage server, reduces resource consumption and long-term occupation by service modules, and enhances the storage performance of the AI platform. It can also improve the service performance of the AI platform: SQLite3 is used to build an index over the storage directories and to split the storage directories across tables, so that requests for directory sizes are answered quickly. Establishing the storage statistics management system improves the storage operation efficiency of AI services, shortens model training time, improves model training efficiency, reduces the cost for operation and maintenance personnel, and improves the market competitiveness of the AI platform;

Second, the present application ensures efficient and stable operation of the AI platform, effectively shortens the time algorithm personnel spend on model training, and improves the storage performance of the AI platform. It solves the technical problem of directory statistics over large-scale storage and the problem of frequent network and IO interaction during statistics on the AI platform, improves file operation and management performance, reduces the overall resource utilization of the AI platform, makes the AI platform smoother to use, and strengthens its competitiveness.

The storage file management apparatus provided in the embodiments of the present application will be described below, and the storage file management apparatus described below and the storage file management method described above may be referred to correspondingly.

Fig. 10 is a block diagram of a storage file management apparatus according to an embodiment of the present application, and referring to fig. 10, the storage file management apparatus may include:

a first obtaining module 100, configured to obtain a subdirectory of the first directory when the latest modification time of the first directory taken out of the stack has changed; the first directory is stored on the AI platform;

a first judging module 200, configured to determine whether the subdirectory is empty;

a first storing module 300, configured to put the subdirectory into the stack if the subdirectory is not empty;

a second storing module 400, configured to put the child directory, and a parent directory of the child directory that satisfies a preset condition, into a queue if the child directory is empty; the preset condition is that all child directories of the parent directory have been put into the queue;

a removal and processing module 500, configured to take the second directory out of the queue, until both the stack and the queue are empty, and to perform the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed.

The storage file management apparatus of this embodiment is used to implement the foregoing storage file management method, so its specific implementation can be found in the foregoing embodiments of the storage file management method. For example, the first obtaining module 100, the first judging module 200, the first storing module 300, the second storing module 400, and the removal and processing module 500 are respectively used to implement steps S101, S102, S103, S104, and S105 of the foregoing storage file management method. For their specific implementation, reference may be made to the descriptions of the corresponding embodiments, which are not repeated here.

Optionally, the removal and processing module 500 includes:

a first operation sub-module, configured to determine, when the latest modification time of the second directory has not changed, whether the database needs to be updated according to whether the size of the second directory has changed;

a second operation sub-module, configured to perform, when the latest modification time of the second directory has changed, the corresponding operation on the database according to whether the second directory is saved in the database.

Optionally, the first operation sub-module includes:

a first judging unit, configured to determine whether the size of the second directory has changed;

a first updating unit, configured to update the storage path information of the second directory in the database if the size of the second directory has changed;

a stopping operation unit, configured not to update the database if the size of the second directory has not changed.

Optionally, the second operation sub-module includes:

a second judging unit, configured to determine whether the second directory is saved in the database;

a second updating unit, configured to update the storage path information of the second directory in the database if the second directory is saved in the database;

an inserting unit, configured to insert the second directory into the database if the second directory is not saved in the database.

Optionally, the storage file management apparatus further includes:

an index establishing module, configured to establish an index for the tables in the database.

Optionally, the storage file management apparatus further includes:

a storage module, configured to store the second directory in sub-tables according to the storage path information.

Optionally, the storage file management apparatus further includes:

a second judging module, configured to determine whether the latest modification time of the second directory has changed.

Optionally, the removal and processing module 500 is specifically configured to simultaneously take out of the queue second directories that are not in a parent-child relationship with one another.

Optionally, the storage file management apparatus further includes:

a deleting module, configured to delete from the database the directories that exist in the database but no longer exist in the underlying storage, wherein the deleted directories include a parent directory and all child directories under that parent directory.

Optionally, if the subdirectory is not empty, the storage file management apparatus further includes:

a recording module, configured to record information about the parent directory of the child directory.

Optionally, the second storing module 400 includes:

a first storing sub-module, configured to put the subdirectory into the queue if the subdirectory is empty;

a judging sub-module, configured to determine whether all child directories of the parent directory of the child directory have been put into the queue;

a second storing sub-module, configured to put the parent directory into the queue if all child directories of the parent directory of the child directory have been put into the queue, and to repeat upward until a parent directory does not satisfy the preset condition.

Optionally, the storage file management apparatus further includes:

a third judging module, configured to determine whether the latest modification time of the first directory taken out of the stack has changed.

Optionally, the storage file management apparatus further includes:

a third storing module, configured to place the first directory in the stack;

a fetching module, configured to take the first directory out of the stack.

Optionally, the first obtaining module 100 is specifically configured to traverse the first directory in a cleanable manner to obtain a subdirectory of the first directory.

Optionally, the first obtaining module 100 includes:

an opening sub-module, configured to open the first directory using an open function;

a reading sub-module, configured to read the first directory using a read function.

Optionally, the storage file management apparatus further includes:

a second obtaining module, configured to obtain a subdirectory of the first directory from the database when the latest modification time of the first directory taken out of the stack has not changed.

An AI platform provided in the embodiments of the present application is described below; the AI platform described below and the storage file management method described above may be referred to correspondingly.

Fig. 11 is a structural block diagram of an AI platform provided in an embodiment of the present application, where the AI platform includes:

a memory 11 for storing a computer program;

a processor 12 for implementing the steps of the storage file management method of any of the above embodiments when executing the computer program.

Fig. 12 shows a framework diagram of the storage statistics management system on the AI platform. The framework comprises framework dependencies, framework input parameters, framework methods, and an SQLite3 database. The framework dependencies include a database driver (Sqlite-jdbc), dynamic table loading, database query statements (mybats), and dynamic loading of multiple data sources (Dynamic-datasource). The framework input parameters include the storage path to be counted (Storage path), the number of statistics threads (threadum), and the filter directory list (Filter path). The framework methods include issuing a storage statistics task (Storage Statistic service.storage Statistic (Storage Parmeter)), obtaining the size of a storage path (Storage Statistic service.size By Storage Path), and obtaining the sizes of all group-shared and globally shared files belonging to a user (Storage Statistic service.size By Share Path Owner (owner, path)). When the database file is mapped onto the storage directory, the storage path may be /mnt/Inpurfs/db/storageSqlite.

When a service module of the AI platform performs storage statistics, once the framework dependencies have been introduced it only needs to call the management method and pass in the statistical parameters: the storage path to be counted, the number of statistics threads, and the filter directory list. The filter list allows some directories to be excluded from the statistics; in real scenarios some directories do not need to be counted, for example directories that do not belong to a user's home directory, and to keep the statistics efficient such directories can be filtered out when the statistics task is issued. The size of each directory can then be obtained quickly. The system performs efficient storage statistics for AI platform services on top of the AI platform storage and manages the storage directories. Using this scheme for storage size statistics can improve storage performance and management efficiency in the AI platform, and allows the number of nodes and the number of users of the AI platform to be scaled up.

A computer readable storage medium provided in an embodiment of the present application is described below; the computer readable storage medium described below and the storage file management method described above may be referred to correspondingly.

A computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the storage file management method of any of the above embodiments.

In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant details reference may be made to the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The storage file management method, apparatus, AI platform, and computer-readable storage medium provided by the present application are described in detail above. Specific examples are set forth herein to illustrate the principles and embodiments of the present application, and the description of the examples above is only intended to assist in understanding the methods of the present application and their core ideas. It should be noted that it would be obvious to those skilled in the art that various improvements and modifications can be made to the present application without departing from the principles of the present application, and such improvements and modifications fall within the scope of the claims of the present application.

Claims

1. A storage file management method, comprising:

determining whether the subdirectory is empty, wherein determining whether the subdirectory is empty means determining whether a further subdirectory exists in the next level below the subdirectory;

if the subdirectory is not empty, putting the subdirectory into the stack;

taking the second directory out of the queue, until both the stack and the queue are empty, and performing the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed;

wherein before the second directory is taken out of the queue, the method further comprises: determining whether the second directory exists in the queue; if the second directory exists in the queue, performing the step of taking the second directory out of the queue; if the second directory does not exist in the queue, waiting for the stack to finish, and taking the second directory out of the queue after the waiting ends;

wherein performing the corresponding operation on the database that stores the directories according to whether the latest modification time of the second directory has changed comprises:

when the latest modification time of the second directory has changed, performing the corresponding operation on the database according to whether the second directory is saved in the database;

wherein determining whether the database needs to be updated according to the size change of the second directory comprises:

determining whether the size of the second directory has changed;

2. The storage file management method according to claim 1, wherein performing a corresponding operation on said database according to a save condition in said database with respect to said second directory comprises:

determining whether the second directory is saved in the database;

3. The storage file management method according to claim 2, further comprising:

and establishing an index for a table in the database.

4. The storage file management method according to claim 3, further comprising:

5. The storage file management method according to claim 1, further comprising:

6. The storage file management method according to claim 1, wherein said database is a micro-service, migratable and configurably installable database that can be embedded in an AI platform.

7. The storage file management method of claim 1, wherein retrieving the second directory from the queue comprises:

8. The storage file management method according to claim 1, further comprising:

9. The storage file management method according to claim 1, wherein if said subdirectory is not empty, further comprising:

and recording the information of the parent directory of the child directory.

10. The method of claim 1, wherein if the child directory is empty, placing the child directory, a parent directory of the child directory satisfying a predetermined condition, into a queue comprises:

if the subdirectory is empty, putting the subdirectory into a queue;

11. The storage file management method according to claim 1, wherein, before the subdirectory of said first directory is acquired when the latest modification time of the first directory taken out of the stack has changed, the method further comprises:

12. The storage file management method according to claim 11, wherein before determining whether a most recent modification time of a first directory fetched from a stack has changed, further comprising:

placing the first directory in a stack;

and taking the first directory out of the stack.
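The sequence in claims 11 and 12 (push the first directory, pop it, then check whether its latest modification time has changed) might look like the following Python sketch. The helper `pop_and_check` and the `recorded_mtimes` dictionary, which stands in for whatever record of earlier modification times the platform keeps, are hypothetical.

```python
import os


def pop_and_check(stack, recorded_mtimes):
    """Pop a directory from `stack` and report whether its mtime has changed.

    `recorded_mtimes` maps a path to the modification time (in nanoseconds)
    seen on an earlier pass; a path with no record counts as changed.
    """
    path = stack.pop()
    current = os.stat(path).st_mtime_ns
    changed = recorded_mtimes.get(path) != current
    return path, changed
```

On typical filesystems a directory's modification time changes only when entries are added to, removed from, or renamed directly inside it, which is why the traversal re-checks each directory it pops rather than relying on the root alone.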

13. The storage file management method according to claim 1, wherein acquiring a subdirectory of said first directory comprises:

14. The storage file management method of claim 13, wherein traversing said first directory in a cleanable manner comprises:

15. The storage file management method according to any one of claims 1 to 14, further comprising:

16. A storage file management apparatus, comprising:

a first judging module, configured to judge whether the subdirectory is empty, that is, to judge whether a further subdirectory exists in the layer below the subdirectory;

a removal and processing module, configured to take a second directory out of the queue until both the stack and the queue are empty, and to perform a corresponding operation on a database that stores the directories according to whether the latest modification time of the second directory has changed;

the removal and processing module comprises:

a second operation submodule, configured to perform a corresponding operation on the database according to the save condition of the second directory in the database when the latest modification time of the second directory has changed;

the first operation submodule includes:

a first judging unit, configured to judge whether the size of the second directory changes;

a first updating unit, configured to update storage path information of the second directory in the database if the size of the second directory changes;
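The pairing of the first judging unit and the first updating unit can be sketched in Python with `sqlite3`. This is an illustration under assumptions, not the claimed apparatus: the table `directory_info`, its columns and the function `update_if_size_changed` are hypothetical, and because the text does not spell out what "storage path information" covers, the sketch simply rewrites the stored size for the directory's path.

```python
import sqlite3


def update_if_size_changed(conn, path, current_size):
    """Update the stored record for `path` only when its size differs.

    Returns True if a row was updated, False if the directory is not
    stored or its size is unchanged.
    """
    row = conn.execute(
        "SELECT size FROM directory_info WHERE path = ?", (path,)
    ).fetchone()
    if row is None or row[0] == current_size:
        return False  # nothing stored, or no size change: leave the database alone
    conn.execute(
        "UPDATE directory_info SET size = ? WHERE path = ?",
        (current_size, path),
    )
    conn.commit()
    return True
```

Skipping the write when the size is unchanged keeps database traffic proportional to the number of directories that actually changed, which matches the stated aim of reducing resource consumption on the AI platform.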

17. An AI platform, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the storage file management method according to any one of claims 1 to 15 when executing said computer program.

18. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the storage file management method according to any of claims 1 to 15.