CN113703956A - Training task execution method, device, equipment and storage medium

Publication number
CN113703956A
Authority
CN
China
Prior art keywords
directory
container
storage system
network storage
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110402585.XA
Other languages
Chinese (zh)
Inventor
查冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110402585.XA
Publication of CN113703956A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Abstract

The application discloses a training task execution method, apparatus, device, and storage medium, and belongs to the technical field of artificial intelligence. Optionally, the application involves cloud storage technology: training data is stored in the cloud to meet the needs of training tasks running on multiple ends. In the embodiments of the application, the step of mounting the network storage system is moved ahead of container creation, so the time spent on mounting is not counted in the time consumed by the task execution flow, which improves task execution efficiency. When a container is created, the mounted directory is mapped to the directory where the container is located. This directory mapping is a local operation on directories, so it avoids the mount failures that network communication may cause and takes almost no time, which improves both the efficiency and the success rate of task execution. Because the mounting step is decoupled from the task execution flow, a failure to mount the network storage system no longer causes the task itself to fail, which further improves the success rate of task execution.

Description

Training task execution method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a training task execution method, apparatus, device, and storage medium.
Background
Container technology effectively partitions the resources of a single operating system into isolated groups, so as to better balance conflicting resource usage demands among those groups. Container technology is used more and more widely, and tasks are executed in containers to improve task processing efficiency. In the field of artificial intelligence, training tasks can be performed by creating containers. Because a training task requires a large volume of training data, the training data is usually stored in a network storage system, and the network storage system is mounted into the container when the container executes the training task, so that the container can obtain the required training data from it. The network storage system can be implemented with cloud storage technology, that is, the training data is stored in a distributed storage system.
Currently, a training task is generally executed as follows: when the training task needs to be executed, a container corresponding to the training task is created; during container creation, the network storage system is mounted to the directory where the container is located; the container then acquires training data from the mounted directory to execute the training task.
In this method, mounting the network storage system is coupled to the critical path of container creation and task execution. The mounting step is executed while the container is being created and introduces a certain delay, which lowers task execution efficiency. The mounting step also carries a risk of failure, and a mount failure directly causes the task to fail. In particular, when mounting depends on network communication, poor network conditions seriously reduce the success rate of task execution.
Disclosure of Invention
The embodiments of the application provide a training task execution method, apparatus, device, and storage medium that can improve the efficiency and success rate of task execution. The technical solutions are as follows:
In one aspect, a training task execution method is provided, and the method includes:
acquiring an address of a network storage system, wherein training data required for executing a training task is stored in subdirectories under a root directory in the network storage system;
mounting the root directory at the address to a local directory of the current device;
in response to a container corresponding to any training task being created, creating a link between the directory where the container is located and the subdirectory of the root directory in the local directory;
and reading training data required for executing the training task from the subdirectory based on the link while the container executes the training task.
In some embodiments, the network storage system is a distributed storage system.
In some embodiments, the network storage system is a distributed Portable Operating System Interface (POSIX) system.
In some embodiments, the network storage system is a blockchain system.
In some embodiments, the root directory mounted in the local directory may be reused (multiplexed) by multiple containers.
In some embodiments, the container corresponding to the training task is a first container; the method further comprises the following steps:
and in response to the creation of a second container corresponding to another training task, creating a link between the directory in which the second container is located and the subdirectory of the root directory in the local directory.
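The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a temporary directory stands in for the local directory on which the root directory has already been mounted (a real mount needs privileges and a reachable storage address), `os.symlink` stands in for the "link" (the text does not say which linking mechanism is used), and the names `link_training_data`, `read_sample`, and `demo` are hypothetical.

```python
import os
import tempfile


def link_training_data(mounted_root, subdir, container_dir, link_name="data"):
    """Link one subdirectory of an already-mounted root into a container directory."""
    target = os.path.join(mounted_root, subdir)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"no such subdirectory under mounted root: {target}")
    os.makedirs(container_dir, exist_ok=True)
    link_path = os.path.join(container_dir, link_name)
    # Purely local filesystem operation: no network round trip at container creation.
    os.symlink(target, link_path)
    return link_path


def read_sample(link_path, filename):
    """Read training data through the link, as the container would."""
    with open(os.path.join(link_path, filename), encoding="utf-8") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as base:
        # Assumption: this directory plays the role of the pre-mounted root directory.
        mounted_root = os.path.join(base, "mnt", "netstore")
        os.makedirs(os.path.join(mounted_root, "task_a"))
        sample = os.path.join(mounted_root, "task_a", "sample.txt")
        with open(sample, "w", encoding="utf-8") as f:
            f.write("training sample")

        # Two containers reuse the same mounted root; each only gets its own link.
        first = link_training_data(mounted_root, "task_a", os.path.join(base, "containers", "c1"))
        second = link_training_data(mounted_root, "task_a", os.path.join(base, "containers", "c2"))
        return read_sample(first, "sample.txt"), read_sample(second, "sample.txt")
```

The mount happens once, outside `link_training_data`, so creating the second container costs only one more local link.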
In one aspect, a training task execution apparatus is provided, the apparatus including:
an obtaining module, configured to acquire an address of a network storage system, wherein training data required for executing a training task is stored in subdirectories under a root directory in the network storage system;
a mounting module, configured to mount the root directory at the address to a local directory of the current device;
a creating module, configured to, in response to a container corresponding to any training task being created, create a link between the directory where the container is located and the subdirectory of the root directory in the local directory;
and an execution module, configured to read training data required for executing the training task from the subdirectory based on the link while the container executes the training task.
In some embodiments, the obtaining module is configured to perform any one of the following:
in response to a restart of the current device, performing the step of acquiring the address of the network storage system;
in response to a container corresponding to a training task being created on the current device for the first time, performing the step of acquiring the address of the network storage system.
In some embodiments, the creating module is configured to:
in response to the root directory including at least two subdirectories, determine a target subdirectory from the at least two subdirectories, the target subdirectory being the subdirectory that holds the training data the container needs to execute the training task;
and create a link between the directory where the container is located and the target subdirectory.
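A minimal sketch of choosing the target subdirectory follows. The text does not state how the target is determined, so this sketch assumes, purely for illustration, that each training task names the dataset it needs and that the dataset name matches a subdirectory name under the mounted root. `pick_target_subdirectory` and `demo` are hypothetical names.

```python
import os
import tempfile


def pick_target_subdirectory(mounted_root, dataset_name):
    """Return the subdirectory under the mounted root that holds the task's data."""
    subdirs = sorted(entry.name for entry in os.scandir(mounted_root) if entry.is_dir())
    if dataset_name not in subdirs:
        raise LookupError(f"{dataset_name!r} not found; available: {subdirs}")
    return os.path.join(mounted_root, dataset_name)


def demo():
    with tempfile.TemporaryDirectory() as mounted_root:
        # Assumption: two subdirectories exist under the (simulated) mounted root.
        for name in ("images", "text"):
            os.makedirs(os.path.join(mounted_root, name))
        target = pick_target_subdirectory(mounted_root, "images")
        return os.path.basename(target)
```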
In some embodiments, the creating module is configured to:
in response to a container corresponding to any training task being created, query whether the local directory of the current device contains a root directory of the network storage system;
and in response to the local directory containing a root directory of the network storage system, perform the step of creating the link between the directory where the container is located and the subdirectory of the root directory in the local directory.
In some embodiments, the obtaining module and the mounting module are configured to, in response to the local directory not containing a root directory of the network storage system, perform the steps of acquiring the address of the network storage system and mounting the root directory at the address to a local directory of the current device.
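The check-then-mount fallback described above might look like the following sketch. The callables `get_address` and `mount` are hypothetical placeholders for whatever the platform uses to look up the storage address and perform the mount; only `os.path.ismount` is a real standard-library call, used here as the default check.

```python
import os


def ensure_root_mounted(local_dir, get_address, mount, is_mounted=os.path.ismount):
    """Mount the root directory only if it is not already present.

    Returns True if a mount was performed, False if an existing mount was reused.
    """
    if is_mounted(local_dir):
        return False
    address = get_address()      # acquire the address of the network storage system
    mount(address, local_dir)    # mount the root directory at that address
    return True


def demo():
    mounted = set()
    calls = []

    def fake_mount(address, local_dir):
        # Stand-in for a real mount; records the call and marks the directory mounted.
        calls.append((address, local_dir))
        mounted.add(local_dir)

    kwargs = dict(
        local_dir="/mnt/netstore",               # hypothetical mount point
        get_address=lambda: "storage.example:/",  # hypothetical address
        mount=fake_mount,
        is_mounted=lambda path: path in mounted,
    )
    first = ensure_root_mounted(**kwargs)
    second = ensure_root_mounted(**kwargs)
    return first, second, calls
```

In the demo, the first container creation triggers a mount and the second reuses it, so the mount function is called exactly once.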
In some embodiments, the apparatus further includes:
a deleting module, configured to, in response to an instruction to destroy any container, delete the link between the directory where the container is located and the subdirectory of the root directory in the local directory.
In some embodiments, the deleting module is further configured to, in response to a failure to delete the link between the directory where the container is located and the subdirectory of the root directory in the local directory, retry deleting the link.
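Deleting the link with a retry can be sketched as below. The retry count, the delay, and the use of a symbolic link are assumptions made for illustration; the text only says that deletion is retried on failure. `remove_link_with_retry` and `demo` are hypothetical names, and the injected `unlink` parameter exists only so a failure can be simulated.

```python
import os
import tempfile
import time


def remove_link_with_retry(link_path, attempts=3, delay_seconds=0.1, unlink=os.unlink):
    """Delete a container's link, retrying on failure. Returns the attempt number used."""
    for attempt in range(1, attempts + 1):
        try:
            unlink(link_path)
            return attempt
        except FileNotFoundError:
            return attempt  # link is already gone, nothing left to do
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def demo():
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "mnt", "task_a")
        os.makedirs(target)
        link = os.path.join(base, "c1_data")
        os.symlink(target, link)

        calls = {"n": 0}

        def flaky_unlink(path):
            # Fails once to simulate a transient error, then behaves normally.
            calls["n"] += 1
            if calls["n"] == 1:
                raise OSError("simulated transient failure")
            os.unlink(path)

        used = remove_link_with_retry(link, delay_seconds=0.0, unlink=flaky_unlink)
        # Only the link is removed; the mounted data itself stays in place.
        return used, os.path.lexists(link), os.path.isdir(target)
```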
In some embodiments, the network storage system is a distributed storage system.
In some embodiments, the network storage system is a distributed Portable Operating System Interface (POSIX) system.
In some embodiments, the network storage system is a blockchain system.
In some embodiments, the root directory mounted in the local directory may be reused (multiplexed) by multiple containers.
In some embodiments, the container corresponding to the training task is a first container, and the creating module is further configured to, in response to a second container corresponding to another training task being created, create a link between the directory where the second container is located and the subdirectory of the root directory in the local directory.
In one aspect, an electronic device is provided that includes one or more processors and one or more memories having stored therein at least one computer program that is loaded and executed by the one or more processors to implement various alternative implementations of the training task execution method described above.
In one aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, which is loaded and executed by a processor to implement various alternative implementations of the training task execution method described above.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer-readable storage medium. The one or more program codes can be read from the computer-readable storage medium by one or more processors of the electronic device, and the one or more processors execute the one or more program codes, so that the electronic device can execute the training task execution method of any one of the above-mentioned possible embodiments.
The embodiments of the application provide a way of pre-mounting the network storage system: the mounting step is moved ahead of container creation and is no longer executed when a container is created. First, because the mounting step happens in advance, the time it takes is not counted in the time consumed by the task execution flow, which improves task execution efficiency. Second, when a container is created, the already-mounted directory is simply mapped to the directory where the container is located. This directory mapping is a local operation on directories and needs no network communication, so the mount failures that network communication may cause are avoided; the mapping also takes almost no time, so task execution is efficient and its success rate improves. Third, the mounting step is decoupled from the task execution flow: if mounting the network storage system fails, the mount is simply attempted again without directly affecting the task execution flow. Task failures caused by mount failures are thus avoided, which further improves the success rate of task execution.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a training task execution method according to an embodiment of the present application;
FIG. 2 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 4 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 5 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 6 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 7 is a flowchart of a training task execution method provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training task performing device according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 10 is a block diagram of a terminal according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, the first image can be referred to as a second image, and similarly, the second image can be referred to as a first image without departing from the scope of the various examples. The first image and the second image can both be images, and in some cases, can be separate and distinct images.
The term "at least one" is used herein to mean one or more, and the term "plurality" is used herein to mean two or more, e.g., a plurality of packets means two or more packets.
It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting.
It should also be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
The following is a description of terms involved in the present application.
A network storage system: a system that provides network storage services. Network storage is one way of storing data, and it is typically implemented through a dedicated data storage server. Such a server includes storage devices (e.g., disk arrays, CD/DVD drives, tape drives, or removable storage media) and embedded system software, and thereby provides cross-platform file sharing.
The network storage system may take the form of a network disk. A network disk (cloud storage) is also called a network hard disk, network space, cloud hard disk, and the like. A network disk is a website that provides file hosting together with file upload and download services. Most file hosting services provided by network disks are network services similar to the File Transfer Protocol (FTP), with simple upload and download functions added on top so that users can access files conveniently. Files stored on a local disk are hard to move and share; a network disk instead stores files on the service provider's servers, so that anyone can access them over the network at any time and from any place. With a fast broadband connection, access takes about as long as it would with a local disk, so files can be accessed quickly.
A container: a container works by presenting different system views to different processes. Isolation between processes is achieved through the Linux namespace mechanism: processes in the same namespace are visible to one another and can access and communicate with one another. A namespace is a localized set of resources formed by combining the namespaces of several subsystems. For each subsystem, what was originally a single globally unique instance is localized into multiple instances that do not interfere with one another, and one instance cannot access elements in another. Linux is a free and open-source UNIX-like operating system. UNIX, originally named UNICS (Uniplexed Information and Computing Service), is a multi-user, multi-process computer operating system. A container can execute tasks and may therefore also be called a task container.
Task: in everyday life, a task refers to any purposeful activity that people carry out in daily life, work, or entertainment, and usually means work assigned by a superior or a responsibility one has taken on. In a computer system, a task is a technical term for the basic unit of work, that is, the work, process, or procedure that the computer system needs to execute or complete.
Training tasks: in the field of artificial intelligence, training tasks are performed so that computers can react in ways similar to humans and handle business processing in place of humans. For example, an image classification model may be trained on sample images so that it can classify an input image and determine the image's type. As another example, a text processing model may be trained on sample text so that it can process text based on natural language processing technology, for instance to produce an abstract of the text or to translate the text into another language. The image classification and text processing scenarios above are only examples; the training task may be applied to any scenario, for example a speech recognition scenario, an image processing scenario, a video processing scenario, or a text processing scenario, which is not limited in the embodiments of the present application.
The following is a brief description of artificial intelligence.
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The following describes an embodiment of the present application.
Fig. 1 is a schematic diagram of an implementation environment of a training task execution method according to an embodiment of the present application. The implementation environment includes a terminal 101 and a network storage system 102. The terminal 101 is connected to the network storage system 102 through a wireless network or a wired network.
The terminal 101 is at least one of a desktop computer, a smart phone, a game console, a tablet computer, an e-book reader, and a laptop computer. An application program that supports the execution of training tasks is installed and runs on the terminal 101. When the terminal 101 executes a training task, it creates a container and executes the training task through that container.
The network storage system 102 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The network storage system 102 is used to provide data services for training tasks. The network storage system 102 may store training data, and when a training task is executed, the terminal 101 needs to obtain the training data from the network storage system 102 to execute the training task based on the training data. In the embodiment of the present application, the network storage system 102 is mounted on the terminal 101, and a place where the network storage system 102 is mounted may be referred to as a mounting point. The terminal 101 accesses the mount point to access training data stored by the network storage system 102 when performing a training task.
Optionally, the network storage system 102 includes at least one server 1021 and at least one database 1022. In this case, the database 1022 stores the training data and provides data services for the at least one server 1021.
Optionally, the network storage system 102 may include at least one server 1021 that stores the training data locally, without relying on data services provided by the database 1022.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and so on, but is not limited thereto. The cases in which the network storage system 102 is a distributed system (which may be a blockchain system) and in which it adopts cloud technology are described in detail below and are not elaborated here.
Those skilled in the art will appreciate that there may be more or fewer terminals 101 and servers 1021. For example, there is only one terminal 101 or one server 1021, or tens or hundreds of the terminals 101 and the servers 1021, or more, and the number of the terminals or the servers and the device types are not limited in the embodiments of the present application.
The network storage system can be realized by a cloud technology, and particularly relates to a cloud storage technology in the cloud technology, and the cloud technology and the cloud storage are explained below.
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on that are applied under the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites, and web portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backing from the system, which can only be realized through cloud computing.
A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that uses functions such as cluster applications, grid technology, and distributed storage file systems to bring together a large number of storage devices of different types in a network (storage devices are also called storage nodes) through application software or application interfaces, so that they work cooperatively and jointly provide data storage and service access functions to the outside.
At present, a storage system stores data as follows. Logical volumes are created, and when each logical volume is created it is allocated physical storage space, which may be made up of disks from one storage device or from several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into a plurality of parts, each of which is an object; an object contains not only the data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location of each object, so that when the client requests access to the data, the file system can let the client access it according to the recorded storage location of each object.
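The object-based write and read path described above can be illustrated with a toy in-memory model. This is a sketch only: the fixed object size, the `name#index` ID format, and the class name `LogicalVolume` are assumptions for illustration and do not come from the text, and a dictionary stands in for physical storage space.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalVolume:
    object_size: int = 4
    # Object ID -> object payload; stands in for the volume's physical storage space.
    blocks: Dict[str, bytes] = field(default_factory=dict)
    # File name -> ordered object IDs; the recorded storage location information.
    locations: Dict[str, List[str]] = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> List[str]:
        """Split data into objects, store each under an ID, and record where they went."""
        ids = []
        for offset in range(0, len(data), self.object_size):
            object_id = f"{name}#{offset // self.object_size}"
            self.blocks[object_id] = data[offset:offset + self.object_size]
            ids.append(object_id)
        self.locations[name] = ids
        return ids

    def read(self, name: str) -> bytes:
        """Reassemble the data by following the recorded object locations."""
        return b"".join(self.blocks[object_id] for object_id in self.locations[name])
```

For example, writing the 13 bytes `b"training-data"` with `object_size=4` produces four objects, and `read` returns the original bytes by concatenating them in recorded order.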
The storage system allocates physical storage space to a logical volume as follows: physical storage space is divided into stripes in advance according to an estimate of the capacity of the objects to be stored in the logical volume (the estimate often leaves a large margin relative to the capacity of the objects actually stored) and according to the Redundant Array of Independent Disks (RAID) group; one logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
The network storage system 102 may be a distributed system. The distributed system stores data in a distributed manner through the server cluster, so that the storage burden of a single server can be reduced, and the accuracy of the stored data can be improved.
Alternatively, the distributed system may be a blockchain system, and training data required for training tasks may be stored on blockchains of the blockchain system. For example, a corresponding block may be generated based on training data, the block including the training data, and then stored in the form of a block on the block chain. In some embodiments, data obtained after the training task is performed may also be stored on the blockchain of the blockchain system. The following description is directed to a distributed system and a blockchain system.
The network storage system related to the embodiment of the invention can be a distributed system formed by a client and a plurality of nodes (computing devices in any form in an access network, such as servers and user terminals) connected through network communication.
Taking the case where the distributed system is a blockchain system as an example, refer to fig. 2, which is an optional structural schematic diagram of the distributed system 200 provided by the embodiment of the present invention as applied to a blockchain system. The system is formed by a plurality of nodes 201 (computing devices in any form in an access network, such as servers and user terminals) and a client 202. A peer-to-peer (P2P) network is formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, an intermediate layer, an operating system layer, and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 2, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual service requirements. It records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when they successfully verify the source and integrity of the record data.
For example, the services implemented by the application include:
2.1) Wallet: provides functions for electronic money transactions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, they store the record data of the transaction in a temporary block of the blockchain as a response acknowledging that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger: provides functions for operations such as storing, querying, and modifying account data. Record data of an operation on the account data is sent to other nodes in the blockchain system; after the other nodes verify that the operation is valid, they store the record data in a temporary block as a response acknowledging that the account data is valid, and may also send a confirmation to the node that initiated the operation.
2.3) Smart contract: a computerized agreement that can enforce the terms of a contract. It is implemented by code deployed on the shared ledger that executes when certain conditions are met, and is used to complete automated transactions according to actual business requirements, for example querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Smart contracts are not limited to executing contracts for transactions; they may also execute contracts that process received information.
3) Blockchain: comprises a series of blocks that are linked to one another in the chronological order in which they were generated. Once a new block is added to the blockchain it cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system.
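The chronological linking of blocks in item 3) can be sketched as follows. The text does not say how blocks are linked; storing the SHA-256 hash of the previous block is a common design and is assumed here, and the function names and record strings are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    # Hash the block contents deterministically.
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, records):
    # Each new block stores the hash of its predecessor, which fixes the order.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})


def is_valid(chain):
    # The chain is consistent only if every stored prev_hash matches.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["training-set-a metadata"])
append_block(chain, ["training-set-b metadata"])
```

Under this assumption, altering or removing an earlier block changes its hash and breaks the `prev_hash` check of every later block, which is one way the "cannot be removed" property can be enforced.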
Fig. 3 is a schematic diagram of a training task execution system architecture according to an embodiment of the present disclosure. In this embodiment, the description takes the case where the network storage system is a network disk as an example. Referring to fig. 3, the training task execution system architecture may include a network disk storage layer 301, a physical device layer 302, and a task container layer 303. The network disk storage layer 301 may be implemented as a network storage system, which may be a network disk. The physical device layer 302 may be implemented as physical devices with computing capability; a physical device is an electronic device, which may be a server or a terminal. Here, one physical device may be denoted as a node: a single physical device is one node, and a plurality of physical devices are a plurality of nodes. The task container layer 303 may be implemented as containers. Since the containers in the embodiments of the present application are used to perform training tasks, a container may also be referred to as a task container. The task container layer 303 may be orchestrated and controlled by a Kubernetes system; Kubernetes is a container orchestration engine open-sourced by Google. The task container layer 303 uses Docker container technology to deliver containerized resources and to start and stop tasks.
In the embodiment of the present application, the network disk storage layer 301 can store training data. Once a network disk in the network disk storage layer 301 is mounted on a physical device (node) of the physical device layer 302, the physical device can create a task container and execute a training task through the task container.
The physical device can create one or more task containers to perform tasks. For a single task, one task container can be created to execute the task, or a group of task containers can be created to execute it. For multiple tasks, multiple task containers may be created to process the tasks in parallel, where each task container processes one task and one task can be executed by one task container or by a group of task containers. A task container is represented by a pod, a Kubernetes data structure that encapsulates one task container or a group of task containers. One physical device (node) may create one pod or multiple pods, which is not limited in this embodiment of the present application.
In this embodiment of the present application, the electronic device may load (mount) a network disk locally to the physical device, and when creating the task container, the sub-directory of the mounting point may be mapped into the container in a directory mapping manner, where the directory mapping process may be referred to as a binding (bind) process, and is a process of establishing a link between the sub-directory of the mounting point and the directory where the container is located, or establishing a mapping relationship between the sub-directory of the mounting point and the directory where the container is located.
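One way such a directory mapping could be expressed on Kubernetes is a `hostPath` volume that points at a subdirectory of the pre-mounted network disk. The text does not state that `hostPath` is the mechanism used, so the sketch below is an assumption; the pod name, image, and both paths are hypothetical.

```python
import json


def pod_manifest(task_name, image, host_subdir, container_dir):
    """Build a Pod manifest dict that maps a host subdirectory into the container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": task_name},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": image,
                # Directory inside the container where the data appears.
                "volumeMounts": [{"name": "training-data", "mountPath": container_dir}],
            }],
            "volumes": [{
                "name": "training-data",
                # hostPath points at a subdirectory of the pre-mounted network disk.
                "hostPath": {"path": host_subdir, "type": "Directory"},
            }],
        },
    }


manifest = pod_manifest("task-1", "example/trainer:latest",
                        "/mnt/netdisk/dataset-a", "/data")
print(json.dumps(manifest, indent=2))
```

With a mapping of this kind, creating the container involves only a local bind of an existing host directory and no network mount at container start.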
It should be noted that the number of the physical devices, the network disks, and the task containers shown in fig. 3 is only an exemplary illustration, and the number of the physical devices, the network disks, and the task containers may be one or more, and the embodiment of the present application does not limit this.
Fig. 4 is a flowchart of a training task execution method provided in an embodiment of the present application, where the method is applied to an electronic device, and the electronic device is a terminal or a server. Referring to fig. 4, the method includes the following steps.
401. The electronic device acquires the address of a network storage system, where training data required for executing a training task is stored in a subdirectory under a root directory in the network storage system.
The address uniquely identifies the network storage system, and the network storage system can be accessed through the address.
The root directory is the top-level directory of a logical drive and is defined relative to subdirectories. For example, if the electronic device has local hard disks C and D, double-clicking disk C enters the root directory of disk C, double-clicking disk D enters the root directory of disk D, and so on. The root directory is created when the file system is created, and its purpose is to store the directory entries of subdirectories (also called folders) and files. If the directory structure is compared to a tree, the root directory is the root of the tree.
A subdirectory is a directory within a parent directory, and a parent directory is defined relative to its subdirectories. For example, a root directory may include one or more subdirectories, in which case the root directory is the parent directory of those subdirectories. In the following description, a subdirectory of the root directory is referred to as a first subdirectory. A first subdirectory may itself contain a subdirectory, referred to as a second subdirectory; in this case, the second subdirectory is a subdirectory of the first subdirectory, and the first subdirectory is the parent directory of the second subdirectory.
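The parent and subdirectory relationships can be checked with a short sketch using Python's `pathlib`. The directory names `dataset-a` and `images` are hypothetical.

```python
from pathlib import PurePosixPath

root = PurePosixPath("/")      # root directory
first = root / "dataset-a"     # first subdirectory (child of the root directory)
second = first / "images"      # second subdirectory (child of the first subdirectory)

# Walk upward from the second subdirectory to the root directory.
ancestors = list(second.parents)
```

Here `second.parent` is `first` and `first.parent` is `root`, matching the first and second subdirectory terminology used in the description.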
The training data refers to data required by a training task, for example, the training data may include sample data, and may also include an initial model or model configuration data, and the specific content of the training data may be set by a related technician as required, which is not limited in this embodiment of the application.
In an embodiment of the present application, a file system of a network storage system stores training data, and the training data is stored in a subdirectory of a root directory of the file system. The electronic equipment can mount the network storage system into a local directory of the electronic equipment, and then access the mounted directory to acquire the training data when a training task is subsequently executed. Therefore, the electronic device can obtain the address of the network storage system to be mounted first, and then can access the file system of the network storage system through the address, so that the root directory in the file system can be mounted locally.
402. The electronic device mounts the root directory in the address into a local directory of the current device.
Mounting: refers to the process by which an operating system makes the computer files and directories on a storage device (such as a hard disk, CD-ROM, or shared resource) available to a user through the computer's file system. That is, once a network storage device is mounted on a computer, a user can access the computer files or directories on the network storage device through the file system of the computer.
The file system may also be called an information management module or a file management module, and is mainly responsible for managing software resources. All software resources are stored on a storage medium in the form of files, and information is transmitted in a computer in units of files. A file is thus defined as a collection of related information elements. The collection of all files in a computer is also called a file system; this sense shares its name with the management module of the operating system, but the two can usually be told apart from context.
Pre-mounting a network disk: the training data used when running AI training is generally large, and the local hard disk space of a device (a device performing a training task) is usually insufficient, so a network storage system (i.e., a network disk) usually needs to be mounted on the device. The network storage system stores the training data of the service. The network disk is therefore mounted before the service starts training; that is, the disk mounting operation is moved ahead of the service start stage, and this process is called network disk pre-mounting.
The local directory is a directory on the current device, i.e., a path on the current device. Resolving this path does not require network access.
In this step 402, the electronic device performs a mount step, i.e. a pre-mount process, when the container has not been created yet to perform the training task. When the network storage system is mounted, the root directory in the address of the network storage system is mounted to the local directory of the current device, so that when the network storage system needs to be accessed subsequently, the data or the directory on the network storage system can be accessed by accessing the mounted position.
In the above steps 401 and 402, the electronic device has already performed the mounting step of the network storage system, and at this time, there may not yet be a training task to be executed, and when a training task needs to be executed subsequently, the electronic device may execute the following steps 403 and 404 to extract training data from the network storage system by accessing the mounted position, so as to support the execution of the training task.
403. In response to creating a container corresponding to any training task, the electronic device creates a link between the directory where the container is located and the subdirectory of the root directory in the local directory.
When the electronic device needs to execute a training task, a container is created for the task, and the training task is executed in the container. When the container executes a training task, training data needs to be acquired, and training is performed based on the training data.
The root directory of the network storage system is already mounted in the local directory of the electronic device, and since the root directory is already mounted on the electronic device, the subdirectories under the root directory are also naturally already mounted on the electronic device. The training data may be stored in a subdirectory under the root directory, which is naturally accessible if the container needs to obtain the training data.
In consideration of the operating characteristics of the container, the container usually extracts data in the directory where the container is located during operation, and the training data is located in the local directory of the electronic device, so that the two directories need to be mapped, so that the container can extract the required data from the local directory through the mapping.
404. While the container executes the training task, the electronic device reads the training data required for executing the training task from the subdirectory based on the link.
Once the link is established, the directory mapping is complete, and the electronic device can read the training data from the local directory through the link while the container runs. The training data can therefore be extracted without connecting to the network to mount the network storage system into the directory where the container is located; the container only needs to perform local read and write operations.
The embodiment of the application provides a pre-mounting mode for a network storage system: the mounting step of the network storage system is moved ahead of container creation and is no longer executed when a container is created. First, because the mounting step is performed in advance, its time consumption is not counted in the time consumption of the task execution flow, so task execution efficiency is improved. Second, because the mounting step is performed in advance, only a mapping between the mounted directory and the directory where the container is located is needed when the container is created. The directory mapping process is a local directory operation that requires no network communication, so mounting failures caused by network communication are avoided; the directory mapping process also takes almost no time, so task execution efficiency is high and the success rate of task execution is improved. Third, the mounting step of the network storage system is decoupled from the task execution flow: if mounting the network storage system fails, mounting is simply performed again without directly affecting the task execution flow, which avoids task execution failures caused by mounting failures and further improves the success rate of task execution.
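Steps 401 to 404 can be walked through with a small simulation. Ordinary temporary directories stand in for the network storage system and the device, and symbolic links stand in for both the mount and the bind mapping; this substitution is an assumption made so the sketch runs without privileges, and all directory and file names are hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Stand-in for the network storage system: a root directory with one subdirectory.
remote_root = os.path.join(base, "netdisk_root")
os.makedirs(os.path.join(remote_root, "dataset-a"))
with open(os.path.join(remote_root, "dataset-a", "samples.txt"), "w") as f:
    f.write("sample-1\nsample-2\n")

# Steps 401-402 (simulated): expose the remote root under a local directory.
local_mount = os.path.join(base, "mnt", "netdisk")
os.makedirs(os.path.dirname(local_mount))
os.symlink(remote_root, local_mount)  # placeholder for the real mount operation

# Step 403: when the container is created, link its data directory to the subdirectory.
container_dir = os.path.join(base, "containers", "task-1")
os.makedirs(container_dir)
data_link = os.path.join(container_dir, "data")
os.symlink(os.path.join(local_mount, "dataset-a"), data_link)

# Step 404: the training task reads through the link with ordinary local file calls.
with open(os.path.join(data_link, "samples.txt")) as f:
    samples = f.read().splitlines()
```

In the simulation, only the first symbolic link corresponds to an operation that would touch the network in a real deployment; the link created for the container and the read in step 404 are purely local path operations.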
Fig. 5 is a flowchart of a training task execution method provided by an embodiment of the present application, and referring to fig. 5, the method includes the following steps.
501. In response to the current device restarting, or in response to a container corresponding to a training task being created in the current device for the first time, the electronic device acquires the address of a network storage system, where training data required for executing the training task is stored in a subdirectory under a root directory in the network storage system.
Because the volume of training data is huge, the training data is stored in the network storage system. When the electronic device needs the training data to execute a training task, the network storage system can be mounted on the electronic device, so that a container on the electronic device can obtain the training data from the network storage system through the mounted location.
In some embodiments, the training data is stored in a subdirectory of a root directory of the network storage system. In some embodiments, at least one subdirectory may be included under the root directory. The at least one subdirectory may be one subdirectory or a plurality of subdirectories.
In some embodiments, the training data required for different training tasks may be different, and different training data may be stored in different subdirectories. And when the container executes the training task, extracting the training data from the subdirectory where the training data required by the training task is located.
In step 501, the electronic device acquires the address of the network storage system either when the electronic device restarts or when a container corresponding to a training task is created for the first time. The two cases are described below.
In the first case, the device is restarted. After restarting, the device may need to fulfil service requirements such as executing a training task. At this point the electronic device obtains the address of the network storage system and executes the subsequent mounting step, so that the mounting of the network storage system takes place before any container is created to execute a training task. The subsequent training task execution process is then unaffected by the delay of the mounting process or by mounting failures caused by a poor network, and task execution efficiency and success rate can be greatly improved.
In the second case, when the device creates a container for the first time to execute a training task, the mounting step of the network storage system can be executed, so that when the training task is executed, the mounting step is completed, and when other training tasks are subsequently performed, the mounted network storage system is already available, the mounted network storage system can be directly applied, and the efficiency and the success rate of the subsequent task execution are also improved.
In some embodiments, the electronic device may also have previously performed a mount step, and thus, the electronic device may also determine whether the network storage system is mounted in the local directory of the current device in response to the current device restarting or in response to a container corresponding to a training task being created for the first time in the current device. If the network storage system is already mounted in the local directory of the current device, the electronic device may not need to perform step 501 and step 502, but directly perform the subsequent step 503. If the network storage system is not already mounted in the local directory of the current device, the electronic device may perform step 501 and step 502.
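The check described above amounts to an idempotent "mount only if needed" routine. The sketch below assumes the mount state can be tested with `os.path.ismount` and leaves the actual address lookup and mount operation to caller-supplied callables; `ensure_mounted`, `get_address`, `mount`, and the example address are hypothetical names.

```python
import os


def ensure_mounted(local_dir, get_address, mount, is_mounted=os.path.ismount):
    """Run steps 501-502 only when local_dir is not already a mount point.

    get_address and mount are caller-supplied callables (hypothetical hooks).
    Returns True if a mount was performed, False if it was skipped.
    """
    if is_mounted(local_dir):
        return False                 # already mounted: go straight to step 503
    address = get_address()          # step 501
    mount(address, local_dir)        # step 502
    return True


# Demo with fakes so the sketch runs without privileges or a real network disk.
mounted = set()
calls = []


def fake_mount(address, local_dir):
    calls.append(address)
    mounted.add(local_dir)


first = ensure_mounted("/mnt/netdisk", lambda: "storage.example:/",
                       fake_mount, is_mounted=mounted.__contains__)
second = ensure_mounted("/mnt/netdisk", lambda: "storage.example:/",
                        fake_mount, is_mounted=mounted.__contains__)
```

The same routine can be called on device restart and on first container creation; the second call is a no-op because the directory is already reported as mounted.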
502. The electronic device mounts the root directory in the address into a local directory of the current device.
After the electronic device obtains the address of the network storage system, the electronic device can access the file system on the network storage system through the address, and naturally can also access the file directory stored on the network storage system and the files in the directory.
The mounting process may be a mount operation. The electronic device has its own file system, and the mount operation attaches the network storage system to an existing directory in that file system, so that the network storage system is attached to the electronic device; accessing the mounted directory then accesses the network storage system.
In some embodiments, the network storage system may be provided with a target interface for providing data manipulation services of the network storage system. The address of the network storage system acquired by the electronic device is also the identification information of the interface. Through the identification information of the target interface, the electronic equipment can access the target interface and access the file system in the network storage system through the target interface.
Specifically, the electronic device may access the target interface and obtain the root directory of the network storage system through it; the root directory may include one or more subdirectories. The electronic device then establishes a link between the root directory and the local directory, so that accessing the local directory accesses the root directory of the network storage system through the link.
In some embodiments, the network storage system may be a distributed storage system, that is, the network storage system may be composed of a plurality of computer devices, for the plurality of computer devices, an address list of the computer devices may be maintained in the distributed storage system, and the distributed storage system may map the directories of the plurality of computer devices with the root directory provided by the target interface through the address list. From this mapping, the distributed storage system is able to determine from which directory of which computer devices in the address list data needs to be retrieved. The address list of the distributed storage system may be data invisible to the user, that is, the address list is maintained in the distributed storage system, the electronic device accesses a general directory obtained by summarizing or directory mapping the distributed storage system, which is accessed by the target interface, and the electronic device does not need to go to a single computer device to obtain related data according to the address list.
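The role of the internally maintained address list can be sketched as a lookup table behind a single unified root directory. This is an illustration only: the class `UnifiedDirectory`, the layout of the address list, and the node addresses and paths are all hypothetical.

```python
class UnifiedDirectory:
    """Toy view of a distributed store: one root directory backed by several nodes."""

    def __init__(self, address_list):
        # Maps a top-level subdirectory name to (node address, directory on that node).
        # Kept inside the storage system; callers never see it.
        self._address_list = dict(address_list)

    def list_root(self):
        # What a client sees under the unified root directory.
        return sorted(self._address_list)

    def resolve(self, path):
        # Translate a path under the unified root into a node and a node-local path.
        top, _, rest = path.strip("/").partition("/")
        node, node_dir = self._address_list[top]
        node_path = f"{node_dir}/{rest}" if rest else node_dir
        return node, node_path


store = UnifiedDirectory({
    "dataset-a": ("10.0.0.1", "/srv/vol1"),
    "dataset-b": ("10.0.0.2", "/srv/vol7"),
})
```

A client that has mounted the unified root only ever uses paths such as `/dataset-b/images/0.png`; the translation to a particular node stays inside the storage system.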
In a specific possible embodiment, the network storage system is a distributed POSIX (Portable Operating System Interface) system. Accordingly, the target interface may be a POSIX file operation interface. For example, the network storage system may be a storage system such as Ceph or GlusterFS. For such a storage system, developers need to deploy and build the distributed storage in advance, which is not described in detail herein.
503. In response to creating a first container corresponding to a first training task, the electronic device creates a link between the directory where the first container is located and the subdirectory of the root directory in the local directory.
The electronic device may receive some training tasks and use the training data to perform the training tasks. In performing the training task, the electronic device may create a container based on which to perform the training task. The training data set required by the training task is stored in the network storage system, and when the training task is executed, the container needs to go to the network storage system to obtain the required training data.
Through the above steps 501 and 502, the electronic device has mounted the network storage system into the local directory, so that when the container is created, the container can be guided to access the local directory to obtain the required training data by performing directory mapping between the directory where the container is located and the local directory.
In some embodiments, different training data may be stored in different subdirectories, including one or more subdirectories under the root directory. In case only one sub-directory is included in the root directory, the electronic device may create a link between the directory in which the first container is located and the sub-directory in the local directory.
In the case where at least two subdirectories are included under the root directory, the electronic device may determine which of them holds the training data needed by the first container and then access that subdirectory to obtain the training data. Specifically, in response to the root directory including at least two subdirectories, the electronic device determines a target subdirectory from the at least two subdirectories, the target subdirectory being the subdirectory in which the training data required for the first container to perform the training task is located. The electronic device may then create a link between the directory in which the first container is located and the target subdirectory.
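Selecting the target subdirectory can be sketched as follows. The text does not specify how the target is determined; the sketch assumes, hypothetically, that each task names its dataset and that the dataset lives in a subdirectory of the same name. A temporary directory stands in for the mounted root directory.

```python
import os
import tempfile


def target_subdirectory(mount_point, dataset_name):
    """Return the subdirectory of the mounted root that holds the task's data.

    Assumes (hypothetically) that each task names its dataset and that the
    dataset is stored in a subdirectory of the same name.
    """
    subdirs = sorted(
        entry for entry in os.listdir(mount_point)
        if os.path.isdir(os.path.join(mount_point, entry))
    )
    if len(subdirs) == 1:
        return os.path.join(mount_point, subdirs[0])   # only one candidate
    if dataset_name not in subdirs:
        raise FileNotFoundError(f"no subdirectory for {dataset_name!r}")
    return os.path.join(mount_point, dataset_name)


mount_point = tempfile.mkdtemp()      # stand-in for the mounted root directory
for name in ("dataset-a", "dataset-b"):
    os.mkdir(os.path.join(mount_point, name))
chosen = target_subdirectory(mount_point, "dataset-b")
```

The returned path is what the link for the first container would point at; the lookup itself touches only the local mount point.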
When the electronic device creates the link, the electronic device may first obtain the directory where the first container is located and the subdirectory of the root directory, and then create the link between the directory and the subdirectory. The directory mapping process is a read-write operation of a local directory, networking is not needed, and the success rate can even reach one hundred percent.
In some embodiments, when the electronic device needs to create a container to perform a training task, it may first detect whether the root directory of the network storage system is already mounted in its local directory, and if so, perform the directory mapping process shown in this step 503. Specifically, in response to creating the first container corresponding to the first training task, the electronic device may query whether the local directory of the current device includes the root directory of the network storage system. In response to the local directory including the root directory of the network storage system, the electronic device may perform the step of creating a link between the directory in which the first container is located and the subdirectory of the root directory in the local directory.
Combining the above embodiments of querying for the mount point and querying the subdirectories, a specific possible embodiment is provided: in response to the local directory including the root directory of the network storage system and the root directory including at least two subdirectories, the electronic device determines a target subdirectory from the at least two subdirectories, the target subdirectory being the subdirectory where the data required by the container to execute the training task is located, and then creates a link between the directory where the container is located and the target subdirectory.
When detecting whether the local directory of the electronic device has the root directory of the network storage system, there may be a case that the local directory of the electronic device does not have the root directory of the network storage system, and the electronic device needs to perform the mounting step again, that is, perform the step 501 and the step 502. Specifically, the electronic device may perform, in response to that the local directory does not include the root directory of the network storage system, the steps of obtaining an address of the network storage system and mounting the root directory in the address into the local directory of the current device.
Through the query or detection step, when the previous mounting step has a problem or a mounting point has a problem, corresponding measures can be taken to re-mount so as to ensure the acquisition of training data and the success rate of task execution.
It should be noted that, in step 503, in response to creating a container corresponding to any training task, a process of creating a link between the directory where the container is located and the subdirectory of the root directory in the local directory is performed, and here, only a first container corresponding to a first training task is taken as an example for description, and then, a second container corresponding to a second training task may also be used, which is not limited in this embodiment of the application.
504. While the first container executes the first training task, the electronic device reads the training data required for executing the first training task from the subdirectory based on the link.
The electronic device has created a link between the directory where the container is located and the subdirectory. When the step of acquiring training data is executed while the first container executes the first training task, the electronic device can jump from the directory where the container is located to the subdirectory based on the link. That subdirectory corresponds to the subdirectory of the network storage system, so the container can reach into the network storage system through it and acquire the corresponding training data. Once the first container has acquired the training data, it can continue to execute the first training task; details of the training process are not repeated here.
505. In response to a destroy instruction for the first container, the electronic device deletes the link between the directory where the first container is located and the subdirectory of the root directory in the local directory.
After the first container performs the first training task, the first container may be destroyed. When the electronic device receives the destroy instruction of the first container, the link created in step 503 may be deleted, but the root directory mounted from the network storage system is not deleted, so the network storage system remains mounted in the local directory of the electronic device, and when there are other training tasks subsequently, the mount point can continue to be used without a further mounting step.
It should be noted that, in the embodiment of the present application, the network storage system is mounted in the local directory of the electronic device rather than directly in the directory where the container is located when the container is created. The mount point is therefore not destroyed when the container is destroyed, and the mounting step does not need to be performed every time a training task is executed. The number of mount operations is greatly reduced, and task execution efficiency is improved.
The approach of moving the mounting step ahead, as provided by the embodiment of the application, allows the root directory mounted in the local directory to be reused. When a container finishes using the root directory and is destroyed, only the link created during directory mapping is destroyed; for other containers, links can simply be created without repeating the mount. In addition, moving the mounting step ahead supports running multiple training tasks in parallel: links can be created between the subdirectories under the root directory and the directories where multiple containers are located, and the multiple training tasks can then be executed simultaneously.
In some embodiments, for the link deletion step described above, the container destruction is considered successful only when the container has been destroyed and the link has also been deleted. If the link is not successfully deleted, the electronic device may attempt the deletion again. Specifically, in response to a failure to delete the link between the directory where the container is located and the subdirectory of the root directory in the local directory, the electronic device retries deleting the link.
That is, the destruction process mainly deletes the link of the bind mapping directory and does not perform a umount (unmount) operation on the mount point of the electronic device. If the link deletion succeeds, the container destruction is complete; if it does not, a retry is performed. Because link deletion is a local file system operation, it almost never fails, and with retries added, 100% success can be guaranteed.
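The retry behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the function name `delete_link_with_retry`, the attempt limit and the delay are hypothetical, and the actual removal is passed in as a callable (defaulting to `os.unlink`, which fits a symbolic link; a real bind mapping would need a different removal routine).

```python
import os
import time
from typing import Callable, Optional


def delete_link_with_retry(
    link_path: str,
    remove: Callable[[str], None] = os.unlink,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Delete the link, retrying on OSError; return the number of attempts used.

    Raises the last OSError if every attempt fails. The root directory
    mount point is never touched here, matching the "no umount" rule.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            remove(link_path)
            return attempt
        except FileNotFoundError:
            # The link is already gone, so the goal has been reached.
            return attempt
        except OSError as exc:
            last_error = exc
            time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error
```

Injecting `remove` keeps the retry policy separate from the removal mechanism, so the same loop can be exercised in tests with a deliberately failing stub.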
506. In response to creating a second container corresponding to a second training task, the electronic device queries whether the local directory of the current device contains the root directory of the network storage system.
In this step 506, the electronic device needs to perform another training task: the training data required for the second training task may be the same as or different from the training data required for the first training task. For the second training task, the electronic device may create a second container, and perform the second training task based on the second container.
The electronic device may first query whether the root directory of the network storage system is already mounted in the local directory. If so, steps 507 and 508 below may be performed; if not, the electronic device must first perform the mounting step. Because umount is not performed when a container is destroyed in the present embodiment, the query may find that the root directory is already mounted, in which case the mounted root directory of the network storage system can be reused.
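One possible way to implement the query on a Linux host is to look for the mount point in `/proc/mounts`. The sketch below assumes that layout (device, mount point, file system type, options, and so on, one mount per line); the function names and the example mount point `/mnt/netdisk` are hypothetical.

```python
def is_mount_point(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point appears as a mount target in mounts_text.

    mounts_text is expected in the /proc/mounts layout, where the second
    whitespace-separated field of each line is the mount point.
    """
    wanted = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # The kernel escapes spaces inside paths as \040.
            target = fields[1].replace("\\040", " ")
            if (target.rstrip("/") or "/") == wanted:
                return True
    return False


def root_directory_mounted(mount_point: str,
                           mounts_file: str = "/proc/mounts") -> bool:
    """Read the mounts table from disk and check for mount_point."""
    with open(mounts_file, "r", encoding="utf-8") as fh:
        return is_mount_point(mount_point, fh.read())
```

Python's standard library also offers `os.path.ismount`, which may be sufficient when parsing the mounts table is not needed.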
507. In response to the local directory containing the root directory of the network storage system, the electronic device creates a link between the directory where the second container is located and the subdirectory of the root directory in the local directory.
508. While the second container executes the second training task, the electronic device reads the training data required for the second training task from the subdirectory based on the link.
509. In response to a destroy instruction for the second container, the electronic device deletes the link between the directory where the second container is located and the subdirectory of the root directory in the local directory.
Steps 507 to 509 are similar to steps 503 to 505, and are not repeated herein.
For the pre-mounting approach described above, the mounting step, the directory mapping, and the container destruction process are described below, taking a network disk as an example of the network storage system. The method provided by the embodiment of the present application can be applied in a scenario where a service runs AI training that requires GPU (Graphics Processing Unit) computing power. The mount operation for the training data stored in the network file system is completed in advance: the operation of mounting the network disk is moved out of the task start-up process and placed before the task is started. Because the task start-up path no longer includes a network disk mount operation, the start-up path is shortened and the mounting process is decoupled from the task start-up process, which prevents a task from failing to start because of a mount failure.
The network disk pre-mount operation, which may also be referred to as a front-loaded mount operation, occurs at the device start-up stage (i.e., device restart) or when the first task container on the device is started. In either of these two stages, the mount operation of the network disk root directory is completed, after which the network disk directory is visible on the device; this corresponds to steps 401 and 402, or steps 501 and 502. The training data actually used by the service is stored in subdirectories of the mounted network disk root directory. When a service task container is started, the subdirectory where the training data is located can be mapped into the task container by directory mapping, and the task container can then access the training data through ordinary file system read and write operations. Pre-mounting the network disk greatly improves the delivery efficiency of task containers, improves the success rate of the AI training process, and reduces the latency of training tasks.
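The embodiment does not name a container runtime. Purely as an illustration, and assuming Docker is used, the directory mapping could be expressed with the `--mount type=bind` option of `docker run`. The sketch below only builds the argument list and does not start anything; the function name, image name and paths are hypothetical.

```python
from typing import List


def build_run_command(image: str, host_subdir: str, container_dir: str,
                      read_only: bool = False) -> List[str]:
    """Build a `docker run` argument list that bind-maps host_subdir.

    Assumption: Docker is the container runtime; host_subdir is a
    subdirectory of the network disk root that is already mounted on
    the device, so no network mount happens at container start-up.
    """
    mount_spec = f"type=bind,source={host_subdir},target={container_dir}"
    if read_only:
        mount_spec += ",readonly"
    return ["docker", "run", "--rm", "--mount", mount_spec, image]
```

For example, `build_run_command("trainer:latest", "/mnt/netdisk/task1", "/data")` maps only the task's own subdirectory into the container rather than the whole root directory.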
As shown in fig. 6, when the physical device is restarted 601 or a task container is created 602, operation 603 determines whether the network disk storage root directory is already mounted on the physical device. If the mount point exists, step 605 performs the bind mapping operation of the directory, which corresponds to step 503: the subdirectory holding the training data is mapped into the container, and the creation of the task container is completed. If the mount point does not exist, step 604 first performs the mount operation of the network disk directory and mounts the root directory onto the physical device; this flow communicates with the network disk storage to complete the mount, which corresponds to steps 501 and 502.
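The decision flow of fig. 6 can be sketched as a single function. This is a simplified illustration: the three operations are passed in as callables so the control flow can be exercised without privileges, and every name below (`prepare_task_container`, `is_mounted`, `mount_root`, `bind_map`) is hypothetical.

```python
from typing import Callable


def prepare_task_container(
    root_mount_point: str,
    data_subdir: str,
    container_dir: str,
    is_mounted: Callable[[str], bool],
    mount_root: Callable[[str], None],
    bind_map: Callable[[str, str], None],
) -> None:
    """Follow the fig. 6 flow for one task container."""
    # 603: is the network disk root directory already mounted on the device?
    if not is_mounted(root_mount_point):
        # 604: mount the root directory; this is the only step that
        # communicates with the network disk storage.
        mount_root(root_mount_point)
    # 605: map the training data subdirectory into the container;
    # this is a purely local operation.
    source = f"{root_mount_point.rstrip('/')}/{data_subdir}"
    bind_map(source, container_dir)
```

On a Linux host, `mount_root` and `bind_map` might wrap the `mount` and `mount --bind` commands respectively, although the embodiment does not prescribe specific commands.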
With reference to fig. 3 and fig. 6, the mount from the network disk storage to the physical device is a true mount operation. It occurs when the device restarts or when the first task container is created on the device, and it involves network communication. Once this mount succeeds, the training data used by subsequent containers continues to reuse the existing mount point. The key point is that the mounted directory is the root directory of the storage system and all training data resides in subdirectories of that root directory, so the subdirectories can keep reusing the existing root directory mount point. The bind directory mapping from the physical device to the created task container occurs during task container creation. Because it is a local directory operation, the local file system can guarantee 100% success, and the operation completes almost instantly, within milliseconds, so it does not affect the time consumption or success rate of task start-up even though it occurs during task container start-up. The key point of the bind directory operation is the mapping of the training data subdirectory, which can be completed entirely within the local file system without any network communication with the storage system.
As shown in fig. 7, in the container destruction process, task container destruction 701 mainly consists of step 702, deleting the link of the bind mapping directory; no umount operation is performed on the physical device mount point. It is then determined whether the deletion succeeded 703. If the link deletion succeeds, the container destruction is complete 704; if it does not, a retry is performed. Because link deletion is a local file system operation, it almost never fails, and with retries added, 100% success can be guaranteed.
As can be seen from the foregoing description of the embodiments, the method provided in the embodiments of the present application may include the following beneficial effects:
First, mounting the network disk before the task container is started avoids task start-up failures caused by network disk mount failures and improves the success rate of task start-up. Second, the time consumed by the network disk mount operation is no longer counted in the time consumed by task container start-up, which reduces the time spent in the task start-up stage. Third, the training data is made available by directory mapping, which requires no network communication and operates only on local directories, so 100% availability can basically be guaranteed. Fourth, within a cluster, pre-mounting the root directory and mapping subdirectories reduces the number of network disk mounts performed by containers, especially when a single device runs multiple task containers; this also reduces the load on the storage cluster and benefits its stability. Fifth, when a task finishes and its container is destroyed, only the directory mapping is reclaimed and the network disk mount does not have to be disconnected over the network, so container destruction is fast and has a high success rate.
The embodiment of the application provides a pre-mounting approach for a network storage system, in which the mounting step of the network storage system is moved ahead of container creation rather than being executed when the container is created. First, because the mounting step is moved ahead, its time consumption is not counted in the time consumption of the task execution flow, which improves task execution efficiency. Second, the mounted directory is mapped to the directory where the container is located when the container is created; this directory mapping is a local directory operation that requires no network communication, so mount failures that network communication might cause are avoided, the mapping takes almost no time, task execution efficiency is high, and the success rate of task execution is improved. Third, the mounting step of the network storage system is decoupled from the task execution flow: if mounting the network storage system fails, the mount is simply retried without directly affecting the task execution flow, which avoids task execution failures caused by mount failures and further improves the success rate of task execution.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 8 is a schematic structural diagram of a training task performing apparatus provided in an embodiment of the present application, and referring to fig. 8, the apparatus includes:
an obtaining module 801, configured to obtain an address of a network storage system, where training data required for executing a training task is stored in a subdirectory under a root directory in the network storage system;
a mount module 802, configured to mount the root directory in the address to a local directory of the current device;
a creating module 803, configured to create, in response to creating a container corresponding to any training task, a link between a directory where the container is located and a sub-directory of the root directory in the local directory;
an executing module 804, configured to read, from the subdirectory, training data required for executing the training task based on the link in the process of executing the training task by the container.
In some embodiments, the acquisition module 801 is configured to perform any of:
responding to the restart of the current equipment, and executing the step of acquiring the address of the network storage system;
and responding to the container corresponding to the training task created for the first time in the current equipment, and executing the step of acquiring the address of the network storage system.
In some embodiments, the creation module 803 is used to:
responding to at least two subdirectories under the root directory, and determining a target subdirectory from the at least two subdirectories, wherein the target subdirectory is a subdirectory in which training data required by the container to execute the training task are located;
a link is created between the directory in which the container resides and the target subdirectory.
In some embodiments, the creation module 803 is used to:
in response to the creation of a container corresponding to any training task, inquiring whether a root directory of the network storage system is included in the local directory of the current equipment;
the step of creating a link between the directory in which the container is located and a subdirectory of the root directory in the local directory is performed in response to the local directory containing the root directory of the network storage system.
In some embodiments, the obtaining module 801 and the mounting module 802 are configured to, in response to that the local directory does not include a root directory of the network storage system, perform the steps of obtaining an address of the network storage system and mounting the root directory in the address into the local directory of the current device.
In some embodiments, the apparatus further comprises:
and the deleting module is used for responding to a destroying instruction of any container and deleting the link between the directory where the container is located and the subdirectory of the root directory in the local directory.
In some embodiments, the deletion module is further configured to retry deleting the link in response to a deletion failure of the link between the directory in which the container is located and the subdirectory of the root directory in the local directory.
In some embodiments, the network storage system is a distributed storage system.
In some embodiments, the network storage system is a distributed Portable Operating System Interface (POSIX) system.
In some embodiments, the network storage system is a blockchain system.
In some embodiments, the root directory mounted in the local directory may be multiplexed.
In some embodiments, the container corresponding to the training task is a first container; the creating module 803 is further configured to create a link between the directory in which the second container is located and the subdirectory of the root directory in the local directory in response to creating a second container corresponding to another training task.
According to the device provided by the embodiment of the application, the mounting step of the network storage system is moved ahead of container creation rather than being executed when the container is created. First, because the mounting step is moved ahead, its time consumption is not counted in the time consumption of the task execution flow, which improves task execution efficiency. Second, the mounted directory is mapped to the directory where the container is located when the container is created; this directory mapping is a local directory operation that requires no network communication, so mount failures that network communication might cause are avoided, the mapping takes almost no time, task execution efficiency is high, and the success rate of task execution is improved. Third, the mounting step of the network storage system is decoupled from the task execution flow: if mounting the network storage system fails, the mount is simply retried without directly affecting the task execution flow, which avoids task execution failures caused by mount failures and further improves the success rate of task execution.
It should be noted that the division into functional modules described above is only an example of how the training task execution device of the above embodiment executes a training task. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the training task execution device can be divided into different functional modules to complete all or part of the functions described above. In addition, the training task execution device and the training task execution method provided by the above embodiments belong to the same concept; the specific implementation process of the device is described in the method embodiments and is not repeated here.
Fig. 9 is a schematic structural diagram of an electronic device provided in this embodiment of the present application. The electronic device 900 may vary considerably depending on its configuration or performance, and can include one or more processors (CPUs) 901 and one or more memories 902, where the memory 902 stores at least one computer program, and the at least one computer program is loaded and executed by the processors 901 to implement the training task execution method provided in each method embodiment. The electronic device can also include other components for implementing device functions; for example, it can also have components such as a wired or wireless network interface and an input/output interface for input and output. These are not described in detail in the embodiments of the present application.
The electronic device in the above method embodiment can be implemented as a terminal. For example, fig. 10 is a block diagram of a terminal according to an embodiment of the present disclosure. The terminal 1000 can be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer. Terminal 1000 can also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU, which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement a training task execution method provided by method embodiments herein.
In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, display screen 1005, camera assembly 1006, audio circuitry 1007, positioning assembly 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over the surface of the display screen 1005. The touch signal may be input to the processor 1001 as a control signal for processing. At this point, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, display screen 1005 can be one, disposed on a front panel of terminal 1000; in other embodiments, display 1005 can be at least two, respectively disposed on different surfaces of terminal 1000 or in a folded design; in other embodiments, display 1005 can be a flexible display disposed on a curved surface or a folded surface of terminal 1000. Even more, the display screen 1005 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display screen 1005 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For stereo sound collection or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1007 may also include a headphone jack.
The positioning component 1008 is used to determine the current geographic location of terminal 1000 for navigation or LBS (Location Based Service). The positioning component 1008 can be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 may cooperate to acquire a 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1013 can be disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Fingerprint sensor 1014 may be disposed on a front, back, or side of terminal 1000. When a physical key or vendor Logo is provided on terminal 1000, fingerprint sensor 1014 can be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.
Proximity sensor 1016, also known as a distance sensor, is typically disposed on a front panel of terminal 1000. Proximity sensor 1016 is used to measure the distance between the user and the front face of terminal 1000. In one embodiment, when proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 is gradually decreasing, processor 1001 controls display screen 1005 to switch from a bright screen state to a dark screen state; when proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 is gradually increasing, processor 1001 controls display screen 1005 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.
The electronic device in the above method embodiment can be implemented as a server. For example, FIG. 11 is a schematic structural diagram of a server provided in this embodiment of the present application. The server 1100 may vary considerably depending on its configuration or performance, and can include one or more processors (Central Processing Units, CPUs) 1101 and one or more memories 1102, where the memory 1102 stores at least one computer program, and the at least one computer program is loaded and executed by the processor 1101 to implement the training task execution method provided by each method embodiment described above. The server can also have components such as a wired or wireless network interface and an input/output interface to facilitate input and output, and can include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory storing at least one computer program, where the at least one computer program is executable by a processor to perform the training task execution method of the above embodiments. For example, the computer-readable storage medium can be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises one or more program codes, which are stored in a computer-readable storage medium. The one or more processors of the electronic device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the electronic device can perform the training task execution method.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be implemented by hardware, or can be implemented by a program for instructing relevant hardware, and the program can be stored in a computer readable storage medium, and the above mentioned storage medium can be read only memory, magnetic or optical disk, etc.
The above description is merely an optional embodiment of the present application and is not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for performing a training task, the method comprising:
acquiring an address of a network storage system, wherein training data required for executing a training task are stored in subdirectories under a root directory in the network storage system;
mounting the root directory in the address to a local directory of the current equipment;
in response to the container corresponding to any training task being created, creating a link between the directory where the container is located and the subdirectory of the root directory in the local directory;
and reading training data required for executing the training task from the subdirectory based on the link in the process of executing the training task by the container.
2. The method of claim 1, wherein the acquiring of the address of the network storage system comprises any one of:
in response to a restart of the current equipment, executing the step of acquiring the address of the network storage system;
and in response to a container corresponding to a training task being created for the first time in the current equipment, executing the step of acquiring the address of the network storage system.
3. The method of claim 1, wherein creating a link between a directory in which the container is located and a subdirectory of the root directory in the local directory in response to creating a container corresponding to any training task comprises:
in response to the root directory containing at least two subdirectories, determining a target subdirectory from the at least two subdirectories, wherein the target subdirectory is the subdirectory where the training data required by the container to execute the training task are located;
creating a link between the directory in which the container resides and the target subdirectory.
4. The method of claim 1, wherein creating a link between a directory in which the container is located and a subdirectory of the root directory in the local directory in response to creating a container corresponding to any training task comprises:
in response to the creation of a container corresponding to any training task, inquiring whether a root directory of the network storage system is contained in the local directory of the current equipment;
and in response to the local directory containing the root directory of the network storage system, executing the step of creating the link between the directory in which the container is located and the subdirectory of the root directory in the local directory.
5. The method of claim 4, further comprising:
in response to the local directory not containing the root directory of the network storage system, executing the steps of acquiring the address of the network storage system and mounting the root directory in the address into the local directory of the current equipment.
6. The method of claim 1, further comprising:
and in response to a destroy instruction of any container, deleting the link between the directory where the container is located and the subdirectory of the root directory in the local directory.
7. The method of claim 6, further comprising:
and in response to a failure to delete the link between the directory where the container is located and the subdirectory of the root directory in the local directory, retrying the deletion of the link.
8. A training task performing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an address of a network storage system, wherein training data required for executing a training task are stored in subdirectories under a root directory in the network storage system;
the mounting module is used for mounting the root directory in the address into a local directory of the current equipment;
the creating module is used for responding to the creation of a container corresponding to any training task and creating a link between the directory where the container is located and the subdirectory of the root directory in the local directory;
and the execution module is used for reading training data required by the training task from the subdirectory based on the link in the process that the container executes the training task.
9. An electronic device, comprising one or more processors and one or more memories, wherein at least one computer program is stored in the one or more memories, and loaded and executed by the one or more processors to implement the training task execution method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the training task execution method according to any one of claims 1 to 7.
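
The flow recited in claims 1 and 3 to 7 can be illustrated with a minimal Python sketch. It is not the implementation of the present application. It assumes that the "link" is a symbolic link on a POSIX system and replaces the real network mount with a hypothetical stand-in, fake_mount, that only creates local directories. All function names, the link name "data", and the subdirectory names task_a and task_b are assumptions introduced for illustration.

```python
import os
import tempfile
import time
from pathlib import Path


def ensure_root_mounted(local_root: Path, mount_fn) -> None:
    """Mount the storage root only if the local directory lacks it (claims 4 and 5)."""
    if not local_root.is_dir():
        mount_fn(local_root)


def link_training_data(container_dir: Path, local_root: Path, subdir: str) -> Path:
    """Link the container's directory to the target subdirectory (claims 1 and 3)."""
    target = local_root / subdir
    if not target.is_dir():
        raise FileNotFoundError(target)
    link = container_dir / "data"  # hypothetical link name
    os.symlink(target, link, target_is_directory=True)
    return link


def unlink_training_data(link: Path, retries: int = 3, delay: float = 0.1) -> bool:
    """Delete the link on container destruction, retrying on failure (claims 6 and 7)."""
    for _ in range(retries):
        try:
            link.unlink()  # removes only the link, not the training data
            return True
        except FileNotFoundError:
            return True  # the link is already gone
        except OSError:
            time.sleep(delay)
    return False


def fake_mount(local_root: Path) -> None:
    """Assumed stand-in for a real network mount: creates two task subdirectories."""
    for name, text in (("task_a", "samples for A"), ("task_b", "samples for B")):
        (local_root / name).mkdir(parents=True, exist_ok=True)
        (local_root / name / "train.txt").write_text(text)


def run_demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        local_root = base / "mnt" / "storage_root"
        container_dir = base / "containers" / "c1"
        container_dir.mkdir(parents=True)

        ensure_root_mounted(local_root, fake_mount)
        link = link_training_data(container_dir, local_root, "task_b")
        data = (link / "train.txt").read_text()  # read through the link

        assert unlink_training_data(link)
        # The data under the mounted root survives deletion of the link.
        assert (local_root / "task_b" / "train.txt").exists()
        return data


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the root is mounted once and each container receives only a lightweight link, so creating or destroying a container does not require a separate mount operation.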
CN202110402585.XA 2021-04-14 2021-04-14 Training task execution method, device, equipment and storage medium Pending CN113703956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110402585.XA CN113703956A (en) 2021-04-14 2021-04-14 Training task execution method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110402585.XA CN113703956A (en) 2021-04-14 2021-04-14 Training task execution method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113703956A true CN113703956A (en) 2021-11-26

Family

ID=78648014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110402585.XA Pending CN113703956A (en) 2021-04-14 2021-04-14 Training task execution method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113703956A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115629772A (en) * 2022-09-05 2023-01-20 摩尔线程智能科技(北京)有限责任公司 Kubernetes software installation method and device and electronic equipment
CN115629772B (en) * 2022-09-05 2023-09-19 摩尔线程智能科技(北京)有限责任公司 Kubernetes software installation method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108536463B (en) Method, device and equipment for acquiring resource package and computer readable storage medium
CN108228894B (en) Method, device and terminal for checking recently used files
CN110674022B (en) Behavior data acquisition method and device and storage medium
CN108491526B (en) Log data processing method and device, electronic equipment and storage medium
CN108717432B (en) Resource query method and device
CN112256425A (en) Load balancing method and system, computer cluster, information editing method and terminal
US20220244930A1 (en) Application porting method and apparatus, device, and medium
CN111949680A (en) Data processing method and device, computer equipment and storage medium
CN111190748A (en) Data sharing method, device, equipment and storage medium
CN114205365B (en) Application interface migration system, method and related equipment
CN110636144A (en) Data downloading method and device
CN111241115A (en) Data synchronization method, device, equipment and storage medium
WO2022063037A1 (en) Method and apparatus for installing patch package
CN113703956A (en) Training task execution method, device, equipment and storage medium
CN112084157A (en) File recovery method and device, computer equipment and storage medium
CN111682983B (en) Interface display method and device, terminal and server
CN110995842A (en) Method, device and equipment for downloading service data and storage medium
CN114168369A (en) Log display method, device, equipment and storage medium
CN112711636B (en) Data synchronization method, device, equipment and medium
CN111522798B (en) Data synchronization method, device, equipment and readable storage medium
CN113268234A (en) Page generation method, device, terminal and storage medium
CN112732282A (en) Installation package downloading method and device
CN112597417A (en) Page updating method and device, electronic equipment and storage medium
CN111596936A (en) Application program updating method and device
CN114443177A (en) Application running method and device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination