CN114202027B - Method for generating execution configuration information, method and device for model training - Google Patents

Method for generating execution configuration information, method and device for model training

Info

Publication number
CN114202027B
Authority
CN
China
Prior art keywords
tensor
operator
interface
topological structure
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111513923.3A
Other languages
Chinese (zh)
Other versions
CN114202027A (en)
Inventor
李龙 (Li Long)
巩伟宝 (Gong Weibao)
吴志华 (Wu Zhihua)
敖玉龙 (Ao Yulong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111513923.3A priority Critical patent/CN114202027B/en
Publication of CN114202027A publication Critical patent/CN114202027A/en
Application granted granted Critical
Publication of CN114202027B publication Critical patent/CN114202027B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/20: Software design

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method for generating execution configuration information, a model training method, and corresponding apparatuses, devices and storage media, relating to the field of artificial intelligence technology, and in particular to the technical fields of image processing, deep learning, and the like. A specific implementation scheme is as follows: for a process used to train a model, a topology of the process is set; and a marking operation is performed on the tensors and operators used to train the model according to the topology, to obtain execution configuration information, where the execution configuration information includes: the segmentation information of the tensors, the correspondence between tensors and processes, and the segmentation information of the operators' variables.

Description

Method for generating execution configuration information, method and device for model training
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the technical field of image processing, deep learning, and the like.
Background
In recent years, driven by technology companies at home and abroad, the scale of deep learning models has been growing rapidly, and ultra-large-scale model training has become one of the most important core competencies. To support ultra-large-scale model training, various parallel strategies and key technologies have been proposed; common distributed training strategies and technologies include data parallelism, model parallelism based on tensor segmentation, pipeline parallelism, and combined strategies built on them. In practice, however, algorithm engineers must have deep knowledge of the implementation principles of the underlying technologies and of the hardware architecture in order to use these combined technologies efficiently and correctly, which raises the barrier to applying them. In addition, many deep learning training hardware platforms keep emerging, and heterogeneous training in which multiple hardware platforms cooperate is becoming a trend. Deep learning training therefore needs to be able to flexibly support heterogeneous training platforms.
Disclosure of Invention
The disclosure provides a generation method, a model training method, a device, equipment and a storage medium for executing configuration information.
According to an aspect of the present disclosure, there is provided a method of generating execution configuration information, including: setting a topology of a process used for training a model; and performing a marking operation on the tensors and operators used for training the model according to the topology, to obtain execution configuration information, where the execution configuration information includes: the segmentation information of the tensors, the correspondence between the tensors and the processes, and the segmentation information of the operators' variables.
According to another aspect of the present disclosure, there is provided a model training method, including: distributing the operations of training the model to a plurality of nodes for execution according to execution configuration information, where the execution configuration information is determined according to the method shown in the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided an apparatus for generating execution configuration information, including: a setting module, configured to set a topology of a process used for training a model; and a marking module, configured to perform a marking operation on the tensors and operators used for training the model according to the topology, to obtain execution configuration information, where the execution configuration information includes: the segmentation information of the tensors, the correspondence between the tensors and the processes, and the segmentation information of the operators' variables.
According to another aspect of the present disclosure, there is provided a model training apparatus, including: an execution module, configured to instruct a plurality of nodes to execute the operations of training the model according to execution configuration information, where the execution configuration information is determined according to the method shown in the embodiments of the present disclosure.
Another aspect of the present disclosure provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods shown in the embodiments of the present disclosure.
According to another aspect of the disclosed embodiments, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the methods shown in the disclosed embodiments.
According to another aspect of the disclosed embodiments, there is provided a computer program product comprising a computer program/instruction, characterized in that the computer program/instruction, when executed by a processor, implements the steps of the method shown in the disclosed embodiments.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates a flowchart of a method of generating execution configuration information according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a schematic diagram of a process topology interface according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a tensor cut interface according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a process matching interface according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of an operator slicing interface, according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a schematic diagram of a node transfer interface according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a pipeline setup interface according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a flow chart of a model training method according to an embodiment of the disclosure;
FIG. 9 schematically illustrates a schematic diagram of a computational graph, according to an embodiment of the present disclosure;
fig. 10 schematically illustrates a block diagram of a generating apparatus that executes configuration information according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a model training apparatus according to an embodiment of the present disclosure; and
FIG. 12 schematically illustrates a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of tensors, operators and the like comply with the relevant laws and regulations and do not violate public order and good morals. An application scenario of the method and apparatus for generating execution configuration information provided by the present disclosure will be described below with reference to Fig. 1.
Fig. 1 schematically illustrates a flowchart of a method of generating execution configuration information according to an embodiment of the present disclosure.
As shown in fig. 1, the execution configuration information generation method 100 includes setting a topology of a process for training a model in operation S110.
According to embodiments of the present disclosure, the model may be, for example, a deep learning model. The process for training the model may be one or more. It will be appreciated that in the case where there are a plurality of processes for training the model, the topology may be set for each process separately.
Then, in operation S120, a marking operation is performed on the tensors and operators used for training the model according to the topology, to obtain execution configuration information.
According to embodiments of the present disclosure, the marking operation may be used to mark attributes of the tensors and of the operators, and the execution configuration information may include, for example, the marked attributes. The attributes of a tensor may include, for example, the segmentation information of the tensor and the correspondence between the tensor and the processes; the attributes of an operator may include, for example, the segmentation information of the input variables and/or output variables of the operator.
According to embodiments of the present disclosure, the tensor used to train the model may be one or more, and the operator may be one or more. It will be appreciated that in the case of multiple tensors or operators for training the model, the marking operation may be performed for each tensor or operator, respectively.
Related technologies support heterogeneous training platforms poorly and are difficult to apply to more complex scenarios, such as a heterogeneous parameter-server mode in which CPUs and GPUs train cooperatively. With the method for generating execution configuration information according to the embodiments of the present disclosure, tensors and operators can each be marked individually, so that the requirements of complex scenarios can be met, heterogeneous training platforms such as CPU-GPU cooperative training can be supported, strategies such as sharding and offload can be supported, and existing distributed parallel strategies remain compatible.
According to an embodiment of the present disclosure, a process topology interface may be preset to set a topology of each process.
For example, fig. 2 schematically shows a schematic diagram of a process topology interface according to an embodiment of the present disclosure.
As shown in Fig. 2, the process topology interface may be ProcessMesh(mesh, parent=None). The input parameters of ProcessMesh may include mesh and parent (which may be None), and there is no return value. Here, mesh represents the logical topology (hereinafter referred to as the topology) of the process group. Note that the mesh is a logical topology and is independent of any specific device topology. The mesh may take the form of a list, for example. The parent parameter represents the parent ProcessMesh instance of the process group; when parent is None, the instance has no parent.
Based on the above, the process and the topology structure data corresponding to the process can be input into the process topology interface to set the topology structure of each process.
Illustratively, the structure of the ProcessMesh may be as shown in Table 1. Table 1 contains 6 elements corresponding to 6 processes, whose process IDs are 2, 4, 3, 1, 0 and 5 respectively. The 6 elements are arranged in 2 rows of 3 elements each, so the shape attribute of the mesh parameter of this ProcessMesh may be [2, 3], which represents the topology of the processes. The value of each element in Table 1 represents the corresponding process ID; for example, mesh[0][0], the first element of the first row in Table 1, represents the logical process with ID 2.
2 4 3
1 0 5
TABLE 1
According to implementations of the present disclosure, different topologies may be represented by ProcessMesh for the same number of processes. For example, 32 processes can be represented by the following different topologies: mesh.shape = [2, 2, 8] and mesh.shape = [4, 2, 4]. Illustratively, in this embodiment it is assumed that the first dimension represents data parallelism (dp), the second dimension represents pipeline parallelism (pp), and the third dimension represents model parallelism (mp).
Thus, mesh.shape = [2, 2, 8] may represent 2-way data parallelism, 2-way pipeline parallelism and 8-way model parallelism, while mesh.shape = [4, 2, 4] may represent 4-way data parallelism, 2-way pipeline parallelism and 4-way model parallelism.
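As an illustration, the topology setting described above could be written as in the following minimal Python sketch. It assumes a ProcessMesh-style constructor that takes a nested list of process IDs, as described for the process topology interface; the module name dist_api is a placeholder, not a confirmed framework path.

    import numpy as np
    from dist_api import ProcessMesh  # hypothetical module exposing the described interface

    # Topology of Table 1: 6 processes arranged as 2 rows x 3 columns.
    mesh = ProcessMesh([[2, 4, 3],
                        [1, 0, 5]])            # mesh.shape == [2, 3]

    # The same 32 processes can be given different logical topologies:
    # 2-way data parallel x 2-way pipeline parallel x 8-way model parallel ...
    mesh_228 = ProcessMesh(np.arange(32).reshape(2, 2, 8).tolist())
    # ... or 4-way data parallel x 2-way pipeline parallel x 4-way model parallel.
    mesh_424 = ProcessMesh(np.arange(32).reshape(4, 2, 4).tolist())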
In this embodiment, each process may correspond to a node. According to an embodiment of the present disclosure, a node may be, for example, a server. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server combined with a blockchain.
According to the embodiment of the disclosure, a tensor segmentation interface may be preset to set segmentation information of the tensor.
For example, fig. 3 schematically illustrates a schematic diagram of a tensor cut interface according to an embodiment of the present disclosure.
As shown in Fig. 3, the tensor slicing interface may be shard_tensor(x, mesh, dims_mapping). The input parameters of shard_tensor may include x, mesh and dims_mapping, and the return value may be the annotated tensor x'. Here, x is the tensor to be marked, mesh represents the topology of the processes, and dims_mapping represents the correspondence between the tensor's dimensions and the topology. dims_mapping may be used to mark the following information: the i-th dimension of x is split along the dims_mapping[i]-th dimension of the mesh, where i indexes the dimensions of x. Illustratively, if the value of dims_mapping[i] is -1, it indicates that the i-th dimension of x is not split.
Based on the above, the tensor, the topological structure data of the process corresponding to the tensor, and the corresponding relation between the dimension of the tensor and the dimension of the topological structure data can be input into the tensor segmentation interface to set the segmentation information of the tensor.
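A minimal sketch of this annotation is shown below. It assumes a shard_tensor(x, mesh, dims_mapping) function with the semantics described above; the module path and the use of a NumPy array as a stand-in tensor are illustrative assumptions only.

    import numpy as np
    from dist_api import ProcessMesh, shard_tensor  # hypothetical module/interface names

    mesh = ProcessMesh([[2, 4, 3],
                        [1, 0, 5]])        # mesh.shape == [2, 3]
    x = np.zeros([8, 6])                   # stand-in for a framework tensor of shape [8, 6]

    # Split dimension 0 of x along mesh dimension 0 (into 2 parts of size 4 each);
    # -1 means dimension 1 of x is not split.
    x = shard_tensor(x, mesh, dims_mapping=[0, -1])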
According to the embodiment of the disclosure, a process matching interface may be preset to set a correspondence between tensors and processes.
For example, fig. 4 schematically shows a schematic diagram of a process matching interface according to an embodiment of the present disclosure.
As shown in Fig. 4, the process matching interface may be set_shard_mask(x, mask). The input parameters of set_shard_mask may include x and mask, and the return value may be the annotated tensor x'. Here, x is the tensor to be marked, and mask represents the correspondence information between the tensor and the processes, which may be used to indicate whether the tensor exists on each process. Illustratively, the mask may include elements that correspond one-to-one to the processes. The value of each element may be 0 or 1, where 0 may represent absence and 1 presence. In this embodiment, the shape attribute of the mask may be the same as the shape attribute of the mesh corresponding to x. For example, assume the mesh of x is as shown in Table 2 and the mask is as shown in Table 3. The position in the mask corresponding to process 2 in the mesh has the value 1, which indicates that x exists on process 2; the position corresponding to process 4 has the value 0, so x does not exist on process 4. Similarly, x exists on processes 2, 3 and 0, and does not exist on processes 4, 1 and 5.
2 4 3
1 0 5
TABLE 2
1 0 1
0 1 0
TABLE 3
Based on the above, the tensor and the corresponding information between the tensor and the process can be input into the process matching interface to set the corresponding relationship between the tensor and the process.
According to embodiments of the present disclosure, because the processes correspond one-to-one to the nodes, setting the correspondence between tensors and processes through the process matching interface makes it possible to realize a parameter-distribution strategy. For example, suppose the model has four parameters P1, P2, P3 and P4 and there are two nodes D1 and D2. By setting the correspondence between tensors and processes through the process matching interface, the distribution can become: parameters P1 and P3 reside on node D1, and parameters P2 and P4 reside on node D2, so that the number of parameters stored on each node is reduced.
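For example, the existence relation of Tables 2 and 3 could be annotated as in the sketch below; it assumes a set_shard_mask(x, mask) function as described above, with a placeholder import and a stand-in tensor.

    import numpy as np
    from dist_api import ProcessMesh, set_shard_mask  # hypothetical module/interface names

    mesh = ProcessMesh([[2, 4, 3],
                        [1, 0, 5]])
    x = np.zeros([4, 4])                   # stand-in for a framework tensor

    # Mask of Table 3: x exists on processes 2, 3 and 0 (value 1)
    # and does not exist on processes 4, 1 and 5 (value 0).
    mask = [[1, 0, 1],
            [0, 1, 0]]
    x = set_shard_mask(x, mask)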
According to an embodiment of the disclosure, an operator segmentation interface may be preset, for setting segmentation information of variables of an operator.
For example, fig. 5 schematically illustrates a schematic diagram of an operator slicing interface according to an embodiment of the disclosure.
As shown in Fig. 5, the operator slicing interface may be shard_op(op_fn, mesh, dims_mapping_dict, kwargs). The input parameters of shard_op may include op_fn, mesh, dims_mapping_dict and kwargs, and the return value may be the output variable (a tensor) of the operator. Here, op_fn represents the creation interface (e.g., a callable) corresponding to the operator. The mesh represents the topology data of the processes corresponding to the operator, i.e., the topology of the processes that the operator needs to be marked with. dims_mapping_dict represents the correspondence between the operator's variables and their segmentation information; it may include one or more key-value pairs, where the key of each pair is the name of an input or output variable of the operator and the value is the segmentation information of that variable.
Based on the above, the creating interface corresponding to the operator, the topological structure data of the process corresponding to the operator, and the corresponding relation between the variable of the operator and the segmentation information can be input into the operator segmentation interface to set the segmentation information of the variable of the operator.
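A minimal sketch of the operator annotation follows, assuming a shard_op(op_fn, mesh, dims_mapping_dict, kwargs) signature as described above; the matmul creation interface, variable names and module path are illustrative assumptions.

    import numpy as np
    from dist_api import ProcessMesh, shard_op  # hypothetical module/interface names

    def matmul_op(x, y):
        # Placeholder creation interface (callable) for a matrix-multiplication operator.
        return x @ y

    mesh = ProcessMesh([[0, 1, 2, 3],
                        [4, 5, 6, 7]])     # mesh.shape == [2, 4]
    x = np.zeros([6, 4])
    y = np.zeros([4, 8])

    # Keys are the operator's input/output variable names; values are the
    # dims_mapping (segmentation information) of each variable relative to the mesh.
    out = shard_op(matmul_op,
                   mesh,
                   dims_mapping_dict={"x": [0, -1],    # split x's dim 0 along mesh dim 0
                                      "y": [-1, 1]},   # split y's dim 1 along mesh dim 1
                   kwargs={"x": x, "y": y})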
According to another embodiment of the present disclosure, the execution configuration information may further include storage node information of the tensor. A storage node transfer interface may be preset for setting the storage node information of each tensor.
For example, fig. 6 schematically illustrates a schematic diagram of a node transfer interface according to an embodiment of the present disclosure.
As shown in Fig. 6, the node transfer interface may be set_offload_device(x, device). The input parameters of set_offload_device may include x and device, and the return value may be the annotated tensor x'. Here, x is the tensor to be marked and device is the target node on which the tensor is to be stored.
Based on the above, the tensor and the node identifier of its storage node can be input into the storage node transfer interface to set the storage node information of the tensor. By setting the storage node information of the tensor, the tensor can be transferred to the target node and stored by that node.
As an alternative embodiment, the tensor may be kept on a storage node, and when a computing node needs to use the tensor, the tensor may be transferred from the corresponding storage node to the computing node through the node transfer interface, thereby separating tensor storage from computation.
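For example, marking a tensor to be stored on a host-side storage node could look like the following sketch; it assumes a set_offload_device(x, device) function as described above, and the device string is illustrative.

    import numpy as np
    from dist_api import set_offload_device  # hypothetical module/interface name

    opt_state = np.zeros([1024, 1024])        # stand-in for a large optimizer-state tensor

    # Keep opt_state on the CPU-side storage node and transfer it to the computing
    # node only when an operation actually needs it.
    opt_state = set_offload_device(opt_state, "cpu")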
According to another embodiment of the present disclosure, a pipeline setup interface may also be preset for setting up execution nodes for marking operations and/or model training operations.
For example, fig. 7 schematically shows a schematic diagram of a pipeline setup interface according to an embodiment of the disclosure.
As shown in Fig. 7, the pipeline setting interface may be set_pipeline_stage(stage). The input parameter of set_pipeline_stage may include stage, and the return value may be None. Here, stage may represent the identifier of a target node.
By inputting the identification of the target node into the pipeline setting interface, the execution node of the subsequent operation can be set as the target node, so that the subsequent operation is executed by the target node.
According to embodiments of the present disclosure, training operations may be split, via the pipeline setting interface, across multiple nodes for execution.
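A sketch of this usage is given below, assuming a set_pipeline_stage(stage) function as described above; the layer-building steps are placeholders.

    from dist_api import set_pipeline_stage  # hypothetical module/interface name

    # Tensors/operators created after this call are assigned to pipeline stage 0
    # and will be executed by the corresponding node (group).
    set_pipeline_stage(0)
    # ... build the first part of the network here ...

    # Subsequent tensors/operators belong to pipeline stage 1 and run on the next node (group).
    set_pipeline_stage(1)
    # ... build the second part of the network here ...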
According to the embodiment of the disclosure, the marking operation is performed through the interface, so that the development process of the distributed training program in a complex scene can be simplified.
The model training method provided by the present disclosure will be described below with reference to fig. 8.
Fig. 8 schematically illustrates a flow chart of a model training method according to an embodiment of the disclosure.
As shown in fig. 8, the model training method 800 includes instructing a plurality of nodes to perform an operation of training a model according to execution configuration information in operation S810.
According to embodiments of the present disclosure, the execution configuration information may be determined, for example, according to the methods shown above.
In the related art, developing distributed training programs for complex scenarios is difficult, and support for heterogeneous training platforms is poor.
According to the model training method of the embodiments of the present disclosure, the requirements of various application scenarios for complex parallel strategies can be supported, various distributed parallel strategies are compatible, and the development process of distributed training programs in complex scenarios is simplified.
The model training method shown above is further described with reference to fig. 9 in conjunction with the specific embodiments. Those skilled in the art will appreciate that the following example embodiments are merely for the understanding of the present disclosure, and the present disclosure is not limited thereto.
According to embodiments of the present disclosure, the operations to be performed in the model training process may be represented as a computational graph (Graph). The operators to be executed and their input and output data may be represented as nodes in the computational graph, and the dependency relationships between operators and their input and output data may be represented as edges. In addition, the nodes may carry attributes such as the size of the data.
Fig. 9 schematically illustrates a schematic diagram of a computational graph according to an embodiment of the present disclosure.
Fig. 9 shows the computational graph of the operation Out = X + Y. The operator is an addition operator (add), the tensors X and Y are its input data, and the tensor Out is its output data. X, Y and Out each carry an additional shape attribute representing the size of the corresponding tensor.
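Purely as an illustration of this representation (not an interface of the disclosure), the graph of Fig. 9 could be encoded as the following Python dictionary, where operators and tensors are nodes, dependencies are edges, and tensor nodes carry a shape attribute. The shape of X follows the example given further below; the shapes of Y and Out are assumed to match it.

    # Illustrative encoding of the computational graph Out = X + Y.
    graph = {
        "nodes": {
            "X":   {"kind": "tensor",   "shape": [2, 4]},
            "Y":   {"kind": "tensor",   "shape": [2, 4]},   # assumed equal to X for elementwise add
            "add": {"kind": "operator"},
            "Out": {"kind": "tensor",   "shape": [2, 4]},
        },
        "edges": [("X", "add"), ("Y", "add"), ("add", "Out")],
    }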
In this embodiment, the tensors X, Y and Out and the attributes of the addition operator can be marked using the interfaces described above. The computational graph then needs to be modified according to the marked attributes.
For example, the segmentation information of the tensors may be marked using the shard_tensor interface. Suppose the shape of tensor X is [2, 4], the shape of the mesh (i.e., mesh.shape) is [2, 1, 4], and the value of dims_mapping is [0, -1]. Because the first element of dims_mapping is 0, the first dimension of X (value 2) is split along the first dimension of the mesh (value 2), i.e., it is split into 2 parts whose dimension value is 1 each. Because the second element of dims_mapping is -1, the second dimension of X is not split. Based on this, the shape information of X in the computational graph may be modified to [1, 4].
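The shape update described in the preceding paragraph can be expressed as a small helper; this is a sketch of the semantics described above, not code from the disclosure.

    def sharded_shape(tensor_shape, mesh_shape, dims_mapping):
        """Per-process shape of a tensor after applying its dims_mapping annotation."""
        result = []
        for size, mapping in zip(tensor_shape, dims_mapping):
            if mapping == -1:
                result.append(size)                         # -1: dimension is not split
            else:
                result.append(size // mesh_shape[mapping])  # split along the mapped mesh dimension
        return result

    # Example from the text: X of shape [2, 4], mesh.shape [2, 1, 4], dims_mapping [0, -1].
    assert sharded_shape([2, 4], [2, 1, 4], [0, -1]) == [1, 4]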
For example, the set_shard_mask interface may be used to set whether a tensor exists on each process. When the tensor exists on a process, a broadcast operation is added to the computational graph corresponding to that process, so that the tensor is broadcast to all other processes in the same communication group. When the tensor does not exist on a process, the computational graph of that process is modified by adding the corresponding operation, so that the tensor is received within the same communication group.
For example, the shard_op interface may be used to set the segmentation information dims_mapping of the operator's input and output data. The computational graph is then modified according to the dims_mapping of the inputs and outputs.
For example, set_offload_device may be used to modify the device information of a tensor in the computational graph according to the marked information. During execution of the computational graph, if the device marked on a tensor differs from the device that actually executes the operation, the data needs to be copied to the executing device and, after use, copied back to the device marked on the tensor.
For example, set_pipeline_stage may be used to set the identifier of the target node. When the stage to which a process belongs differs from the marked stage, the corresponding tensors and operators are deleted from that process's computational graph.
After the above operations are performed, each process obtains the computational graph that it will ultimately execute, so that model training can be carried out according to that computational graph.
According to the method of the embodiment of the disclosure, the requirements of various application scenes on the complex parallel strategies can be supported, various distributed parallel strategies are compatible, and the development process of the distributed training program in the complex scene is simplified.
Fig. 10 schematically illustrates a block diagram of a generating apparatus that executes configuration information according to an embodiment of the present disclosure.
As shown in fig. 10, the generating apparatus 1000 that executes configuration information includes a setting module 1010 and a marking module 1020.
A setting module 1010, configured to set, for a process for training a model, a topology of the process.
The labeling module 1020 is configured to perform a labeling operation on tensors and operators used for the training model according to the topology structure, so as to obtain execution configuration information, where the execution configuration information includes: the segmentation information of the tensor, the corresponding relation between the tensor and the process and the segmentation information of the operator variable.
Fig. 11 schematically illustrates a block diagram of a model training apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the model training apparatus 1100 includes an execution module 1110 for instructing a plurality of nodes to execute an operation of training a model according to execution configuration information determined according to a method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 12 schematically illustrates a block diagram of an example electronic device 1200 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the apparatus 1200 includes a computing unit 1201, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in device 1200 are connected to I/O interface 1205, including: an input unit 1206 such as a keyboard, mouse, etc.; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208 such as a magnetic disk, an optical disk, or the like; and a communication unit 1209, such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the various methods and processes described above, for example, the method of generating execution configuration information and the model training method. For example, in some embodiments, the method of generating execution configuration information and the model training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the above-described method of generating execution configuration information and model training method may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the method of generating execution configuration information and the model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (9)

1. A model training method, comprising:
for a process used for training a model, setting a topological structure of the process according to the process and topological structure data corresponding to the process;
setting segmentation information of a tensor according to the tensor, topological structure data of a process corresponding to the tensor and a corresponding relation between the dimensionality of the tensor and the dimensionality of the topological structure data;
setting a corresponding relation between the tensor and the process according to the tensor and the corresponding information between the tensor and the process;
inputting a creation interface corresponding to an operator, topological structure data of a process corresponding to the operator and a corresponding relation between a variable of the operator and segmentation information into an operator segmentation interface to set the segmentation information of the variable of the operator, wherein input parameters of the operator segmentation interface comprise the corresponding relation between the creation interface corresponding to the operator, the topological structure data of the process corresponding to the operator and the variable of the operator to the segmentation information, and a return value of the operator segmentation interface comprises an output variable of the operator;
distributing the operation of the training model to a plurality of nodes for execution according to the segmentation information of the tensor, the corresponding relation between the tensor and the process and the segmentation information of the variables of the operator, wherein the method further comprises:
and in the process of executing the operation of training the model, inputting the tensor and the node identifier of the corresponding storage node into a storage node transfer interface, so as to transfer the tensor from the corresponding storage node to the computing node.
2. The method of claim 1, wherein the setting the topology of the process according to the process and topology data corresponding to the process comprises:
and inputting the process and the topological structure data corresponding to the process into a process topological interface so as to set the topological structure of each process.
3. The method according to claim 1, wherein the setting the segmentation information of the tensor according to the tensor, topology data of a process corresponding to the tensor, and a correspondence between a dimension of the tensor and a dimension of the topology data includes:
and inputting the tensor, topological structure data of a process corresponding to the tensor and a corresponding relation between the dimensionality of the tensor and the dimensionality of the topological structure data into a tensor segmentation interface to set segmentation information of the tensor.
4. The method of claim 1, wherein the setting the correspondence between the tensor and the process according to the tensor and correspondence information between the tensor and the process comprises:
and inputting the tensor and the corresponding information between the tensor and the process into a process matching interface so as to set the corresponding relation between the tensor and the process.
5. The method of any of claims 1-4, further comprising:
and setting an execution node for the operation by using a pipeline setting interface.
6. A model training apparatus comprising:
a setting module, used for setting, for a process for training a model, the topological structure of the process according to the process and topological structure data corresponding to the process;
the marking module is used for setting segmentation information of the tensor according to the tensor, topological structure data of a process corresponding to the tensor and the corresponding relation between the dimensionality of the tensor and the dimensionality of the topological structure data; setting a corresponding relation between the tensor and the process according to the tensor and the corresponding information between the tensor and the process; inputting a creation interface corresponding to an operator, topological structure data of a process corresponding to the operator and a corresponding relation between a variable of the operator and segmentation information into an operator segmentation interface to set the segmentation information of the variable of the operator, wherein input parameters of the operator segmentation interface comprise the corresponding relation between the creation interface corresponding to the operator, the topological structure data of the process corresponding to the operator and the variable of the operator to the segmentation information, and a return value of the operator segmentation interface comprises an output variable of the operator; distributing the operation of the training model to a plurality of nodes for execution according to the segmentation information of the tensor, the corresponding relation between the tensor and the process and the segmentation information of the operator variable; and
an execution module, used for distributing the operation of training the model to a plurality of nodes for execution according to the segmentation information of the tensor, the corresponding relation between the tensor and the process and the segmentation information of the variables of the operator; and in the process of executing the operation of training the model, inputting the tensor and the node identifier of the corresponding storage node into a storage node transfer interface so as to transfer the tensor from the corresponding storage node to the computing node.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
9. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1-5.
CN202111513923.3A 2021-12-10 2021-12-10 Method for generating execution configuration information, method and device for model training Active CN114202027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111513923.3A CN114202027B (en) 2021-12-10 2021-12-10 Method for generating execution configuration information, method and device for model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111513923.3A CN114202027B (en) 2021-12-10 2021-12-10 Method for generating execution configuration information, method and device for model training

Publications (2)

Publication Number Publication Date
CN114202027A CN114202027A (en) 2022-03-18
CN114202027B true CN114202027B (en) 2023-05-23

Family

ID=80652674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111513923.3A Active CN114202027B (en) 2021-12-10 2021-12-10 Method for generating execution configuration information, method and device for model training

Country Status (1)

Country Link
CN (1) CN114202027B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114841315A (en) * 2022-04-22 2022-08-02 北京百度网讯科技有限公司 Method and system for implementing hybrid expert model, electronic device and storage medium
CN114884908B (en) * 2022-04-29 2024-02-13 浪潮电子信息产业股份有限公司 Data synchronization method, device, equipment and storage medium
CN114820279B (en) * 2022-05-18 2023-03-24 北京百度网讯科技有限公司 Distributed deep learning method and device based on multiple GPUs and electronic equipment
CN114997329A (en) * 2022-06-21 2022-09-02 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for generating a model
CN116596091B (en) * 2022-11-08 2024-02-02 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10572773B2 (en) * 2017-05-05 2020-02-25 Intel Corporation On the fly deep learning in machine learning for autonomous machines
US20210133591A1 (en) * 2019-11-04 2021-05-06 Baidu Usa Llc Reducing training times of deep neural networks through efficient hybrid parallelism
CN112183668B (en) * 2020-11-03 2022-07-22 支付宝(杭州)信息技术有限公司 Method and device for training service models in parallel
CN113568860B (en) * 2021-07-23 2022-08-19 北京百度网讯科技有限公司 Deep learning-based multi-machine cluster topology mapping method and device and program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Xiang. Research on Optimization of Scheduling Technology for Distributed Machine Learning Systems. China Master's Theses Full-text Database, Information Science and Technology, 2021, No. 04, pp. I140-47. *

Also Published As

Publication number Publication date
CN114202027A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN114202027B (en) Method for generating execution configuration information, method and device for model training
CN112559631B (en) Data processing method and device of distributed graph database and electronic equipment
CN114091685B (en) Tensor segmentation method, device and equipment for deep learning framework and storage medium
CN112866391A (en) Message pushing method and device, electronic equipment and storage medium
CN114880337B (en) Map data integrated updating method, device, equipment and storage medium
CN113344074B (en) Model training method, device, equipment and storage medium
CN112559632B (en) State synchronization method and device of distributed graph database, electronic equipment and medium
CN112560936A (en) Model parallel training method, device, equipment, storage medium and program product
CN116938953A (en) Block chain-based data processing method and device, electronic equipment and storage medium
CN116860751A (en) Data processing method and device, electronic equipment and storage medium
US20220269659A1 (en) Method, device and storage medium for deduplicating entity nodes in graph database
CN116303461A (en) Component library creation method and device, electronic equipment and storage medium
CN114091686B (en) Data processing method and device, electronic equipment and storage medium
CN115905322A (en) Service processing method and device, electronic equipment and storage medium
CN115730681B (en) Model training method, device, equipment and storage medium
CN116431698B (en) Data extraction method, device, equipment and storage medium
CN112783507B (en) Data stream guiding playback method and device, electronic equipment and readable storage medium
CN112835007B (en) Point cloud data conversion method and device, electronic equipment and storage medium
CN112580803B (en) Model acquisition method, apparatus, electronic device, storage medium, and program product
CN113326890B (en) Labeling data processing method, related device and computer program product
CN114650222A (en) Parameter configuration method and device, electronic equipment and storage medium
CN114494818A (en) Image processing method, model training method, related device and electronic equipment
CN118034647A (en) Task processing method, device and medium in workflow oriented to low-code platform
CN116800665A (en) Edge node management method, device, equipment and storage medium
CN113986112A (en) Soft keyboard display method, related device and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant