CN114862655B - Operation control method and device for model training and electronic equipment - Google Patents

Operation control method and device for model training and electronic equipment

Info

Publication number
CN114862655B
Authority
CN
China
Prior art keywords
training data
training
data
batch
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210542222.0A
Other languages
Chinese (zh)
Other versions
CN114862655A (en)
Inventor
曾锦乐
李敏
吴志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210542222.0A
Publication of CN114862655A
Application granted
Publication of CN114862655B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application discloses an operation control method and device for model training and an electronic device, and relates to the technical field of artificial intelligence, in particular to natural language processing and computer vision. The specific implementation scheme is as follows: while a plurality of GPUs perform model training with a first batch of training data, a plurality of first processes running on the CPU read the second batch of training data that follows the first batch; the second batch of training data is sorted by data length to determine the target training data corresponding to each first process; and the target training data is sent to the second process corresponding to each first process, so that after the GPU corresponding to each second process finishes model training with the first batch of training data, it performs model training with the target training data received by that second process. Because the training data is preprocessed by the plurality of first processes running on the CPU, model training and training data preprocessing proceed in parallel, which improves efficiency.

Description

Operation control method and device for model training and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to natural language processing and computer vision, and specifically to an operation control method and device for model training and an electronic device.
Background
A neural network can be trained on a GPU, and the training can be further accelerated by training the neural network on a plurality of GPUs.
However, as deep neural networks develop, training data grow larger in scale and the networks themselves become more complex; even when a plurality of GPUs are used to train such a deep neural network, the training time is still too long and the efficiency is low.
Disclosure of Invention
The present disclosure provides an operation control method, apparatus, electronic device, and storage medium for model training with higher efficiency.
According to an aspect of the present disclosure, there is provided an operation control method for model training, including:
in the process of carrying out model training by using a first batch of training data by a plurality of GPUs, reading a second batch of training data after the first batch by using a plurality of first processes operated on a CPU;
performing data sorting on the second batch of training data read by the plurality of first processes according to the lengths of the training data read by all the first processes, so as to determine target training data corresponding to each first process according to a sorting result;
and sending the target training data to a second process corresponding to each first process, so that the GPU corresponding to each second process performs model training by using the target training data received by the second process after model training by using the training data of the first batch is completed.
According to another aspect of the present disclosure, there is provided an operation control apparatus for model training, including:
the reading module is used for reading the training data of a second batch after a first batch by adopting a plurality of first processes operated on a CPU in the process of carrying out model training by adopting the training data of the first batch by a plurality of GPUs;
the determining module is used for sorting the second batch of training data read by the plurality of first processes according to the lengths of the training data read by all the first processes so as to determine target training data corresponding to each first process according to a sorting result;
and the sending module is used for sending the target training data to a second process corresponding to each first process, so that the GPU corresponding to each second process adopts the target training data received by the second process to perform model training after the model training is completed by adopting the training data of the first batch.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the foregoing method embodiments.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the foregoing method embodiments.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the method of the aforementioned method embodiments.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of an operation control method for model training according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating another operation control method for model training according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another operation control method for model training according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of training data preprocessing provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of another operation control method for model training according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of interaction between a first process and a second process in a CPU according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an operation control device for model training according to an embodiment of the present application;
fig. 8 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An operation control method, an apparatus, and an electronic device for model training according to an embodiment of the present application are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of an operation control method for model training according to an embodiment of the present disclosure.
The execution subject of the operation control method in the embodiment of the application is a Central Processing Unit (CPU).
As shown in fig. 1, the method comprises the following steps:
step 101, in the process of performing model training by using a first batch of training data by a plurality of GPUs, a plurality of first processes running on a CPU are used to read a second batch of training data after the first batch.
The model trained by the Graphics Processing Units (GPUs) with the first batch of training data is a neural network model, for example a pre-trained language representation model such as BERT (Bidirectional Encoder Representations from Transformers), whose training depends on GPU computing resources. During training, the model needs to be trained over multiple rounds with multiple batches of training data to improve the training effect; for distinction, these are called the first batch of training data and the second batch of training data. The first batch of training data is the batch used in the current round of training, where the current round is any round before the last round of training; for example, if the model is to be trained for 10 rounds, the first batch of training data may be the training data of the first round, or of any of rounds 2 to 9. The second batch of training data is the training data to be used after the model has been trained with the first batch.
Two kinds of processes run on the Central Processing Unit (CPU); for distinction, they are called first processes and second processes, and each first process corresponds to a second process. The first process is used for reading the training data used for training the model and performing data processing on it to obtain target training data that meets the training requirements of the model. The second process is used for controlling the first process to read the training data that needs preprocessing, namely the second batch of training data, and for acquiring the target training data from the first process after it has been processed, so that the corresponding GPU trains the model with the target training data.
In the embodiment of the application, while the multiple GPUs perform model training with the first batch of training data, the GPUs are busy and the CPU is relatively idle; the second batch of training data that follows the first batch is therefore read by the first processes running on the CPU, and the GPUs do not need to read the next batch of training data themselves. This reduces the load on the GPUs, allows model training on the GPUs and pre-reading of the next batch of training data on the CPU to proceed in parallel, and improves efficiency.
The training data of each batch may be stored in a non-volatile memory, such as a magnetic disk, a flash memory, a solid-state memory, an optical disk, and the like.
Step 102, performing data sorting on the second batch of training data read by the plurality of first processes according to the lengths of the training data read by all the first processes, so as to determine target training data corresponding to each first process according to a sorting result.
After the multiple first processes running on the CPU have read the second batch of training data that follows the first batch, the second batch of training data read by all the first processes is sorted by data length. That is, the second batch of training data read by the multiple first processes is treated as one overall data set, and the training data in this data set are sorted by length, in either descending or ascending order, to obtain a sorting result for the second batch of training data read by the multiple processes. The target training data corresponding to each first process is then determined from the sorting result; reselecting the training data in this way increases the diversity of the training data and therefore improves the training effect. As one implementation, the original order of the second batch of training data read by the plurality of first processes is shuffled by the sorting, and then, based on the sorting result, a random number can be generated and used as an index for reading data, so that the target training data corresponding to each first process is selected randomly. This achieves the effect of reading data randomly, increases the randomness of the target training data corresponding to each first process, and improves the effect of training the model with the target training data.
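As a non-limiting illustration, the following Python sketch shows one possible realization of the sorting and random selection described above; the function name, the toy samples, and the use of sequence length as the sorting key are assumptions made for illustration only.

```python
import random

def exchange_padding_select(all_samples, num_selected, seed=None):
    # Sort the merged second-batch samples by data length (descending),
    # then pick num_selected of them via randomly generated indices.
    ranked = sorted(all_samples, key=len, reverse=True)
    rng = random.Random(seed)
    indices = rng.sample(range(len(ranked)), num_selected)
    return [ranked[idx] for idx in indices]

# Toy token sequences; their lengths stand in for the "data length" used for sorting.
merged_second_batch = [[0] * n for n in (7, 3, 9, 2, 5, 6, 4, 8)]
target_training_data = exchange_padding_select(merged_second_batch, num_selected=4, seed=0)
print([len(sample) for sample in target_training_data])
```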
Step 103, sending the target training data to the second process corresponding to each first process, so that the GPU corresponding to each second process performs model training with the target training data received by that second process after completing model training with the first batch of training data.
Model training is performed by a plurality of GPUs (Graphics Processing Units) in parallel, that is, the GPUs train one model in parallel, and the training carried out by each GPU is independent of the others. The CPU runs a plurality of processes, which are divided into first processes and second processes; each first process corresponds to a second process, each GPU corresponds to one second process, and the second process corresponding to each GPU is used for controlling that GPU to train the model. As an example, if the BERT model is trained in parallel by M GPUs, then 2 × M processes are started on the CPU, namely M first processes and M second processes.
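The following minimal Python sketch illustrates how 2 × M processes (M first processes and M second processes, one pair per GPU) might be started on the CPU; the use of the multiprocessing module, the queue-based channels, and the value M = 4 are illustrative assumptions and are not specified by the disclosure.

```python
import multiprocessing as mp

def first_process(rank, ctrl_q, data_q):
    # Reads and preprocesses training data on the CPU for the GPU with this rank.
    print(f"first process {rank} started")

def second_process(rank, ctrl_q, data_q):
    # Controls the GPU with this rank: issues control information to its first
    # process and forwards the preprocessed target training data to the GPU.
    print(f"second process {rank} started")

if __name__ == "__main__":
    M = 4  # number of GPUs training the model in parallel (assumed for illustration)
    procs = []
    for rank in range(M):
        ctrl_q, data_q = mp.Queue(), mp.Queue()  # one channel pair per first/second pair
        procs.append(mp.Process(target=first_process, args=(rank, ctrl_q, data_q)))
        procs.append(mp.Process(target=second_process, args=(rank, ctrl_q, data_q)))
    for p in procs:  # 2 * M processes in total: M first processes and M second processes
        p.start()
    for p in procs:
        p.join()
```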
In the embodiment of the application, after the second batch of training data acquired by the plurality of first processes is processed by the CPU, the target training data is sent to the second process corresponding to each first process, so that the second process transmits the acquired target training data to the corresponding GPU, and after the GPU finishes model training by using the first batch of training data, the GPU performs model training by using the received target training data.
In the operation control method for model training of the embodiment of the application, while the GPUs train the model with the first batch of training data, the second batch of training data to be used by the GPUs is preprocessed by the plurality of first processes running on the CPU, and the target training data corresponding to each first process is sent to its corresponding second process. After a GPU finishes training with the current batch, it performs model training with the target training data received by its second process, so that the GPU's training of the model and the CPU's processing of the subsequent training data proceed in parallel, which improves the operation control efficiency of model training.
Based on the foregoing embodiment, the embodiment of the present application further provides a possible implementation manner of an operation control method for model training, and fig. 2 is a schematic flow diagram of the operation control method for model training provided in the embodiment of the present application, and as shown in fig. 2, the method includes the following steps:
step 201, in the process of performing model training by using the training data of the first batch by the multiple GPUs, a second process running on the CPU is used to send control information to a corresponding first process.
In this embodiment of the present application, each second process running on the CPU is configured to send a control message to its corresponding first process. The control information is used for controlling the first process to read the second batch of training data that follows the first batch. Because the first process is controlled by the second process running on the CPU, the second batch of training data can be read in advance by the first process on the CPU, which facilitates preprocessing and reduces the computational load of the GPU.
Step 202, synchronizing the second batch of training data read by the plurality of first processes among the plurality of first processes, so that each first process obtains a data set of the training data read by the plurality of first processes.
In an implementation manner of the embodiment of the application, the multiple first processes may communicate with each other through an inter-process communication tool, so that each first process can obtain the second-batch data read by the other first processes; that is, each first process obtains a data set of the training data read by all the first processes. This synchronizes the data among the multiple first processes, so that every first process holds the same second batch of training data.
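A possible way to perform this synchronization is an all-gather over the first processes. The sketch below assumes mpi4py with one MPI rank per first process and placeholder samples; this particular library and its allgather call are assumptions (the embodiment only mentions an inter-process communication tool, MPI being one example named later), not the disclosed mechanism.

```python
# Launched with one rank per first process, e.g.: mpirun -np 4 python sync_second_batch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # identifies this first process

# The slice of the second batch that this first process has read (placeholder samples).
local_samples = [f"sample-{rank}-{j}" for j in range(4)]

# allgather gives every first process the full data set read by all first processes.
gathered = comm.allgather(local_samples)
data_set = [sample for part in gathered for sample in part]

print(f"first process {rank} now holds a data set of {len(data_set)} samples")
```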
Step 203, each first process sorts the training data in its obtained data set according to data length, and determines its corresponding target training data from the data set according to the sorting result.
In the embodiment of the application, each first process sorts the training data in the data set it has obtained according to data length and, based on the sorting result, may use randomly generated numbers as index numbers to determine its corresponding target training data from the data set. The second batch of training data obtained by all the first processes is synchronized among the first processes, and each first process then performs the data processing independently to obtain its corresponding target data. The second batch of training data is thus processed by the plurality of first processes on the CPU instead of on the GPU, which avoids the loss of model-training throughput that processing subsequent training data on the GPU would cause, and improves the efficiency of model training.
Step 204, sending the target training data to the second process corresponding to each first process, so that the GPU corresponding to each second process performs model training with the target training data received by that second process after completing model training with the first batch of training data.
In step 204, the explanation in the foregoing embodiments can be referred to, and the principle is the same, which is not described herein again.
In the operation control method for model training of the embodiment of the application, the first process is controlled by the second process running on the CPU, so that the first process running on the CPU can read the second batch of training data in advance, which facilitates the preprocessing of the data and reduces the computational load on the GPU. The second batch of training data acquired by all the first processes is then synchronized among the first processes, and each first process performs the data processing independently to obtain its corresponding target data. The second batch of training data is thus processed by the plurality of first processes on the CPU instead of on the GPU, which avoids the loss of model-training throughput that processing subsequent training data on the GPU would cause, and improves the efficiency of model training.
Based on the foregoing embodiments, the present application further provides a possible implementation manner of an operation control method for model training, fig. 3 is a schematic flow chart of another operation control method for model training provided in the present application, and as shown in fig. 3, step 203 includes the following steps:
step 301, for any first process, data sorting is performed on training data in the data set according to the data length.
The training data in the data set corresponding to any first process are sorted in ascending or descending order according to data length.
Step 302, determining a sequence number corresponding to the first process according to the process number of the first process and the total number of the first processes running on the CPU.
As an implementation manner, the corresponding sequence number q is determined according to the process number i of the first process and the total number M of first processes running on the CPU, where the value of q is determined according to q = n × M + i and n is a non-negative integer (one value of n for each item of training data to be selected by the process, as in the example below).
Wherein, the process number i of the first process may be randomly determined according to the number of the first processes.
Step 303, according to the sequence numbers corresponding to the first process, the training data whose ranks in the sorted data set equal those sequence numbers are taken as the target training data corresponding to the first process.
The number of sequence numbers corresponding to a first process may be one or more. When the second batch of training data read by the first process contains multiple items, the number of sequence numbers corresponding to the first process matches the number of items in that second batch; for example, if the second batch of training data acquired by first process 1 contains 2 items, the number of sequence numbers corresponding to first process 1 is also 2, so that 2 items are reacquired to serve as the target training data.
For clarity, the following example illustrates how the target training data corresponding to each first process is determined from the second batch of training data read by the multiple first processes. As shown in fig. 4, there are 2 first processes, namely first process 1 and first process 2, so the value of M is 2, and each first process reads 4 training data, so n takes the values 0, 1, 2 and 3. The determination of the target training data corresponding to first process 1 is described below as an example.
First, the second batch of training data read by first process 1 and the second batch of training data read by first process 2 are synchronized to obtain the data set corresponding to first process 1, which contains the second-batch training data read by both first process 1 and first process 2. The training data in this data set are then sorted in descending order of data length to obtain a sorting result. The process number of first process 1 is i = 1, M = 2, and n takes the values 0, 1, 2 and 3.
Therefore, according to q = n × M + i, the sequence numbers corresponding to first process 1 are 1, 3, 5 and 7. According to the order in the data set, the training data ranked at sequence number 1, namely Seq (0, 4), at sequence number 3, namely Seq (0, 1), at sequence number 5, namely Seq (0, 2), and at sequence number 7, namely Seq (1, 1), are taken as the target training data corresponding to first process 1. In this way, training data are extracted from the data set corresponding to the first process based on the process number of the first process and the total number of first processes, so the target training data corresponding to a first process include second-batch training data read by other first processes; this diversifies the samples and improves the effect of subsequent model training. The target training data corresponding to first process 2 can be determined in the same way, following the same principle, which is not described here again.
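The selection in this example can be reproduced with a few lines of Python; the helper name and the placeholder items in the sorted data set are illustrative, but the sequence numbers follow q = n × M + i exactly as above.

```python
def target_sequence_numbers(i, M, items_read):
    # q = n * M + i for n = 0, 1, ..., items_read - 1 (1-based ranks in the sorted set)
    return [n * M + i for n in range(items_read)]

M = 2                  # two first processes, as in the example of fig. 4
items_per_process = 4  # each first process read 4 training data

print(target_sequence_numbers(1, M, items_per_process))  # first process 1 -> [1, 3, 5, 7]
print(target_sequence_numbers(2, M, items_per_process))  # first process 2 -> [2, 4, 6, 8]

# Given the data set sorted in descending order of length, the target training data of
# first process i are the items at those ranks (rank q corresponds to index q - 1).
sorted_data_set = [f"rank-{q}" for q in range(1, M * items_per_process + 1)]
targets_of_process_1 = [sorted_data_set[q - 1]
                        for q in target_sequence_numbers(1, M, items_per_process)]
print(targets_of_process_1)  # ['rank-1', 'rank-3', 'rank-5', 'rank-7']
```

Running the snippet prints the sequence numbers [1, 3, 5, 7] and [2, 4, 6, 8] and the corresponding ranked items, matching the worked example.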
In the operation control method for model training of this embodiment, for each first process, training data are extracted from the corresponding data set based on the process number of the first process and the total number of first processes, so that the target training data corresponding to a first process include second-batch training data read by other first processes; this diversifies the samples and improves the effect of subsequent model training.
Based on the foregoing embodiments, the present application further provides a possible implementation manner of an operation control method for model training, and fig. 5 is a schematic flow diagram of another operation control method for model training provided in the present application, and as shown in fig. 5, the method includes the following steps:
step 501, in the process of performing model training by using a first batch of training data by a plurality of GPUs, a plurality of first processes running on a CPU are used to read a second batch of training data after the first batch.
Step 502, performing data sorting on the second batch of training data read by the plurality of first processes according to the lengths of the training data read by all the first processes, so as to determine target training data corresponding to each first process according to a sorting result.
Step 501 and step 502 may refer to the explanations in the foregoing embodiments, and the principle is the same, which is not described herein again.
Step 503, monitoring the target sub-processes in each second process.
The target sub-process is configured to obtain the target training data from the first process corresponding to the second process. As an implementation manner, the inter-process communication may be implemented with an inter-process communication tool such as MPI (Message Passing Interface).
In the embodiment of the present application, each second process includes at least one sub-process, where the target sub-process is one of the at least one sub-process.
Step 504, when it is monitored that the target subprocess of at least one second process acquires the target training data, the target training data is sent to the corresponding GPU, so that the GPU corresponding to each second process performs model training by using the target training data received by the second process after the model training by using the training data of the first batch is completed.
In the embodiment of the application, the target sub-process in each second process is monitored. A plurality of sub-processes are arranged in each second process, including a sub-process for monitoring whether the second process has acquired the target training data from the corresponding first process, and a sub-process for controlling the first process to read the next batch of training data. This subdivides the functions of the second process, so that the control of the first process and the acquisition of the target training data from the first process are managed separately, which improves the reusability of the second process. When the target sub-process of at least one second process acquires the target training data, that is, when the first process corresponding to that second process has determined the target training data, the target training data can be sent to the corresponding GPU, so that after the GPU corresponding to each second process completes model training with the first batch of training data, it performs model training with the target training data received by its second process. The GPU therefore does not need to wait to receive and process the second batch of training data after finishing model training with the first batch; the processing of the GPU's input training data is moved into the first and second processes on the CPU, the CPU work and the GPU work proceed in parallel through the preprocessing of the next batch of training data, and the training speed of the BERT model is significantly improved.
As an implementation manner, when it is monitored that the target sub-process of at least one second process has acquired the target training data, the method waits for the corresponding GPU to complete model training with the first batch of training data; once the corresponding GPU has completed that training, the target training data is sent to the GPU and the GPU is triggered to perform model training with the target training data. By setting up a plurality of first processes and second processes in the CPU and letting them communicate, the target sub-process in a second process can obtain the next batch of target training data preprocessed by its first process while the GPU is still training with the first batch of training data; the target sub-process in the second process is then controlled to send the target training data to the corresponding GPU and trigger it to perform model training with the target training data, which improves the efficiency of GPU model training.
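A minimal sketch of this monitoring logic is shown below. For simplicity it models the target sub-process as a thread inside the second process and uses a queue and an event as stand-ins for the inter-process channel and the "first-batch training finished" signal; these choices, and all names in the snippet, are assumptions for illustration.

```python
import queue
import threading
import time

def target_subtask(from_first, gpu_done, to_gpu):
    # Monitors for the target training data delivered by the first process; once the
    # GPU has finished training with the first batch, forwards the data and triggers
    # the next round of training.
    target_data = from_first.get()  # blocks until the first process delivers the data
    gpu_done.wait()                 # wait for model training with the first batch to finish
    to_gpu.put(target_data)         # send the target training data and trigger the GPU

from_first, to_gpu = queue.Queue(), queue.Queue()
gpu_done = threading.Event()
threading.Thread(target=target_subtask, args=(from_first, gpu_done, to_gpu)).start()

from_first.put("target training data of the second batch")  # produced by the first process
time.sleep(0.1)   # stands in for the GPU training with the first batch
gpu_done.set()    # model training with the first batch is complete
print("dispatched to GPU:", to_gpu.get())
```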
In the operation control method for model training of this embodiment, after the GPU corresponding to each second process completes model training with the first batch of training data, it performs model training with the target training data received by that second process. The GPU does not need to wait to receive and process the second batch of training data after finishing model training with the first batch; the processing of the GPU's input training data is moved into the first and second processes on the CPU, the CPU work and the GPU work proceed in parallel through the preprocessing of the next batch of training data, and the training speed of the BERT model is remarkably improved.
Based on the foregoing embodiment, fig. 6 is a schematic diagram of interaction between a first process and a second process in a CPU provided in the embodiment of the present application, which is specifically described as follows:
Taking model training as an example, a second process running on the CPU sends control information; through the communication between the second process and the first process, the control information generated by the second process is delivered to the corresponding first process, and the first process, in response to the received control information, reads the first batch of training data from the storage unit in which the training data is kept. A training data selection procedure, namely Exchange Padding, is then executed in the first processes: the training data read by the first processes is synchronized among them so that each first process obtains a data set of the training data read by all the first processes; each first process sorts the training data in its data set according to data length and determines its corresponding target training data from the data set according to the sorting result; and the target training data is sent to the corresponding second process, so that the GPU corresponding to each second process performs model training with the target training data. Meanwhile, the second process sends out control information to control its corresponding first process to pre-read the second batch of training data that follows the first batch; the second batch of training data is processed in the same way as the first batch, which is not described again in this embodiment.
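The interaction of fig. 6 can be summarized as a pipeline in which the second process requests the next batch before the GPU has finished the current one. The sketch below simulates that overlap; it uses threads and sleeps in place of real processes, disk reads, and GPU training, so every name and timing in it is an illustrative assumption rather than part of the disclosed implementation.

```python
import queue
import threading
import time

def first_process(ctrl_q, data_q):
    # Responds to each control message by reading and preprocessing the named batch
    # (the sleep stands in for reading the data and performing Exchange Padding).
    while True:
        batch_id = ctrl_q.get()
        if batch_id is None:
            return
        time.sleep(0.1)
        data_q.put(f"target training data for batch {batch_id}")

def second_process(ctrl_q, data_q, num_batches):
    # Requests batch k + 1 before training on batch k, so CPU preprocessing overlaps
    # with GPU training (the sleep stands in for one round of GPU training).
    ctrl_q.put(0)
    current = data_q.get()
    for k in range(num_batches):
        if k + 1 < num_batches:
            ctrl_q.put(k + 1)       # control information: pre-read the next batch
        time.sleep(0.2)
        print(f"GPU finished training on: {current}")
        if k + 1 < num_batches:
            current = data_q.get()  # already prepared while the GPU was training
    ctrl_q.put(None)

ctrl_q, data_q = queue.Queue(), queue.Queue()
worker = threading.Thread(target=first_process, args=(ctrl_q, data_q))
worker.start()
second_process(ctrl_q, data_q, num_batches=3)
worker.join()
```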
It should be noted that the explanation of the operation control method for model training in the foregoing embodiment is also applicable to this embodiment, and details are not described here.
In the embodiment of the application, the processing of the GPU's input training data is carried out in the first and second processes on the CPU; by preprocessing the next batch of training data, the CPU work and the GPU work proceed in parallel, and the training speed of the BERT model is remarkably improved.
In order to implement the above embodiments, an operation control device for model training is further provided in the embodiments of the present application. Fig. 7 is a schematic structural diagram of an operation control device for model training according to an embodiment of the present application.
As shown in fig. 7, the apparatus includes:
the reading module 71 is configured to, in the process of performing model training by using the first batch of training data by the multiple GPUs, read, by using multiple first processes running on the CPU, the second batch of training data after the first batch.
The determining module 72 is configured to perform data sorting on the second batch of training data read by the multiple first processes according to lengths of the training data read by all the first processes, so as to determine, according to a sorting result, target training data corresponding to each first process.
A sending module 73, configured to send the target training data to a second process corresponding to each of the first processes, so that after the GPU corresponding to each of the second processes completes model training by using the training data of the first batch, model training is performed by using the target training data received by the second process.
As an implementation manner, the reading module 71 is specifically configured to:
and in the process that a plurality of GPUs carry out model training by adopting the training data of the first batch, each second process running on the CPU is adopted to send control information to the corresponding first process, wherein the control information is used for controlling the first process to read the training data of the second batch after the first batch.
As one implementation, the sending module 73 includes:
the monitoring unit is used for monitoring target subprocesses in the second processes; the target subprocess is used for acquiring the target training data from the first process corresponding to the second process;
and the sending unit is used for sending the target training data to the corresponding GPU under the condition that it is monitored that the target subprocess of at least one second process acquires the target training data, so that the GPU corresponding to each second process adopts the target training data received by the second process to perform model training after the model training is completed by adopting the training data of the first batch.
As an implementation manner, the sending unit is specifically configured to:
waiting for the corresponding GPU to finish model training by adopting the training data of the first batch under the condition that the target subprocess of at least one second process acquires the target training data;
and under the condition that the corresponding GPU adopts the training data of the first batch to perform model training, sending the target training data to the corresponding GPU and triggering the corresponding GPU to perform model training by adopting the target training data.
As an implementation, the determining module 72 is specifically configured to:
synchronizing the second batch of training data read by the plurality of first processes among the plurality of first processes so that each first process obtains a data set of the training data read by the plurality of first processes;
and performing data sorting on the training data in the obtained data sets by adopting the first processes according to the data length, and determining corresponding target training data from the data sets according to sorting results.
As an implementation manner, the determining module 72 is further specifically configured to:
for any first process, carrying out data sorting on the training data in the data set according to the data length;
and according to the sequence number corresponding to the first process, taking the training data which are sequenced into the sequence number in the data set as target training data corresponding to the first process.
As one implementation, the determining module 72 is further configured to:
determining a corresponding sequence number q according to the process number i of the first process and the total number M of the first processes running on the CPU;
wherein, the value of q is determined according to n × M + i, and n is an integer.
It should be noted that the foregoing explanation of the method embodiments is also applicable to the apparatus of this embodiment, and the principle is the same, and is not repeated here.
In the operation control device for model training of the embodiment of the application, while the GPUs train the model with the first batch of training data, the second batch of training data to be used by the GPUs is preprocessed by the plurality of first processes running on the CPU, and the target training data corresponding to each first process is sent to its corresponding second process. After a GPU finishes training with the current batch, it performs model training with the target training data received by its second process, so that the GPU's training of the model and the CPU's processing of the subsequent training data proceed in parallel, which improves the operation control efficiency of model training.
In order to implement the foregoing embodiments, the present application further proposes an electronic device, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the electronic device implements the method according to the foregoing method embodiments.
In order to implement the above-mentioned embodiments, the present application also proposes a non-transitory computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the method as described in the foregoing method embodiments.
In order to implement the above-mentioned embodiments, the present application also proposes a computer program product having a computer program stored thereon, which, when being executed by a processor, implements the method as described in the aforementioned method embodiments.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a schematic block diagram of an electronic device provided in an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 8, the electronic apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803. In the RAM 803, various programs and data required for the operation of the electronic apparatus 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807, such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, an optical disk, or the like; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the operation control method for model training. For example, in some embodiments, the operation control method for model training may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the operation control method for model training described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the operation control method for model training in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate arrays), ASICs (Application-Specific Integrated circuits), ASSPs (Application Specific Standard products), SOCs (System On Chip, system On a Chip), CPLDs (Complex Programmable Logic devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Electrically Programmable Read-Only-Memory) or flash Memory, an optical fiber, a CD-ROM (Compact Disc Read-Only-Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), internet, and blockchain Network.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in a conventional physical host and a VPS (Virtual Private Server). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present application, there is also provided a computer program product; when the instructions in the computer program product are executed by a processor, the operation control method for model training proposed in the above embodiments of the present application is performed.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. An operation control method for model training, comprising:
in the process that a plurality of GPUs carry out model training by adopting training data of a first batch, a plurality of first processes running on the CPU are adopted to read the training data of a second batch after the first batch, wherein the model training is carried out by adopting the plurality of GPUs for parallel training, and the model training processes of the GPUs are mutually independent;
performing data sorting on the second batch of training data read by the plurality of first processes according to the lengths of the training data read by all the first processes, so as to determine target training data corresponding to each first process according to a sorting result;
sending the target training data to a second process corresponding to each first process, so that after the GPU corresponding to each second process adopts the training data of the first batch to perform model training, the target training data received by the second process is adopted to perform model training;
wherein, in the process that the multiple GPUs perform model training by using the training data of the first batch, reading the training data of the second batch after the first batch by using the multiple first processes operated on the CPU, comprises:
and in the process of carrying out model training by using the first batch of training data by the multiple GPUs, sending control information to the corresponding first process by using each second process operated on the CPU, wherein the control information is used for controlling the first process to read the second batch of training data after the first batch.
2. The method according to claim 1, wherein the sending the target training data to the second process corresponding to each of the first processes, so that after the GPU corresponding to each of the second processes performs model training by using the training data of the first batch, performing model training by using the target training data received by the second process includes:
monitoring target subprocesses in the second processes; the target subprocess is used for acquiring the target training data from the first process corresponding to the second process;
and under the condition that it is monitored that the target sub-process of at least one second process acquires the target training data, sending the target training data to the corresponding GPU, so that the GPU corresponding to each second process performs model training by adopting the target training data received by the second process after the model training by adopting the training data of the first batch is completed.
3. The method according to claim 2, wherein, when it is monitored that the target subprocess of at least one of the second processes acquires the target training data, the target training data is sent to the corresponding GPU, so that after the GPU corresponding to each of the second processes finishes performing model training by using the training data of the first batch, the method for performing model training by using the target training data received by the second process includes:
waiting for the corresponding GPU to finish model training by adopting the training data of the first batch under the condition that the target subprocess of at least one second process acquires the target training data;
and under the condition that the corresponding GPU adopts the training data of the first batch to perform model training, sending the target training data to the corresponding GPU and triggering the corresponding GPU to perform model training by adopting the target training data.
4. The method according to any one of claims 1 to 3, wherein the performing data sorting on the second batch of training data read by the plurality of first processes according to lengths of the training data read by all the first processes to determine the target training data corresponding to each first process according to a sorting result includes:
synchronizing the second batch of training data read by the plurality of first processes among the plurality of first processes so that each first process obtains a data set of the training data read by the plurality of first processes;
and adopting the first processes to carry out data sorting on the training data in the obtained data sets according to the data length, so as to determine corresponding target training data from the data sets according to sorting results.
5. The method according to claim 4, wherein the step of performing data sorting on the training data in the respective obtained data sets by using the respective first processes according to data lengths so as to determine corresponding target training data from the data sets according to sorting results includes:
for any first process, carrying out data sorting on the training data in the data set according to the data length;
and according to the sequence number corresponding to the first process, taking the training data which are sequenced into the sequence number in the data set as target training data corresponding to the first process.
6. The method of claim 5, wherein the method further comprises:
determining a corresponding sequence number q according to the process number i of the first process and the total number M of the first processes running on the CPU;
wherein, the value of q is determined according to n × M + i, and n is an integer.
7. An operation control device for model training, comprising:
the reading module is used for reading training data of a second batch after a first batch by adopting a plurality of first processes operated on a CPU in the process that a plurality of GPUs carry out model training by adopting the training data of the first batch, wherein the model training is carried out by adopting the GPUs for parallel training, and the process of model training by each GPU is mutually independent;
a determining module, configured to perform data sorting on the second batch of training data read by the multiple first processes according to lengths of the training data read by all the first processes, so as to determine, according to a sorting result, target training data corresponding to each first process;
a sending module, configured to send the target training data to a second process corresponding to each of the first processes, so that after the GPU corresponding to each of the second processes completes model training using the training data of the first batch, model training is performed using the target training data received by the second process;
the reading module is specifically configured to:
in the process that the plurality of GPUs perform model training by using the training data of the first batch, send control information to the corresponding first process through each second process running on the CPU, wherein the control information is used for controlling the first process to read the training data of the second batch after the first batch.
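As a hedged illustration of the reading module's behaviour described above, the sketch below pairs one second process with one first process over multiprocessing pipes: the second process sends the control information that tells the first process to read the next batch while, in a real system, the GPU would still be training on the first batch. The names and the pipe-based wiring are assumptions of this sketch, not details taken from the patent.

import multiprocessing as mp

def first_process(ctrl_conn, data_conn):
    # Wait for control information from the paired second process.
    if ctrl_conn.recv() == "read_next_batch":
        batch = [f"sample_{k}" for k in range(4)]   # stand-in for reading real data
        data_conn.send(batch)                       # hand the pre-read batch back

def second_process(ctrl_conn, data_conn):
    ctrl_conn.send("read_next_batch")    # control information: read the second batch
    # ... in a real system the GPU would be training on the first batch here ...
    print("received", data_conn.recv())  # the pre-read batch, ready for the next step

if __name__ == "__main__":
    ctrl_reader_end, ctrl_trainer_end = mp.Pipe()
    data_reader_end, data_trainer_end = mp.Pipe()
    reader = mp.Process(target=first_process, args=(ctrl_reader_end, data_reader_end))
    reader.start()
    second_process(ctrl_trainer_end, data_trainer_end)
    reader.join()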
8. The apparatus of claim 7, wherein the sending module comprises:
a monitoring unit, configured to monitor a target subprocess in each of the second processes, wherein the target subprocess is configured to acquire the target training data from the first process corresponding to the second process;
and a sending unit, configured to send the target training data to the corresponding GPU when it is monitored that the target subprocess of at least one of the second processes has acquired the target training data, so that the GPU corresponding to each of the second processes performs model training by using the target training data received by the second process after completing the model training by using the training data of the first batch.
9. The apparatus according to claim 8, wherein the sending unit is specifically configured to:
under the condition that the target subprocess of at least one of the second processes has acquired the target training data, wait for the corresponding GPU to complete the model training by using the training data of the first batch;
and under the condition that the corresponding GPU has completed the model training by using the training data of the first batch, send the target training data to the corresponding GPU and trigger the corresponding GPU to perform model training by using the target training data.
10. The apparatus according to any one of claims 7 to 9, wherein the determining module is specifically configured to:
synchronize the second batch of training data read by the plurality of first processes among the plurality of first processes, so that each first process obtains a data set of the training data read by the plurality of first processes;
and perform, by each of the first processes, data sorting on the training data in the obtained data set according to data length, so as to determine the corresponding target training data from the data set according to a sorting result.
11. The apparatus of claim 10, wherein the determining module is further specifically configured to:
for any one of the first processes, perform data sorting on the training data in the data set according to data length;
and according to the sequence number corresponding to the first process, take the training data whose rank in the sorted data set matches the sequence number as the target training data corresponding to the first process.
12. The apparatus of claim 11, wherein the determining module is further configured to:
determine a corresponding sequence number q according to the process number i of the first process and the total number M of the first processes running on the CPU;
wherein the value of q is determined according to q = n × M + i, and n is an integer.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202210542222.0A 2022-05-18 2022-05-18 Operation control method and device for model training and electronic equipment Active CN114862655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210542222.0A CN114862655B (en) 2022-05-18 2022-05-18 Operation control method and device for model training and electronic equipment


Publications (2)

Publication Number Publication Date
CN114862655A CN114862655A (en) 2022-08-05
CN114862655B true CN114862655B (en) 2023-03-10

Family

ID=82638753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210542222.0A Active CN114862655B (en) 2022-05-18 2022-05-18 Operation control method and device for model training and electronic equipment

Country Status (1)

Country Link
CN (1) CN114862655B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units
CN108229687A (en) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and electronic equipment
CN111753997A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
CN111832614A (en) * 2020-06-04 2020-10-27 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN112288083A (en) * 2020-10-21 2021-01-29 周宇浩 Neural network distributed training method, device, equipment and storage medium
CN112508188A (en) * 2020-12-01 2021-03-16 北京奇艺世纪科技有限公司 Distributed model training system, method, device, equipment and storage medium
CN112561038A (en) * 2020-12-21 2021-03-26 之江实验室 Batch data set construction method and device, electronic equipment and storage medium
CN112732436A (en) * 2020-12-15 2021-04-30 电子科技大学 Deep reinforcement learning acceleration method of multi-core processor-single graphics processor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561078B (en) * 2020-12-18 2021-12-28 北京百度网讯科技有限公司 Distributed model training method and related device


Also Published As

Publication number Publication date
CN114862655A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
EP3955174A2 (en) Method, apparatus and storage medium for training a deep learning framework
CN112561079A (en) Distributed model training apparatus, method and computer program product
CN114841315A (en) Method and system for implementing hybrid expert model, electronic device and storage medium
CN114697391A (en) Data processing method, device, equipment and storage medium
CN114862655B (en) Operation control method and device for model training and electronic equipment
CN112560936A (en) Model parallel training method, device, equipment, storage medium and program product
CN113408304B (en) Text translation method and device, electronic equipment and storage medium
CN113850394B (en) Federal learning method and device, electronic equipment and storage medium
CN115599438A (en) Method, device, equipment and medium for constructing application program publishing package
CN113554062A (en) Training method, device and storage medium of multi-classification model
CN114819095A (en) Method and device for generating business data processing model and electronic equipment
CN114416357A (en) Method and device for creating container group, electronic equipment and medium
CN114051057A (en) Method and device for determining queuing time of cloud equipment, electronic equipment and medium
CN113158801A (en) Method for training face recognition model and recognizing face and related device
CN112817463A (en) Method, equipment and storage medium for acquiring audio data by input method
CN113656422A (en) Method and device for updating human face base
CN112631843A (en) Equipment testing method and device, electronic equipment, readable medium and product
CN111738325A (en) Image recognition method, device, equipment and storage medium
CN115331089A (en) Method, apparatus, device, medium and product for recognizing image text
CN115107042B (en) Robot scheduling identifier resetting method, device, equipment and storage medium
CN115098074A (en) Interface creating method, device, equipment, storage medium and computer program product
CN115482422A (en) Deep learning model training method, image processing method and device
CN115965817A (en) Training method and device of image classification model and electronic equipment
CN115757120A (en) Pressure testing method and device, electronic equipment and readable storage medium
CN114494818A (en) Image processing method, model training method, related device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant