WO2021135810A1 - Data processing method and apparatus, computer device, storage medium, and computer program - Google Patents

Data processing method and apparatus, computer device, storage medium, and computer program

Info

Publication number
WO2021135810A1
WO2021135810A1 (PCT/CN2020/133647)
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
prefetches
adjustment step
training
Application number
PCT/CN2020/133647
Other languages
English (en)
French (fr)
Inventor
张衡
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 上海商汤智能科技有限公司
Priority to KR1020217031141A (publication KR20210130796A)
Priority to JP2021557139A (publication JP2022526333A)
Priority to SG11202110625XA (publication SG11202110625XA)
Publication of WO2021135810A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present disclosure relates to the field of machine learning technology, and in particular to a data processing method and apparatus, a computer device, a storage medium, and a computer program.
  • Deep learning models require multiple rounds of iterative training based on a large amount of sample data. To improve the convergence speed of a machine learning model during training, multi-process parallel training is usually adopted. While executing the computation of the current round of training, each parallel process also pre-reads the training data required for the next round. However, because the parallel processes must communicate and synchronize data after finishing each round, if any process reads the training data for the next round too slowly, the entire training procedure is delayed, which in turn reduces training efficiency.
  • The embodiments of the present disclosure provide at least a data processing method and apparatus.
  • In a first aspect, the embodiments of the present disclosure provide a data processing method applied to the training of a deep learning model, where the training involves one or more processes. The method includes: for a target process among the one or more processes, performing first update processing on a prefetch quantity of sample data to obtain a target prefetch quantity; and in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, reading new sample data and storing the read new sample data in the pre-sample data queue.
  • In this way, because the main process performs the first update processing on the prefetch quantity to obtain the target prefetch quantity, and reads new sample data from the sample data pool whenever the quantity of sample data currently held in the queue has not reached the target prefetch quantity, the sample data needed for the next training iteration has already been read by the time one iteration finishes. In practice, the time the main process spends reading data is usually less than the time one training iteration takes, so the data queue always stores enough sample data for the following iterations; even if the main process takes too long to read a particular piece of sample data, iterative training is not delayed by samples arriving late, which improves training efficiency.
  • In a possible implementation, performing the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity includes: performing the first update processing according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and an upper memory usage threshold, to obtain the target prefetch quantity.
  • In this way, the prefetch quantity of sample data can be dynamically updated based on the total memory space currently occupied by the pre-sample data queues and the upper memory usage threshold, flexibly allocating the amount of prefetched sample data to meet training requirements.
  • In a possible implementation, the above includes: performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes, the upper memory usage threshold, and the data throughput at which the target process trains the deep learning model, to obtain the target prefetch quantity.
  • In this way, the prefetch quantity is dynamically updated based on the total memory space currently occupied by the pre-sample data queues, the upper memory usage threshold, and the data throughput of training the deep learning model. When the data throughput rises, the amount of data in the pre-sample data queue can keep up with the consumption of sample data; when the data throughput falls, the memory occupied by the queue can be reduced as much as possible, freeing the surplus memory for other work and making the adjustment more flexible.
  • In a possible implementation, performing the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold, increasing the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or, when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has reached the upper memory usage threshold, decreasing the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
  • In this way, when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold, as much sample data as possible is prefetched; when it has reached the threshold, the prefetch quantity of sample data is reduced, flexibly adjusting the length of the pre-sample data queue.
  • In a possible implementation, increasing the prefetch quantity by the first adjustment step to obtain the target prefetch quantity includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput at which the target process trains the deep learning model satisfies a preset data throughput condition, increasing the prefetch quantity by the first adjustment step to obtain the target prefetch quantity.
  • In a possible implementation, the method further includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decreasing the prefetch quantity by a third adjustment step to obtain the target prefetch quantity.
  • In a possible implementation, the preset data throughput condition includes at least one of the following: the current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical training iterations before the current iteration, or the data throughput of the iteration immediately preceding the current one; or the current value of the data throughput is greater than a data throughput threshold.
  • In a possible implementation, the method further includes: performing second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
  • In a possible implementation, performing the second update processing on the adjustment step of the prefetch quantity to obtain the target adjustment step includes: increasing the adjustment step of the prefetch quantity when the prefetch quantity is increased in the first update processing; and/or decreasing the adjustment step of the prefetch quantity when the prefetch quantity is decreased in the first update processing.
  • In this way, when the prefetch quantity needs to grow, it grows faster, ensuring that the sample data stored in the pre-sample data queue reaches a larger quantity sooner to serve the following training iterations and avoiding delays in model training caused by too small a prefetch quantity; meanwhile, when the prefetch quantity needs to shrink, shrinking it more gently keeps the length of the pre-sample data queue changing smoothly and avoids oscillation of the training process caused by an overly rapid drop in the quantity of prefetched sample data.
  • In a second aspect, the embodiments of the present disclosure further provide a data processing apparatus applied to the training of a deep learning model, where the training involves one or more processes. The apparatus includes: a first update module configured to perform, for a target process among the one or more processes, first update processing on a prefetch quantity of sample data to obtain a target prefetch quantity; and a reading module configured to, in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
  • In a third aspect, the embodiments of the present disclosure further provide a computer device, including a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the computer device runs, the processor and the storage medium communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
  • In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the first aspect or of any possible implementation of the first aspect.
  • Fig. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure.
  • Fig. 2 shows a schematic diagram of a data processing device provided by an embodiment of the present disclosure.
  • Fig. 3 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • When a deep learning model is trained with multiple parallel processes, each process pre-reads the training data needed for the next round while performing the computation of the current round. After completing the current round, each parallel process must communicate and synchronize data with the other processes, and the next round of training starts only after communication and data synchronization of all processes complete. If the training task of any process is delayed, for example because pre-reading the training data for the next round takes longer than executing the current round's training task, the training tasks of all processes are delayed, which in turn reduces training efficiency.
  • The present disclosure provides a data processing method and apparatus applied to deep learning model training.
  • According to the data processing method, first update processing can be performed on the prefetch quantity of sample data to obtain a target prefetch quantity, and in response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity, new sample data is read and stored in the pre-sample data queue. As a result, by the time the main process finishes one training iteration, the sample data required for the next iteration has already been read.
  • The main process dynamically updates the prefetch quantity to obtain the target prefetch quantity, and reads new sample data from the sample data pool when the quantity of sample data currently included in the data queue falls short of it.
  • In most cases, reading new sample data takes the main process less time than executing one training iteration, so the data queue always stores enough sample data for the following iterations. Even if the main process takes too long to read a particular piece of sample data, iterative training is not delayed by an insufficient number of samples, which improves training efficiency.
  • A data processing method disclosed in the embodiments of the present disclosure is first introduced in detail.
  • The data processing method provided in the embodiments of the present disclosure is applied to the training of a deep learning model, and its execution subject is generally a main process or child process used for training the deep learning model.
  • In some possible implementations, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • Referring to Fig. 1, a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S101 to S102, where:
  • S101: Perform first update processing on the prefetch quantity of sample data to obtain a target prefetch quantity.
  • S102: In response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
  • When there is one main process, that main process can train the deep learning model, and it performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity.
  • When there are multiple main processes, they can train the deep learning model in parallel, and each main process performs the first update processing on its own prefetch quantity of sample data to obtain its target prefetch quantity.
  • Here, the prefetch quantities corresponding to different main processes may differ, and so may the target prefetch quantities corresponding to different main processes.
  • Each main process corresponds to one pre-sample data queue; the pre-sample data queue corresponding to any main process stores multiple pieces of sample data, and each main process trains the deep learning model based on the sample data stored in its corresponding pre-sample data queue.
  • The pre-sample data queue is, for example, a first-in-first-out queue. When starting a new training iteration, the main process first reads one group of sample data from its pre-sample data queue; after being read, that group is deleted from the queue to leave a storage slot for new sample data.
  • Note that in one training iteration, the main process trains the deep learning model based on one group of sample data, and a group of sample data includes at least one piece of sample data.
  • The prefetch quantity referred to in the embodiments of the present disclosure is the number of sample data groups.
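  • The queue mechanics above can be made concrete with a short sketch. The following Python snippet is illustrative only: the class and method names (PresampleQueue, put_group, get_group) are assumptions for exposition, not names from the patent.

```python
from collections import deque

class PresampleQueue:
    """Minimal sketch of a first-in-first-out pre-sample data queue.

    Each entry is one group of sample data; the prefetch quantity in
    this disclosure counts groups, not individual samples.
    """

    def __init__(self):
        self._groups = deque()

    def put_group(self, sample_group):
        # Newly read sample groups are appended at the tail.
        self._groups.append(sample_group)

    def get_group(self):
        # At the start of an iteration the main process pops the oldest
        # group; popping deletes it, freeing a slot for prefetched data.
        return self._groups.popleft()

    def __len__(self):
        return len(self._groups)
```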
  • When the predetermined prefetch quantity update condition has not been reached, the main process reads new sample data in response to the quantity of sample data currently included in the pre-sample data queue not reaching the current prefetch quantity.
  • When the predetermined prefetch quantity update condition is reached, the main process performs the first update processing on the current prefetch quantity of sample data to obtain the target prefetch quantity, and reads new sample data in response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity.
  • The prefetch quantity of the sample data and the target prefetch quantity may be the same or different.
  • Specifically, the prefetch quantity update condition includes, for example, one or more of the following a1 to a3:
  • a1: A preset update period is reached. Here, the update period is the period at which the prefetch quantity is updated.
  • The update period is, for example, a time period; for instance, with an update period of 1 hour, the first update processing of the prefetch quantity can be triggered every hour.
  • The update period may also be a preset number of training iterations; for example, every 5 iterations of training the deep learning model, the main process triggers the first update processing of the prefetch quantity. In this case, since different iterations may take different amounts of time, the durations of different update periods may also differ.
  • a2: The data throughput when training the deep learning model based on sample data is greater than a first threshold.
  • Here, the data throughput characterizes the sample data processing speed of the main process when training the deep learning model. If the main process's data throughput during training is greater than the first threshold, the sample data stored in the pre-sample data queue is considered to be consumed quickly. Keeping a small prefetch quantity would then risk the quantity of sample data stored in the queue failing to keep up with the consumption of training. It can therefore be considered to increase the quantity of sample data prefetched into the pre-sample data queue, triggering the first update processing of the prefetch quantity of sample data.
  • a2.1: Here, the data throughput can be obtained, for example, as follows:
  • In response to the prefetch quantity update condition being reached, determine at least one target training iteration from multiple historical iterations based on the training progress of the deep learning model at that moment; then determine the data throughput of training the deep learning model based on the number of pieces of sample data included in the sample data group used by each target iteration and the time each target iteration took.
  • Here, a target training iteration is, for example, at least one iteration closest to the moment at which the prefetch quantity update condition is reached.
  • For example, a main process has performed 5 training iterations of the deep learning model, and the 6th iteration is in progress when the prefetch quantity update condition is reached.
  • With one target iteration, the 5th iteration can be chosen as the target; if it took 15 minutes and used 64 pieces of sample data, the data throughput is, for example, 64 ÷ 15.
  • With three target iterations, the 3rd, 4th, and 5th iterations can be chosen as targets; if they took 12, 14, and 15 minutes respectively and each used 64 pieces of sample data, the data throughput is, for example, 64 × 3 ÷ (12 + 14 + 15), in pieces per minute.
  • a2.2: In another embodiment, the currently running iteration can also be chosen as the target iteration, and the data throughput determined from the number of samples already trained in the current iteration and how long it has lasted.
  • For example, the deep learning model has been trained for 5 iterations, and the 6th iteration is in progress when the prefetch quantity update condition is reached. The 6th iteration can be chosen as the target; it trains the deep learning model on the 64 samples of one sample data group, 30 samples have been trained so far, and the iteration has lasted 4 minutes, so the data throughput is, for example, 30 ÷ 4.
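  • The arithmetic in a2.1 and a2.2 is simple enough to express as a helper. This Python sketch (the function name and argument layout are illustrative assumptions) reproduces the worked examples above:

```python
def data_throughput(samples_per_iteration, durations_min):
    """Samples processed per minute over the chosen target iterations."""
    return sum(samples_per_iteration) / sum(durations_min)

# a2.1, one target iteration: 64 samples in 15 minutes
print(data_throughput([64], [15]))                  # ~4.27 pieces/minute

# a2.1, three target iterations: 64 samples each, 12 + 14 + 15 minutes
print(data_throughput([64, 64, 64], [12, 14, 15]))  # 64*3/41, ~4.68

# a2.2, current iteration: 30 samples trained so far in 4 minutes
print(data_throughput([30], [4]))                   # 7.5 pieces/minute
```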
  • a3: The data throughput when training the deep learning model based on sample data is less than a second threshold.
  • Here, the second threshold is less than the first threshold.
  • If the main process's data throughput during training is less than the second threshold, the consumption of the sample data stored in the pre-sample data queue is considered too slow. Keeping a large prefetch quantity would then let sample data keep accumulating in the queue and occupy a large amount of memory. It can therefore be considered to decrease the quantity of sample data prefetched into the pre-sample data queue, triggering the first update processing of the prefetch quantity of sample data.
  • After the prefetch quantity update condition is satisfied, the first update processing can be performed on the prefetch quantity of sample data, for example, as follows:
  • Perform the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues and the upper memory usage threshold, to obtain the target prefetch quantity.
  • Here, the total memory space occupied by the pre-sample data queues refers to the total memory space occupied by the sample data queues corresponding to all main processes.
  • Specifically: when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold, increase the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or, when it has reached the threshold, decrease the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
  • The first adjustment step is the adjustment step used when increasing the prefetch quantity of sample data; the second adjustment step is the adjustment step used when decreasing it.
  • The first adjustment step and the second adjustment step may be the same size or different sizes.
  • For example, the first adjustment step is greater than the second adjustment step. In this case, when the prefetch quantity needs to grow, it grows faster, ensuring that the sample data stored in the pre-sample data queue reaches a larger quantity sooner to serve the following training iterations and avoiding delays in model training caused by too small a prefetch quantity; meanwhile, when the prefetch quantity needs to shrink, shrinking it more gently keeps the length of the queue changing smoothly and avoids oscillation of the training process caused by an overly rapid drop in the quantity of prefetched sample data.
  • Further, in another embodiment, the first update processing can also be performed on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queue, the upper memory usage threshold, and the data throughput of training the deep learning model, to obtain the target prefetch quantity.
  • Building on the above embodiment, when the total memory space currently occupied by the pre-sample data queue has not reached the upper memory usage threshold and the data throughput satisfies the preset data throughput condition, the prefetch quantity is increased by the first adjustment step to obtain the target prefetch quantity.
  • In another embodiment, the method further includes: when the total memory space currently occupied by the pre-sample data queue has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decreasing the prefetch quantity by the third adjustment step to obtain the target prefetch quantity.
  • The first adjustment step and the third adjustment step may be the same size or different sizes; similarly, the second adjustment step and the third adjustment step may be the same size or different sizes. These rules are sketched in code below.
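  • Taken together, the rules above amount to a small decision function. The sketch below is a hedged reading of the text, with illustrative names throughout; the clamp to a minimum of one group is an added safety assumption, not something the patent states.

```python
def first_update(prefetch_qty, total_queue_mem, mem_limit,
                 throughput_ok, step1, step2, step3):
    """One first update of the prefetch quantity (illustrative)."""
    if total_queue_mem < mem_limit:
        if throughput_ok:
            # Below the memory cap, throughput condition met: grow.
            return prefetch_qty + step1
        # Below the cap, condition unmet: shrink by the third step.
        return max(1, prefetch_qty - step3)
    # At (or above) the cap: shrink by the second step regardless
    # of the throughput condition.
    return max(1, prefetch_qty - step2)
```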
  • The foregoing preset data throughput condition includes at least one of the following b1 to b2:
  • b1: The current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical iterations before the current training iteration, or the data throughput of the iteration immediately preceding the current one. The specific way of determining it can be found in a2.1 above and is not repeated here.
  • b2: The current value of the data throughput is greater than a data throughput threshold. The current value of the data throughput can be determined, for example, as shown in a2.2 above, and is not repeated here.
  • On the basis of the foregoing embodiments, the data processing method provided by the embodiments of the present disclosure further includes:
  • performing second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
  • For example, when the prefetch quantity is increased in the first update processing, the adjustment step of the prefetch quantity can be increased to obtain the target adjustment step; and/or, when the prefetch quantity is decreased in the first update processing, the adjustment step of the prefetch quantity is decreased to obtain the target adjustment step.
  • As a concrete example, five processes M1, M2, M3, M4, and M5 execute training tasks for the same deep learning model in parallel, and each of them executes the data processing method provided in the embodiments of the present disclosure. Taking M1 as an example:
  • Example 1: M1 performs the first update processing on the prefetch quantity of sample data based on the upper memory usage threshold.
  • 1.1: M1 detects whether the total memory space occupied by the pre-sample data queues L1 (corresponding to M1), L2 (to M2), L3 (to M3), L4 (to M4), and L5 (to M5) reaches the upper memory usage threshold; if not, go to 1.2(a) and 1.2(b); if so, or if M1 fails when requesting memory from the operating system's main process, go to 1.3.
  • 1.2(a): M1 performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity + first adjustment step, where the first adjustment step is the target adjustment step obtained the last time the second update processing was performed on the adjustment step.
  • 1.2(b): M1 performs the second update processing on the first adjustment step:
  • target adjustment step obtained from this second update processing = first adjustment step × 2; that is, the first adjustment step used in the next first update processing is twice the adjustment step used in this first update processing.
  • 1.3: M1 detects whether the second adjustment step is greater than 1; if so, go to 1.4(a) and 1.4(b); if not, go to 1.5.
  • 1.4(a): M1 performs the second update processing on the second adjustment step:
  • adjusted second adjustment step = second adjustment step before adjustment ÷ 2.
  • 1.4(b): M1 performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity − second adjustment step, where the second adjustment step is the adjusted step from 1.4(a).
  • 1.5: M1 keeps the second adjustment step unchanged and, based on it, performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity − the unchanged second adjustment step.
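  • A compact way to read steps 1.1 to 1.5 is as one update routine that returns the new prefetch quantity together with the updated steps. This Python sketch is an illustrative reading with assumed parameter names, not the patent's code; it also assumes integer step sizes.

```python
def example1_update(qty, step1, step2, mem_used, mem_limit,
                    alloc_failed=False):
    """Sketch of Example 1: grow and double the first step while under
    the memory cap; otherwise halve the second step (never below 1)
    and shrink by it."""
    if mem_used < mem_limit and not alloc_failed:
        qty += step1      # 1.2(a): first update, grow
        step1 *= 2        # 1.2(b): second update, double the step
    else:
        if step2 > 1:     # 1.3
            step2 //= 2   # 1.4(a): halve the second step first
        qty -= step2      # 1.4(b)/1.5: shrink by the (halved) step
    return qty, step1, step2
```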
  • Example 2: M1 performs the first update processing on the prefetch quantity of sample data based on the upper memory usage threshold and the data throughput of training the deep learning model.
  • 2.1: M1 detects whether the total memory space occupied by the pre-sample data queues L1 to L5 corresponding to M1 to M5 reaches the upper memory usage threshold; if not, go to 2.2; if so, or if M1 fails when requesting memory from the operating system's main process, go to 2.7.
  • 2.2: M1 detects whether the data throughput of training the deep learning model satisfies the preset data throughput condition; if so, go to 2.3(a) and 2.3(b); if not, go to 2.4(a) and 2.4(b).
  • 2.3(a): M1 performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity + first adjustment step, where the first adjustment step is the target adjustment step obtained the last time the second update processing was performed on the adjustment step.
  • 2.3(b): M1 performs the second update processing on the first adjustment step: target adjustment step obtained from this second update processing = first adjustment step × 2.
  • 2.4(a): M1 performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity − third adjustment step.
  • 2.4(b): M1 detects whether the third adjustment step is greater than 1; if so, go to 2.5; if not, go to 2.6.
  • 2.5: M1 performs the second update processing on the third adjustment step: adjusted third adjustment step = third adjustment step before adjustment ÷ 2.
  • 2.6: M1 keeps the third adjustment step unchanged. This third adjustment step is used the next time the first update processing is performed on the prefetch quantity.
  • 2.7: M1 detects whether the second adjustment step is greater than 1; if so, go to 2.8(a) and 2.8(b); if not, go to 2.9.
  • 2.8(a): M1 performs the second update processing on the second adjustment step: adjusted second adjustment step = second adjustment step before adjustment ÷ 2.
  • 2.8(b): M1 performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity − second adjustment step, where the second adjustment step is the adjusted step from 2.8(a).
  • 2.9: M1 keeps the second adjustment step unchanged and, based on it, performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity:
  • target prefetch quantity = prefetch quantity − the unchanged second adjustment step.
  • In S102 above, when the quantity of sample data currently included in the pre-sample data queue has not reached the target prefetch quantity, the main process can read the new sample data directly from the sample database, or it can control a child process, through communication with it, to read new sample data from the sample database.
  • When the main process reads new sample data directly from the sample database, it can determine the quantity of sample data currently stored in the pre-sample data queue from the number of pieces of sample data extracted from the queue and the number of pieces read into it, and then compare this quantity with the target prefetch quantity; if the quantity is less than the target prefetch quantity, it reads new sample data directly from the sample database and stores it in the pre-sample data queue.
  • When the main process controls a child process to read new sample data from the sample database, it can determine the quantity of sample data currently stored in the pre-sample data queue through communication with the child process, and then compare this quantity with the target prefetch quantity; when the quantity is less than the target prefetch quantity, it sends a sample data read instruction to the child process, carrying information on the quantity of sample data to be read. After receiving the instruction, the child process reads new sample data into the pre-sample data queue based on the quantity information carried in the instruction.
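  • Both variants reduce to the same top-up loop; only who performs the read differs. A minimal sketch, assuming the PresampleQueue above and a read_new_group callable standing in for the sample-database read (or for the instruction sent to the child process):

```python
def top_up_queue(queue, target_prefetch_qty, read_new_group):
    """Read new sample groups until the queue holds the target number."""
    while len(queue) < target_prefetch_qty:
        queue.put_group(read_new_group())
```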
  • In the embodiments of the present disclosure, because the main process performs the first update processing on the prefetch quantity to obtain the target prefetch quantity and reads new sample data from the sample data pool when the quantity of sample data currently included in the data queue has not reached the target prefetch quantity, the sample data needed for the next training iteration has already been read by the time one iteration finishes. In practice, the time the main process needs to read data is usually less than the time one training iteration takes, so the data queue always stores enough sample data for the following iterations. Even if the main process takes too long to read a particular piece of sample data, iterative training is not delayed by samples arriving late, which improves training efficiency.
  • Those skilled in the art can understand that in the above methods, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • Based on the same inventive concept, the embodiments of the present disclosure also provide a data processing apparatus corresponding to the data processing method. Since the principle by which the apparatus solves the problem is similar to that of the above data processing method, the implementation of the apparatus can refer to the implementation of the method, and repeated parts are not described again.
  • Referring to Fig. 2, a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure, applied to the training of a deep learning model, where the training involves one or more processes, the apparatus includes a first update module 21 and a reading module 22, where:
  • the first update module 21 is configured to perform, for a target process among the one or more processes, first update processing on the prefetch quantity of sample data to obtain a target prefetch quantity;
  • the reading module 22 is configured to, in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
  • In the embodiments of the present disclosure, because the main process performs the first update processing on the prefetch quantity to obtain the target prefetch quantity and reads new sample data from the sample data pool when the quantity of sample data currently included in the pre-sample data queue has not reached the target prefetch quantity, the sample data needed for the next training iteration has already been read by the time one iteration finishes. In practice, the time the main process needs to read data is usually less than the time one training iteration takes, so the data queue always stores enough sample data for the subsequent training iterations.
  • In a possible implementation, the first update module 21, when performing the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity, is configured to:
  • perform the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold, to obtain the target prefetch quantity.
  • In a possible implementation, the first update module 21, when performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold to obtain the target prefetch quantity, is configured to:
  • perform the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes, the upper memory usage threshold, and the data throughput at which the target process trains the deep learning model, to obtain the target prefetch quantity.
  • In a possible implementation, the first update module 21, when performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold to obtain the target prefetch quantity, is configured to:
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold, increase the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has reached the upper memory usage threshold, decrease the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
  • In a possible implementation, the first update module 21, when increasing the prefetch quantity by the first adjustment step to obtain the target prefetch quantity in the case that the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold, is configured to:
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput at which the target process trains the deep learning model satisfies a preset data throughput condition, increase the prefetch quantity by the first adjustment step to obtain the target prefetch quantity.
  • In a possible implementation, the first update module 21 is further configured to:
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decrease the prefetch quantity by a third adjustment step to obtain the target prefetch quantity.
  • In a possible implementation, the preset data throughput condition includes at least one of the following:
  • the current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical training iterations before the current iteration, or the data throughput of the iteration immediately preceding the current one; or
  • the current value of the data throughput is greater than a data throughput threshold.
  • In a possible implementation, the apparatus further includes a second update module 23, configured to perform second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
  • In a possible implementation, the second update module 23, when performing the second update processing on the adjustment step of the prefetch quantity to obtain the target adjustment step, is configured to:
  • increase the adjustment step of the prefetch quantity when the prefetch quantity is increased in the first update processing; and/or decrease the adjustment step of the prefetch quantity when the prefetch quantity is decreased in the first update processing.
  • The embodiments of the present disclosure also provide a computer device 30. As shown in Fig. 3, a schematic structural diagram of the computer device 30 provided by the embodiments of the present disclosure, it includes:
  • a processor 31 and a memory 32 that communicate through a bus 33, so that the processor 31 executes the following instructions in user mode: for a target process among one or more processes, perform first update processing on the prefetch quantity of sample data to obtain a target prefetch quantity; and in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
  • In a possible execution instruction, performing the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity includes:
  • performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold, to obtain the target prefetch quantity.
  • In a possible execution instruction, performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold to obtain the target prefetch quantity includes:
  • performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes, the upper memory usage threshold, and the data throughput at which the target process trains the deep learning model, to obtain the target prefetch quantity.
  • In a possible execution instruction, performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and the upper memory usage threshold to obtain the target prefetch quantity includes:
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold, increasing the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or
  • when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has reached the upper memory usage threshold, decreasing the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
  • In a possible execution instruction, when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput at which the target process trains the deep learning model satisfies a preset data throughput condition, the prefetch quantity is increased by the first adjustment step to obtain the target prefetch quantity.
  • In a possible execution instruction, the method further includes:
  • when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decreasing the prefetch quantity by a third adjustment step to obtain the target prefetch quantity.
  • In a possible execution instruction, the preset data throughput condition includes at least one of the following:
  • the current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical training iterations before the current iteration, or the data throughput of the iteration immediately preceding the current one; or
  • the current value of the data throughput is greater than a data throughput threshold.
  • In a possible execution instruction, the method further includes:
  • performing second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
  • In a possible execution instruction, performing the second update processing on the adjustment step of the prefetch quantity to obtain the target adjustment step includes:
  • increasing the adjustment step of the prefetch quantity when the prefetch quantity is increased in the first update processing; and/or decreasing the adjustment step of the prefetch quantity when the prefetch quantity is decreased in the first update processing.
  • The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program performs the steps of the data processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The computer program product of the data processing method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to perform the steps of the data processing method described in the above method embodiments, for which reference may be made to the above method embodiments, not repeated here.
  • The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods in the foregoing embodiments. The computer program product can be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or parts of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data processing method and apparatus, a computer device, a storage medium, and a computer program, used for the training of a deep learning model. The method includes: performing first update processing on the prefetch quantity of sample data to obtain a target prefetch quantity (S101); and in response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity, reading new sample data and storing the read new sample data in the pre-sample data queue (S102).

Description

Data processing method and apparatus, computer device, storage medium, and computer program
Cross-reference to related application
This application claims priority to Chinese Patent Application No. 201911403669.4, filed on December 30, 2019, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of machine learning technology, and in particular to a data processing method and apparatus, a computer device, a storage medium, and a computer program.
Background
A deep learning model requires multiple rounds of iterative training based on a large amount of sample data. To improve the convergence speed of a machine learning model during training, multi-process parallel training is usually adopted. In each round of training a deep learning model with multiple parallel processes, while executing the computation tasks of the current round, each parallel process also pre-reads the training data needed for the next round. However, because all processes must communicate and synchronize data after finishing the round, if any process reads the training data for the next round too slowly, the entire training procedure is held up, which in turn reduces training efficiency.
Summary
The embodiments of the present disclosure provide at least a data processing method and apparatus.
In a first aspect, the embodiments of the present disclosure provide a data processing method applied to the training of a deep learning model, the training involving one or more processes. The method includes: for a target process among the one or more processes, performing first update processing on a prefetch quantity of sample data to obtain a target prefetch quantity; and in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, reading new sample data and storing the read new sample data in the pre-sample data queue.
In this way, because the main process performs the first update processing on the prefetch quantity to obtain the target prefetch quantity, and reads new sample data from the sample data pool whenever the quantity of sample data currently held in the queue has not reached the target prefetch quantity, the sample data needed for the next training iteration has already been read by the time one iteration finishes. In practice, the time the main process spends reading data is usually less than the time one training iteration takes, so the data queue always stores enough sample data for the following iterations; even if the main process takes too long to read a particular piece of sample data, iterative training is not delayed by samples arriving late, which improves training efficiency.
In a possible implementation, performing the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity includes: performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes and an upper memory usage threshold, to obtain the target prefetch quantity.
In this way, the prefetch quantity of sample data can be dynamically updated based on the total memory space currently occupied by the pre-sample data queues and the upper memory usage threshold, flexibly allocating the amount of prefetched sample data to meet training requirements.
In a possible implementation, the above includes: performing the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes, the upper memory usage threshold, and the data throughput at which the target process trains the deep learning model, to obtain the target prefetch quantity.
In this way, the prefetch quantity is dynamically updated based on the total memory space currently occupied by the pre-sample data queues, the upper memory usage threshold, and the data throughput of training the deep learning model. When the data throughput rises, the amount of data in the pre-sample data queue can keep up with the consumption of sample data; when the data throughput falls, the memory occupied by the pre-sample data queue can be reduced as much as possible, freeing the surplus memory for other work and making the adjustment more flexible.
In a possible implementation, the above includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold, increasing the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or, when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has reached the upper memory usage threshold, decreasing the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
In this way, when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold, as much sample data as possible is prefetched; when it has reached the threshold, the prefetch quantity of sample data is reduced, flexibly adjusting the length of the pre-sample data queue.
In a possible implementation, increasing the prefetch quantity by the first adjustment step when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput at which the target process trains the deep learning model satisfies a preset data throughput condition, increasing the prefetch quantity by the first adjustment step to obtain the target prefetch quantity.
In a possible implementation, the method further includes: when the total memory space currently occupied by the pre-sample data queues corresponding to the one or more processes has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decreasing the prefetch quantity by a third adjustment step to obtain the target prefetch quantity.
In a possible implementation, the preset data throughput condition includes at least one of the following: the current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical training iterations before the current iteration, or the data throughput of the iteration immediately preceding the current one; or the current value of the data throughput is greater than a data throughput threshold.
In a possible implementation, the method further includes: performing second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
In a possible implementation, performing the second update processing on the adjustment step of the prefetch quantity to obtain the target adjustment step includes: when the prefetch quantity is increased in the first update processing, increasing the adjustment step of the prefetch quantity; and/or, when the prefetch quantity is decreased in the first update processing, decreasing the adjustment step of the prefetch quantity.
In this way, when the prefetch quantity needs to grow, it grows faster, ensuring that the sample data stored in the pre-sample data queue reaches a larger quantity sooner to serve the following training iterations and avoiding delays in model training caused by too small a prefetch quantity; meanwhile, when the prefetch quantity needs to shrink, shrinking it more gently keeps the length of the pre-sample data queue changing smoothly and avoids oscillation of the training process caused by an overly rapid drop in the quantity of prefetched sample data.
In a second aspect, the embodiments of the present disclosure further provide a data processing apparatus applied to the training of a deep learning model, the training involving one or more processes. The apparatus includes: a first update module configured to perform, for a target process among the one or more processes, first update processing on a prefetch quantity of sample data to obtain a target prefetch quantity; and a reading module configured to, in response to the quantity of sample data currently included in the pre-sample data queue corresponding to the target process not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
In a third aspect, the embodiments of the present disclosure further provide a computer device, including a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the computer device runs, the processor and the storage medium communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the above first aspect or of any possible implementation of the first aspect.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and form part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only certain embodiments of the present disclosure and should not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the drawings here, can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present disclosure but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.
Research has found that when a deep learning model is trained with multiple parallel processes, each process pre-reads the training data needed for the next round while executing the computation of the current round. After finishing the current round, each parallel process must communicate and synchronize data with the other processes, and the next round of training starts only after communication and data synchronization of all processes complete. If the training task of any process is delayed, for example because pre-reading the training data for the next round takes longer than executing the current round's training task, the training tasks of all processes are delayed, which in turn reduces training efficiency.
Based on the above research, the present disclosure provides a data processing method and apparatus applied to deep learning model training. According to this data processing method, first update processing can be performed on the prefetch quantity of sample data to obtain a target prefetch quantity, and in response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity, new sample data is read and stored in the pre-sample data queue. As a result, by the time the main process finishes one training iteration, the sample data needed for the next iteration has already been read. The main process dynamically updates the prefetch quantity to obtain the target prefetch quantity and reads new sample data from the sample data pool when the quantity of sample data currently in the data queue falls short of it. In most cases, reading new sample data takes the main process less time than executing one training iteration, so the data queue always stores enough sample data for the following iterations; even if the main process takes too long to read a particular piece of sample data, iterative training is not delayed by an insufficient number of samples, which improves training efficiency.
The defects of the existing solutions were all identified by the inventors through practice and careful study; therefore, both the discovery of the above problems and the solutions proposed by the present disclosure should be regarded as the inventors' contribution to the present disclosure.
The technical solutions of the present disclosure are described below clearly and completely with reference to the drawings of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the present disclosure, as generally described and illustrated in the drawings here, can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present disclosure but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
To facilitate understanding of this embodiment, a data processing method disclosed in the embodiments of the present disclosure is first introduced in detail. The data processing method provided by the embodiments of the present disclosure is applied to the training of a deep learning model, and its execution subject is generally a main process or a child process used to train the deep learning model. In some possible implementations, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The data processing method provided by the embodiments of the present disclosure is described below taking any one of at least one main process as the execution subject.
Referring to Fig. 1, a flowchart of the data processing method provided by an embodiment of the present disclosure, the method includes steps S101 to S102, where:
S101: Perform first update processing on the prefetch quantity of sample data to obtain a target prefetch quantity.
S102: In response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity, read new sample data and store the read new sample data in the pre-sample data queue.
S101 and S102 are described in detail below.
I: In S101 above, when there is one main process, that main process can train the deep learning model, and it performs the first update processing on the prefetch quantity of sample data to obtain the target prefetch quantity.
When there are multiple main processes, they can train the deep learning model in parallel, and each main process performs the first update processing on its own prefetch quantity of sample data to obtain its target prefetch quantity. Here, the prefetch quantities corresponding to different main processes may differ, and so may their target prefetch quantities.
Each main process corresponds to one pre-sample data queue; the pre-sample data queue corresponding to any main process stores multiple pieces of sample data, and each main process trains the deep learning model based on the sample data stored in its corresponding pre-sample data queue.
The pre-sample data queue is, for example, a first-in-first-out queue. When starting a new training iteration, the main process first reads one group of sample data from its pre-sample data queue; after being read, that group is deleted from the queue to leave a storage slot for new sample data.
Note that in one training iteration, the main process trains the deep learning model on one group of sample data, and a group of sample data includes at least one piece of sample data. The prefetch quantity referred to in the embodiments of the present disclosure is the number of sample data groups.
When the predetermined prefetch quantity update condition has not been reached, the main process reads new sample data in response to the quantity of sample data currently included in the pre-sample data queue not reaching the current prefetch quantity.
When the predetermined prefetch quantity update condition is reached, the main process performs the first update processing on the current prefetch quantity of sample data to obtain the target prefetch quantity, and reads new sample data in response to the quantity of sample data currently included in the pre-sample data queue not reaching the target prefetch quantity. The prefetch quantity of sample data and the target prefetch quantity may be the same or different.
Specifically, the prefetch quantity update condition includes, for example, one or more of the following a1 to a3:
a1: A preset update period is reached.
Here, the update period is the period at which the prefetch quantity is updated.
The update period is, for example, a time period; for instance, with an update period of 1 hour, the first update processing of the prefetch quantity can be triggered every hour.
The update period may also be a preset number of training iterations; for example, every 5 iterations of training the deep learning model, the main process triggers the first update processing of the prefetch quantity. In this case, since different iterations may take different amounts of time, the durations of different update periods may also differ.
a2: The data throughput when training the deep learning model based on sample data is greater than a first threshold.
Here, the data throughput characterizes the sample data processing speed of the main process when training the deep learning model. If the main process's data throughput during training is greater than the first threshold, the sample data stored in the pre-sample data queue is considered to be consumed quickly. Keeping a small prefetch quantity would then risk the quantity of sample data stored in the queue failing to keep up with the consumption of training in time. It can therefore be considered to increase the quantity of sample data prefetched into the pre-sample data queue, triggering the first update processing of the prefetch quantity of sample data.
a2.1: Here, the data throughput can be obtained, for example, as follows:
In response to the prefetch quantity update condition being reached, determine at least one target training iteration from multiple historical iterations based on the training progress of the deep learning model at that moment; then determine the data throughput of training the deep learning model based on the number of pieces of sample data included in the sample data group used by each target iteration and the time each target iteration took.
Here, a target training iteration is, for example, at least one iteration closest to the moment at which the prefetch quantity update condition is reached.
For example, a main process has performed 5 training iterations of the deep learning model, and the 6th iteration is in progress when the prefetch quantity update condition is reached. With one target iteration, the 5th iteration can be chosen as the target; if it took 15 minutes and used 64 pieces of sample data, the data throughput is, for example, 64 ÷ 15.
With three target iterations, the 3rd, 4th, and 5th iterations can be chosen as targets; if they took 12, 14, and 15 minutes respectively and each used 64 pieces of sample data, the data throughput is, for example, 64 × 3 ÷ (12 + 14 + 15), in pieces per minute.
a2.2: In another embodiment, the currently running iteration can also be chosen as the target iteration, and the data throughput determined from the number of samples already trained in the current iteration and how long it has lasted.
For example, a main process has performed 5 training iterations of the deep learning model, and the 6th iteration is in progress when the prefetch quantity update condition is reached. The 6th iteration can be chosen as the target; it trains the deep learning model on the 64 samples of one sample data group, 30 samples have been trained so far, and the iteration has lasted 4 minutes, so the data throughput is, for example, 30 ÷ 4.
a3: The data throughput when training the deep learning model based on sample data is less than a second threshold.
Here, the second threshold is less than the first threshold.
If the main process's data throughput during training is less than the second threshold, the consumption of the sample data stored in the pre-sample data queue is considered too slow. Keeping a large prefetch quantity would then let sample data keep accumulating in the queue and occupy a large amount of memory; it can therefore be considered to decrease the quantity of sample data prefetched into the pre-sample data queue, triggering the first update processing of the prefetch quantity of sample data.
The data throughput here is determined in the same way as in a2 above and is not repeated.
After the prefetch quantity update condition is satisfied, the first update processing can be performed on the prefetch quantity of sample data, for example, as follows:
Perform the first update processing on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues and the upper memory usage threshold, to obtain the target prefetch quantity.
For example, it can be detected whether the total memory space occupied by the pre-sample data queues reaches the upper memory usage threshold, and the first update processing performed on the prefetch quantity of sample data based on the detection result to obtain the target prefetch quantity.
Here, the total memory space occupied by the pre-sample data queues refers to the total memory space occupied by the sample data queues corresponding to all main processes.
Specifically: when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold, increase the prefetch quantity by a first adjustment step to obtain the target prefetch quantity; and/or
when the total memory space currently occupied by the pre-sample data queues has reached the upper memory usage threshold, decrease the prefetch quantity by a second adjustment step to obtain the target prefetch quantity.
Here, the first adjustment step is the adjustment step used when increasing the prefetch quantity of sample data; the second adjustment step is the adjustment step used when decreasing it.
The first adjustment step and the second adjustment step may be the same size or different sizes.
For example, the first adjustment step is greater than the second adjustment step. In this case, when the prefetch quantity needs to grow, it grows faster, ensuring that the sample data stored in the pre-sample data queue reaches a larger quantity sooner to serve the following training iterations and avoiding delays in model training caused by too small a prefetch quantity; meanwhile, when the prefetch quantity needs to shrink, shrinking it more gently keeps the length of the pre-sample data queue changing smoothly and avoids oscillation of the training process caused by an overly rapid drop in the quantity of prefetched sample data.
Further, in another embodiment, the first update processing can also be performed on the prefetch quantity of sample data according to the total memory space currently occupied by the pre-sample data queues, the upper memory usage threshold, and the data throughput of training the deep learning model, to obtain the target prefetch quantity.
Here, building on the above embodiment, for example when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold:
when the data throughput of training the deep learning model satisfies a preset data throughput condition, increase the prefetch quantity by the first adjustment step to obtain the target prefetch quantity.
In another embodiment, the method further includes: when the total memory space currently occupied by the pre-sample data queues has not reached the upper memory usage threshold and the data throughput does not satisfy the preset data throughput condition, decreasing the prefetch quantity by a third adjustment step to obtain the target prefetch quantity.
Specifically, the first adjustment step and the third adjustment step may be the same size or different sizes; similarly, the second adjustment step and the third adjustment step may be the same size or different sizes.
In another embodiment, if the total memory space currently occupied by the pre-sample data queues reaches the upper memory usage threshold, the prefetch quantity is decreased by the second adjustment step to obtain the target prefetch quantity, regardless of whether the data throughput of training the deep learning model satisfies the preset data throughput condition.
The preset data throughput condition above includes at least one of the following b1 to b2:
b1: The current value of the data throughput is greater than a historical value, where the historical value is the average data throughput of multiple historical iterations before the current training iteration, or the data throughput of the iteration immediately preceding the current one.
The specific way of determining it can be found in a2.1 above and is not repeated here.
b2: The current value of the data throughput is greater than a data throughput threshold.
Here, the current value of the data throughput can be determined, for example, as shown in a2.2 above, and is not repeated here.
In addition, building on the above embodiments, the data processing method provided by the embodiments of the present disclosure further includes:
performing second update processing on the adjustment step of the prefetch quantity to obtain a target adjustment step, where the target adjustment step is used for the next update of the prefetch quantity.
Here, for example, when the prefetch quantity is increased in the first update processing, the adjustment step of the prefetch quantity can be increased to obtain the target adjustment step; and/or
when the prefetch quantity is decreased in the first update processing, the adjustment step of the prefetch quantity is decreased to obtain the target adjustment step, as sketched below.
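As a hedged illustration (the doubling and halving factors are taken from Examples 1 and 2 below; treating them as the general rule is an assumption, as is the floor of 1), the second update processing can be sketched in Python as:

```python
def second_update(step, quantity_was_increased):
    """Return the target adjustment step for the next first update."""
    if quantity_was_increased:
        return step * 2        # grow the step after an increase
    return max(1, step // 2)   # shrink it, not below 1, after a decrease
```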
A concrete example:
Five processes M1, M2, M3, M4, and M5 execute training tasks for the same deep learning model in parallel,
where M1, M2, M3, M4, and M5 each execute the data processing method provided by the embodiments of the present disclosure.
Taking M1's execution of the data processing method as an example:
Example 1: The first update processing is performed on the prefetch quantity of sample data based on the upper memory usage threshold.
1.1:M1检测M1对应的预取样本数据队列L1、M2对应的预取样本数据队列L2、M3对应的预取样本数据队列L3、M4对应的预取样本数据队列L4和M5对应的预取样本数据队列L5占用的总内存空间是否达到内存使用上限阈值;如果否,则跳转至1.2(a)和1.2(b);如果是,或M1在向操作系统的主进程申请内存时申请失败,则跳转至1.3。
1.2(a):M1对样本数据的预取数量进行第一更新处理,得到目标预取数量:
目标预取数量=预取数量+第一调节步长,其中,第一调节步长为上一次对调节步长进行第二更新处理得到的目标调节步长。
1.2(b)M1对第一调节步长进行第二更新处理:
本次第二更新处理后得到的目标调节步长=第一调节步长*2,即下一次第一更新处理所使用的第一调节步长是本次第一更新处理所使用的调节步长的2倍。
1.3:M1检测第二调节步长是否大于1;如果第二调节步长大于1,则跳转至1.4(a)和1.4(b)。如果否,则跳转至1.5。
1.4(a):M1对第二调节步长进行第二更新处理:
调整后的第二调节步长=调整前的第二调节步长/2。
1.4(b):M1对样本数据的预取数量进行第一更新处理,得到目标预取数量:
目标预取数量=预取数量-第二调节步长。其中,第二调节步长为1.4(a)中的调整后的调节步长。
1.5:M1保持第二调节步长不变,并基于保持不变的第二调节步长,对样本数据的预取数量进行第一更新处理,得到目标预取数量:
目标预取数量=预取数量-保持不变的第二调节步长。
Example 2: M1 performs the first update processing on the number of prefetches of sample data based on the memory usage upper-limit threshold and the data throughput of training the deep learning model.
2.1: M1 checks whether the total memory space occupied by the prefetched sample data queues L1 through L5 corresponding to M1 through M5 has reached the memory usage upper-limit threshold. If not, go to 2.2; if it has, or if M1's request to the operating system for memory fails, go to 2.7.
2.2: M1 checks whether the data throughput of training the deep learning model satisfies the preset data throughput condition. If so, go to 2.3(a) and 2.3(b); if not, go to 2.4(a) and 2.4(b).
2.3(a): M1 performs the first update processing on the number of prefetches of sample data to obtain the target number of prefetches:
target number of prefetches = number of prefetches + first adjustment step, where the first adjustment step is the target adjustment step obtained by the previous second update processing of the adjustment step.
2.3(b): M1 performs the second update processing on the first adjustment step:
target adjustment step obtained by this second update processing = first adjustment step × 2.
2.4(a): M1 performs the first update processing on the number of prefetches of sample data to obtain the target number of prefetches:
target number of prefetches = number of prefetches − third adjustment step.
2.4(b): M1 checks whether the third adjustment step is greater than 1. If the third adjustment step is greater than 1, go to 2.5; if not, go to 2.6.
2.5: M1 performs the second update processing on the third adjustment step:
adjusted third adjustment step = third adjustment step before adjustment ÷ 2.
2.6: M1 keeps the third adjustment step unchanged; this third adjustment step is used the next time the first update processing is performed on the number of prefetches.
2.7: M1 checks whether the second adjustment step is greater than 1. If the second adjustment step is greater than 1, go to 2.8(a) and 2.8(b); if not, go to 2.9.
2.8(a): M1 performs the second update processing on the second adjustment step:
adjusted second adjustment step = second adjustment step before adjustment ÷ 2.
2.8(b): M1 performs the first update processing on the number of prefetches of sample data to obtain the target number of prefetches:
target number of prefetches = number of prefetches − second adjustment step, where the second adjustment step is the adjusted step from 2.8(a).
2.9: M1 keeps the second adjustment step unchanged and, based on the unchanged second adjustment step, performs the first update processing on the number of prefetches of sample data to obtain the target number of prefetches:
target number of prefetches = number of prefetches − the unchanged second adjustment step.
Through the steps in the examples above, the first update processing of the number of prefetches of sample data is completed; a hedged code rendering of Example 2's decision flow follows.
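Putting steps 2.1 to 2.9 together, one hedged Python rendering of Example 2, reusing the hypothetical total_queue_memory() and throughput_ok() from earlier sketches; memory_request_failed stands in for the failed-allocation branch of step 2.1:

```python
def example2_update(prefetch, first_step, second_step, third_step,
                    mem_limit, memory_request_failed=False):
    """One pass of Example 2 (steps 2.1-2.9), returning the new state (illustrative)."""
    if total_queue_memory() >= mem_limit or memory_request_failed:
        # 2.7-2.9: halve the second step first if it is still > 1, then decrease.
        if second_step > 1:
            second_step //= 2                 # 2.8(a)
        return prefetch - second_step, first_step, second_step, third_step

    if throughput_ok():                       # 2.2: condition b1 or b2 holds
        prefetch += first_step                # 2.3(a)
        first_step *= 2                       # 2.3(b)
    else:
        prefetch -= third_step                # 2.4(a)
        if third_step > 1:                    # 2.4(b)
            third_step //= 2                  # 2.5 (else 2.6: keep unchanged)
    return prefetch, first_step, second_step, third_step
```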
II: In S102 above, when the amount of sample data currently included in the prefetched sample data queue has not reached the target number of prefetches, the master process may read new sample data directly from the sample database, or may control a subprocess, through communication with it, to read new sample data from the sample database.
Where the master process reads new sample data from the sample database directly, it determines the amount of sample data currently stored in the prefetched sample data queue from the amount of sample data it has extracted from the queue and the amount it has read into the queue, and compares that amount with the target number of prefetches; where the amount is smaller than the target number of prefetches, it reads new sample data directly from the sample database and stores it into the prefetched sample data queue.
Where the master process controls a subprocess to read new sample data from the sample database, it determines the amount of sample data currently stored in the queue through communication with the subprocess and compares that amount with the target number of prefetches; where the amount is smaller than the target number of prefetches, it sends the subprocess a sample data read instruction carrying information on the quantity of sample data to read. After receiving the instruction, the subprocess reads new sample data according to the quantity information carried in the instruction and stores it into the prefetched sample data queue. A minimal sketch of this refill path follows.
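A hedged sketch of the subprocess refill path using Python's multiprocessing primitives; the loader function and the database reader are assumptions, not names from the disclosure:

```python
import multiprocessing as mp

def loader(cmd_queue, data_queue):
    """Subprocess: read the requested number of groups from the sample database."""
    while True:
        n = cmd_queue.get()          # quantity carried by the read instruction
        if n is None:
            break                    # assumed shutdown signal
        for _ in range(n):
            data_queue.put(read_group_from_database())  # hypothetical reader

def refill(cmd_queue, data_queue, target_prefetches):
    """Master process: top the queue up to the target number of prefetches."""
    # qsize() is approximate and unavailable on some platforms; a shared
    # counter maintained by the master process is an alternative.
    missing = target_prefetches - data_queue.qsize()
    if missing > 0:
        cmd_queue.put(missing)       # the sample data read instruction

# Setup (illustrative): one loader subprocess per master process.
# cmd_q, data_q = mp.Queue(), mp.Queue()
# mp.Process(target=loader, args=(cmd_q, data_q), daemon=True).start()
```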
In the embodiments of the present disclosure, because the master process performs the first update processing on the number of prefetches to obtain the target number of prefetches, and reads new sample data from the sample data pool when the amount of sample data currently included in the data queue has not reached that target, the sample data needed by the next training iteration has already been read by the time the master process finishes the current iteration. In practice, the time a master process spends reading data is in most cases shorter than the time it needs to execute one training iteration, so the data queue can be kept stocked with enough sample data to serve the next several training iterations; even if the master process takes too long to read a particular piece of sample data, the training iteration is not delayed by sample data failing to arrive in time, which improves training efficiency.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a data processing apparatus corresponding to the data processing method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the data processing method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 2, a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure, applied to the training of a deep learning model, the training including one or more processes, the apparatus includes a first update module 21 and a reading module 22, wherein:
the first update module 21 is configured to, for a target process among the one or more processes, perform first update processing on the number of prefetches of sample data to obtain a target number of prefetches; and
the reading module 22 is configured to, in response to the amount of sample data currently included in the prefetched sample data queue corresponding to the target process not reaching the target number of prefetches, read new sample data and store the read new sample data into the prefetched sample data queue.
In the embodiments of the present disclosure, because the master process performs the first update processing on the number of prefetches to obtain the target number of prefetches, and reads new sample data from the sample data pool when the amount of sample data currently included in the prefetched sample data queue has not reached that target, the sample data needed by the next training iteration has already been read by the time the master process finishes the current iteration; in practice, the time a master process spends reading data is in most cases shorter than the time it needs to execute one training iteration, so the queue can be kept stocked with enough sample data to serve the next several training iterations; even if the master process takes too long to read a particular piece of sample data, the training iteration is not delayed by sample data failing to arrive in time, which improves training efficiency.
In a possible implementation, the first update module 21, when performing the first update processing on the number of prefetches of sample data to obtain the target number of prefetches, is configured to:
perform the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and a memory usage upper-limit threshold, to obtain the target number of prefetches.
In a possible implementation, the first update module 21, when performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, is configured to:
perform the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes, the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model, to obtain the target number of prefetches.
In a possible implementation, the first update module 21, when performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, is configured to:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, increase the number of prefetches by a first adjustment step to obtain the target number of prefetches; and/or
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has reached the memory usage upper-limit threshold, decrease the number of prefetches by a second adjustment step to obtain the target number of prefetches.
In a possible implementation, the first update module 21, when increasing the number of prefetches by the first adjustment step to obtain the target number of prefetches where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, is configured to:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model satisfies a preset data throughput condition, increase the number of prefetches by the first adjustment step to obtain the target number of prefetches.
In a possible implementation, the first update module 21 is further configured to:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput does not satisfy the preset data throughput condition, decrease the number of prefetches by a third adjustment step to obtain the target number of prefetches.
In a possible implementation, the preset data throughput condition includes at least one of the following:
the current value of the data throughput is greater than a historical value, where the historical value is the average of the data throughput of multiple historical training iterations before the current training iteration, or the value of the data throughput of the training iteration immediately preceding the current training iteration; or
the current value of the data throughput is greater than a data throughput threshold.
In a possible implementation, the apparatus further includes a second update module 23 configured to perform second update processing on the adjustment step of the number of prefetches to obtain a target adjustment step, where the target adjustment step is used in the next update processing of the number of prefetches.
In a possible implementation, the second update module 23, when performing the second update processing on the adjustment step of the number of prefetches to obtain the target adjustment step, is configured to:
where the first update processing increases the number of prefetches, increase the adjustment step of the number of prefetches; and/or
where the first update processing decreases the number of prefetches, decrease the adjustment step of the number of prefetches.
For descriptions of the processing flows of the modules in the apparatus and the interaction flows between the modules, reference may be made to the relevant descriptions in the method embodiments above; they are not detailed again here.
An embodiment of the present disclosure further provides a computer device 30. FIG. 3 is a schematic structural diagram of the computer device 30 provided by an embodiment of the present disclosure, which includes:
a processor 31, a memory 32 and a bus 33. The memory 32 stores execution instructions and includes an internal memory 321 and an external memory 322. The internal memory 321, also called internal storage, temporarily holds operation data of the processor 31 and data exchanged with an external memory 322 such as a hard disk; the processor 31 exchanges data with the external memory 322 through the internal memory 321. When the computer device 30 runs, the processor 31 communicates with the memory 32 through the bus 33, causing the processor 31 to execute the following instructions in user mode:
for a target process among the one or more processes, performing first update processing on the number of prefetches of sample data to obtain a target number of prefetches;
in response to the amount of sample data currently included in the prefetched sample data queue corresponding to the target process not reaching the target number of prefetches, reading new sample data, and storing the read new sample data into the prefetched sample data queue.
In a possible implementation, in the instructions executed by the processor 31, performing the first update processing on the number of prefetches of sample data to obtain the target number of prefetches includes:
performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and a memory usage upper-limit threshold, to obtain the target number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, includes:
performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes, the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model, to obtain the target number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, includes:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, increasing the number of prefetches by a first adjustment step to obtain the target number of prefetches; and/or
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has reached the memory usage upper-limit threshold, decreasing the number of prefetches by a second adjustment step to obtain the target number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, increasing the number of prefetches by the first adjustment step to obtain the target number of prefetches where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold includes:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model satisfies a preset data throughput condition, increasing the number of prefetches by the first adjustment step to obtain the target number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, the method further includes:
where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput does not satisfy the preset data throughput condition, decreasing the number of prefetches by a third adjustment step to obtain the target number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, the preset data throughput condition includes at least one of the following:
the current value of the data throughput is greater than a historical value, where the historical value is the average of the data throughput of multiple historical training iterations before the current training iteration, or the value of the data throughput of the training iteration immediately preceding the current training iteration; or
the current value of the data throughput is greater than a data throughput threshold.
In a possible implementation, in the instructions executed by the processor 31, the method further includes:
performing second update processing on the adjustment step of the number of prefetches to obtain a target adjustment step, where the target adjustment step is used in the next update processing of the number of prefetches.
In a possible implementation, in the instructions executed by the processor 31, performing the second update processing on the adjustment step of the number of prefetches to obtain the target adjustment step includes:
where the first update processing increases the number of prefetches, increasing the adjustment step of the number of prefetches; and/or
where the first update processing decreases the number of prefetches, decreasing the adjustment step of the number of prefetches.
An embodiment of the present disclosure further provides a computer-readable storage medium having stored thereon a computer program which, when run by a processor, executes the steps of the data processing method described in the method embodiments above. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the data processing method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code may be used to execute the steps of the data processing method described in the method embodiments above; for details, refer to the method embodiments, which are not repeated here.
An embodiment of the present disclosure further provides a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented in hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of a given embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solutions of the present disclosure in essence, or the part contributing to the prior art, or parts of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent substitutions for some of their technical features; such modifications, variations or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (21)

  1. A data processing method, applied to training of a deep learning model, the training comprising one or more processes, the method comprising:
    for a target process among the one or more processes, performing first update processing on a number of prefetches of sample data to obtain a target number of prefetches; and
    in response to an amount of sample data currently included in a prefetched sample data queue corresponding to the target process not reaching the target number of prefetches, reading new sample data, and storing the read new sample data into the prefetched sample data queue.
  2. The data processing method according to claim 1, wherein the performing first update processing on a number of prefetches of sample data to obtain a target number of prefetches comprises:
    performing the first update processing on the number of prefetches of the sample data according to a total memory space currently occupied by prefetched sample data queues corresponding to the one or more processes and a memory usage upper-limit threshold, to obtain the target number of prefetches.
  3. The data processing method according to claim 2, wherein the performing the first update processing on the number of prefetches of the sample data according to a total memory space currently occupied by prefetched sample data queues corresponding to the one or more processes and a memory usage upper-limit threshold, to obtain the target number of prefetches, comprises:
    performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes, the memory usage upper-limit threshold, and a data throughput of the target process training the deep learning model, to obtain the target number of prefetches.
  4. The data processing method according to claim 2 or 3, wherein the performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, comprises:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, increasing the number of prefetches by a first adjustment step to obtain the target number of prefetches; and/or
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has reached the memory usage upper-limit threshold, decreasing the number of prefetches by a second adjustment step to obtain the target number of prefetches.
  5. The data processing method according to claim 4, wherein the increasing the number of prefetches by a first adjustment step to obtain the target number of prefetches in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold comprises:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model satisfies a preset data throughput condition, increasing the number of prefetches by the first adjustment step to obtain the target number of prefetches.
  6. The data processing method according to claim 5, further comprising:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput does not satisfy the preset data throughput condition, decreasing the number of prefetches by a third adjustment step to obtain the target number of prefetches.
  7. The data processing method according to claim 5 or 6, wherein the preset data throughput condition comprises at least one of the following:
    a current value of the data throughput is greater than a historical value, wherein the historical value is an average of the data throughput of a plurality of historical training iterations preceding a current training iteration, or a value of the data throughput of the training iteration immediately preceding the current training iteration; or
    the current value of the data throughput is greater than a data throughput threshold.
  8. The data processing method according to any one of claims 1 to 7, further comprising:
    performing second update processing on an adjustment step of the number of prefetches to obtain a target adjustment step, wherein the target adjustment step is used in a next update processing of the number of prefetches.
  9. The data processing method according to claim 8, wherein the performing second update processing on an adjustment step of the number of prefetches to obtain a target adjustment step comprises:
    in a case where the first update processing increases the number of prefetches, increasing the adjustment step of the number of prefetches; and/or
    in a case where the first update processing decreases the number of prefetches, decreasing the adjustment step of the number of prefetches.
  10. A data processing apparatus, applied to training of a deep learning model, the training comprising one or more processes, the apparatus comprising:
    a first update module configured to, for a target process among the one or more processes, perform first update processing on a number of prefetches of sample data to obtain a target number of prefetches; and
    a reading module configured to, in response to an amount of sample data currently included in a prefetched sample data queue corresponding to the target process not reaching the target number of prefetches, read new sample data, and store the read new sample data into the prefetched sample data queue.
  11. The data processing apparatus according to claim 10, wherein the first update module, when performing the first update processing on the number of prefetches of sample data to obtain the target number of prefetches, is configured to:
    perform the first update processing on the number of prefetches of the sample data according to a total memory space currently occupied by prefetched sample data queues corresponding to the one or more processes and a memory usage upper-limit threshold, to obtain the target number of prefetches.
  12. The data processing apparatus according to claim 11, wherein the first update module, when performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, is configured to:
    perform the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes, the memory usage upper-limit threshold, and a data throughput of the target process training the deep learning model, to obtain the target number of prefetches.
  13. The data processing apparatus according to claim 11 or 12, wherein the first update module, when performing the first update processing on the number of prefetches of the sample data according to the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes and the memory usage upper-limit threshold, to obtain the target number of prefetches, is configured to:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, increase the number of prefetches by a first adjustment step to obtain the target number of prefetches; and/or
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has reached the memory usage upper-limit threshold, decrease the number of prefetches by a second adjustment step to obtain the target number of prefetches.
  14. The data processing apparatus according to claim 13, wherein the first update module, when increasing the number of prefetches by the first adjustment step to obtain the target number of prefetches in the case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, is configured to:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput of the target process training the deep learning model satisfies a preset data throughput condition, increase the number of prefetches by the first adjustment step to obtain the target number of prefetches.
  15. The data processing apparatus according to claim 14, wherein the first update module is further configured to:
    in a case where the total memory space currently occupied by the prefetched sample data queues corresponding to the one or more processes has not reached the memory usage upper-limit threshold, and the data throughput does not satisfy the preset data throughput condition, decrease the number of prefetches by a third adjustment step to obtain the target number of prefetches.
  16. The data processing apparatus according to claim 14 or 15, wherein the preset data throughput condition comprises at least one of the following:
    a current value of the data throughput is greater than a historical value, wherein the historical value is an average of the data throughput of a plurality of historical training iterations preceding a current training iteration, or a value of the data throughput of the training iteration immediately preceding the current training iteration; or
    the current value of the data throughput is greater than a data throughput threshold.
  17. The data processing apparatus according to any one of claims 10 to 16, further comprising a second update module configured to perform second update processing on an adjustment step of the number of prefetches to obtain a target adjustment step, wherein the target adjustment step is used in a next update processing of the number of prefetches.
  18. The data processing apparatus according to claim 17, wherein the second update module, when performing the second update processing on the adjustment step of the number of prefetches to obtain the target adjustment step, is configured to:
    in a case where the first update processing increases the number of prefetches, increase the adjustment step of the number of prefetches; and/or
    in a case where the first update processing decreases the number of prefetches, decrease the adjustment step of the number of prefetches.
  19. A computer device, comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the storage medium through the bus; and the machine-readable instructions, when executed by the processor, perform the data processing method according to any one of claims 1 to 9.
  20. A computer-readable storage medium having stored thereon a computer program which, when run by a processor, performs the data processing method according to any one of claims 1 to 9.
  21. A computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 9.
PCT/CN2020/133647 2019-12-30 2020-12-03 Data processing method and apparatus, computer device, storage medium, and computer program WO2021135810A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020217031141A KR20210130796A (ko) 2019-12-30 2020-12-03 데이터 처리 방법과 장치, 컴퓨터 디바이스, 기록 매체 및 컴퓨터 프로그램
JP2021557139A JP2022526333A (ja) 2019-12-30 2020-12-03 データ処理方法と装置、コンピュータデバイス、記録媒体、及びコンピュータプログラム
SG11202110625XA SG11202110625XA (en) 2019-12-30 2020-12-03 Data processing methods and apparatuses, computer devices, storage media and computer programs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911403669.4 2019-12-30
CN201911403669.4A CN113128531B (zh) 2019-12-30 2019-12-30 一种数据处理方法及装置

Publications (1)

Publication Number Publication Date
WO2021135810A1 true WO2021135810A1 (zh) 2021-07-08

Family

ID=76686451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133647 WO2021135810A1 (zh) 2019-12-30 2020-12-03 数据处理方法及装置、计算机设备、存储介质、计算机程序

Country Status (6)

Country Link
JP (1) JP2022526333A (zh)
KR (1) KR20210130796A (zh)
CN (1) CN113128531B (zh)
SG (1) SG11202110625XA (zh)
TW (1) TWI763168B (zh)
WO (1) WO2021135810A1 (zh)


Also Published As

Publication number Publication date
CN113128531B (zh) 2024-03-26
KR20210130796A (ko) 2021-11-01
CN113128531A (zh) 2021-07-16
JP2022526333A (ja) 2022-05-24
TWI763168B (zh) 2022-05-01
TW202125271A (zh) 2021-07-01
SG11202110625XA (en) 2021-10-28

